Summary
Change arises in two contexts: (1) modifying model logic (CI/CD), or (2) re-materializing models after source data refreshes. In both cases, having a way to safely deploy models to production is important. Two important aspects of model deployments are consistency and transparency.
Transparency for data consumers means knowing what data is available at a given time, and whether that data meets consistency and freshness SLAs. Consider a data transformation job that builds two models, A and B: the job is invoked, and A succeeds but B fails. If a model succeeds, it should never need to be rebuilt until more data arrives or its definition changes.
“Blue-green” deployments are a software deployment approach that can be used in many different contexts, from provisioning cloud infrastructure to deploying new versions of an application. In the context of data transformation deployments, we can consider the current version of our production analytics database to be “blue”. When we need to re-deploy models, we create a clone of that database (“green”) where transformations can run. If the transformation job succeeds and all tests pass, then we can atomically swap the blue and green databases.
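The clone, build, and swap sequence can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `blue_green_deploy`, the callables `execute_sql` and `run_transformations`, and the database names are all hypothetical, and the SQL assumes a warehouse with zero-copy clones and atomic swaps (the statements use Snowflake-style syntax).

```python
from typing import Callable


def blue_green_deploy(
    execute_sql: Callable[[str], None],
    run_transformations: Callable[[str], bool],
    blue: str = "analytics",
    green: str = "analytics_green",
) -> bool:
    """Clone blue into green, build in green, and swap only on success.

    execute_sql runs one SQL statement against the warehouse.
    run_transformations builds and tests all models in the given database
    and returns True only if every build and test passed.
    Both are hypothetical hooks supplied by the caller.
    """
    # Assumption: Snowflake-style zero-copy clone of the production database.
    execute_sql(f"CREATE OR REPLACE DATABASE {green} CLONE {blue}")

    if not run_transformations(green):
        # Blue was never touched, so consumers still see the last good state.
        return False

    # Assumption: Snowflake-style atomic swap of the two databases.
    execute_sql(f"ALTER DATABASE {blue} SWAP WITH {green}")
    return True
```

Passing the SQL runner and the transformation step in as callables keeps the sketch independent of any particular driver or orchestrator. On failure the function returns before the swap, so the blue database stays exactly as it was.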
Before rolling back any models, we must first identify the affected subgraph for a given node invocation failure. For dbt users, there is a practical way to identify the affected subgraph. Once we have a list of affected models, there are a variety of methods to achieve the rollbacks.
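As a tool-agnostic illustration of the first step, the sketch below computes an affected subgraph by walking a model DAG. It assumes that the affected subgraph is the failed node plus everything downstream of it, and that the DAG is available as a mapping from each model to its direct children. The function name `affected_subgraph` and the example model names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def affected_subgraph(children: Dict[str, List[str]], failed: str) -> Set[str]:
    """Return the failed node plus all of its transitive descendants.

    children maps each model name to the models that depend on it directly.
    Assumption: "affected" means the failed model and everything downstream.
    """
    seen: Set[str] = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        # Leaf models have no entry in the mapping, hence the default.
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical DAG: A -> C, B -> C, B -> D, C -> E
dag = {"A": ["C"], "B": ["C", "D"], "C": ["E"]}
affected = affected_subgraph(dag, "B")
```

In this example a failure of B affects B, C, D, and E, while A is left alone because it sits upstream of the failure and its successful build remains valid.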