Declarative data platforms

03.11.2025 | 8 min read
Category: Data Platform | Tags: #Data Platform, #dbt, #Declarative, #DataOps

Declarative data platforms are not about less code, but about better abstractions. Instead of telling the system how to brew the coffee, you simply say "I want a cup of coffee".

From personal code to shared direction

In many organisations, the data platform has become a mosaic of scripts, notebooks and manual routines. Each developer has their own way of fetching, cleansing and modelling data. It works – until it does not.

When the pace increases and more teams need to collaborate, this individual freedom becomes a bottleneck: things are done differently, documentation is missing, and small errors spread quickly.

This is why we are now seeing a movement towards declarative data platforms – systems built on shared rules and clear descriptions, rather than personal practice. In short: we want to move away from ad hoc notebooks in PySpark and towards a platform that understands the intention – not just the code.

A cup of coffee and two ways of thinking

Imagine you want a cup of coffee.

  • In the imperative world, you specify exactly what should happen: “Boil 2.5 decilitres of water, grind 18 grams of beans, slowly pour the water over them for 25 seconds, wait for 3 minutes.”

  • In the declarative world, you simply say: “Get me a cup of coffee.”

The first method gives you full control but requires that you know every step. The second leaves the details to the system – the barista, the machine or the process – and focuses on the result.

Imperative coffee vs declarative coffee

Declarative data platforms are about the same thing: describing the desired state, not every step that leads there.

From manual control to intent-based management

Data platforms have long been built on imperative principles: the developer describes in detail how data should be moved, transformed and loaded. This provides control, but also complexity. Pipelines become fragile, changes take time, and technical debt grows rapidly.

A declarative data platform turns this on its head. Instead of describing how the processes should run, you describe what should be true about the data – and let the platform figure out how to achieve it.

Many of the greatest advances in programming and IT infrastructure over the past 15 years are built on declarative principles. We see this in languages and frameworks such as SQL and React, in platforms such as Kubernetes and Terraform, and in tools that make data flows more self-documenting and self-managing.

This marks a shift reminiscent of the transition from manual infrastructure to Infrastructure as Code. Ford, Parsons and Kua had already described in Building Evolutionary Architectures (2017)1 how complex systems can be guided by fitness functions – declarative checks that the system still has the properties we want. In 2023, Papakonstantinou and colleagues at the University of California, San Diego, showed how the same principle can be applied to data engineering: in Making Data Engineering Declarative (CIDR 2023)2 they describe how pipelines can be defined in terms of desired properties and invariants rather than procedures.

What does it mean to define pipelines based on “desired invariants”?

When Papakonstantinou and colleagues2 talk about desired properties and invariants, they mean that we no longer describe the steps to be executed, but the properties that should always be true about the data.

An invariant is a rule that must always hold – regardless of how the data is updated. For example:

  • Every order must have a valid customer_id.
  • customer_sales must always represent the most recent aggregated order total per customer.

Instead of programming the transformations that achieve this, you describe what should be true, and the platform determines for itself how and when the data needs to be updated. This shifts the focus from procedures to principles – from how to what. The result is more robust, testable and self-managing data platforms.
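To make the idea concrete, here is a minimal sketch in pandas (purely illustrative – no particular platform, and the column names customer_id, amount and total_sales are assumed) of the two invariants above expressed as checks on the data rather than as pipeline steps:

import pandas as pd

def check_invariants(orders: pd.DataFrame, customers: pd.DataFrame, customer_sales: pd.DataFrame) -> None:
    # Invariant 1: every order must reference an existing customer.
    assert orders["customer_id"].isin(customers["id"]).all(), "order with invalid customer_id"

    # Invariant 2: customer_sales must equal the aggregated order total per customer.
    expected = (
        orders.groupby("customer_id", as_index=False)["amount"].sum()
              .rename(columns={"amount": "total_sales"})
    )
    pd.testing.assert_frame_equal(
        customer_sales.sort_values("customer_id").reset_index(drop=True),
        expected.sort_values("customer_id").reset_index(drop=True),
        check_dtype=False,
    )

A declarative platform keeps rules like these next to the model definition and decides for itself when the aggregate has to be rebuilt to keep the second invariant true.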

What does “declarative” mean in practice?

Working declaratively is about describing the intention, not the recipe.

In a traditional pipeline, you say:

“Fetch data from A, transform it in this way, and load the result into table B.”

In a declarative model, you say:

“Table B should represent customer profitability based on the most recent order and cost data.”

In practice, it is more accurate to say that tools like dbt move in a declarative direction but are not fully declarative: dbt describes the desired model structure, while the execution itself is still procedural. Nevertheless, dbt does much of what we want – it ensures the correct order, idempotency, documentation and testing.

And this is important: we do not need to choose. Often we start declaratively – to create structure, transparency and reuse – and adjust details imperatively where we must.
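As a rough sketch of the difference – assuming simple orders and costs data with customer_id, amount and cost columns, and an orders_with_costs model that exists only for this example – the same requirement in the two styles could look like this:

import pandas as pd

# Imperative: the recipe – every step and its order are spelled out by hand.
def build_customer_profitability(orders: pd.DataFrame, costs: pd.DataFrame) -> pd.DataFrame:
    valid = orders[orders["customer_id"].notna()]
    merged = valid.merge(costs, on="customer_id", how="left").fillna({"cost": 0})
    grouped = merged.groupby("customer_id", as_index=False)[["amount", "cost"]].sum()
    grouped["profit"] = grouped["amount"] - grouped["cost"]
    return grouped[["customer_id", "profit"]]

# Declarative (dbt-style): only the desired content of table B is stated;
# ordering, materialisation and reruns are left to the platform.
CUSTOMER_PROFITABILITY_SQL = """
SELECT customer_id, SUM(amount) - SUM(cost) AS profit
FROM {{ ref('orders_with_costs') }}
GROUP BY customer_id
"""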

From code to descriptions – and a bit of YAML

Many modern tools, such as dbt, Dagster and Lakeflow Declarative Pipelines (formerly Delta Live Tables, DLT), use declarative DSLs (domain-specific languages). Here, models and pipelines are typically defined in YAML or other configuration files. This provides readability, versioning and a shared way of expressing business logic.

Most solutions are hybrid: a declarative core that handles structure and dependencies, combined with the ability to use code where the need is specific. This combination – declarative where you can, imperative where you must – provides flexibility without sacrificing oversight.

Practical examples

1. Data transformation with dbt

dbt demonstrates how declarative principles can be applied in practice. Instead of coding the entire processing pipeline, you define the desired model:

SELECT
    c.id AS customer_id,
    SUM(o.amount) AS total_sales
FROM {{ ref('orders') }} o
JOIN {{ ref('customers') }} c ON o.customer_id = c.id
GROUP BY c.id

ref() lets dbt build a dependency graph. The system itself knows the order, can run models in parallel, and ensures that changes are propagated correctly. At the same time, dbt provides idempotent runs, predefined tests, documentation and traceability – all of which are important steps towards more declarative platforms.
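The principle behind that graph can be sketched in a few lines of standard Python (an illustration of the idea, not dbt's internals): every ref() adds an edge, and the run order is a topological sort of the resulting graph.

from graphlib import TopologicalSorter

# Illustrative model graph: each model maps to the models it ref()'s.
deps = {
    "customer_sales": {"orders", "customers"},
    "orders": {"raw_orders"},
    "customers": set(),
    "raw_orders": set(),
}

print(list(TopologicalSorter(deps).static_order()))
# e.g. ['customers', 'raw_orders', 'orders', 'customer_sales'] – every model's
# dependencies come first, and models without mutual dependencies can run in parallel.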


2. Infrastructure as Code with Terraform

Terraform describes the desired state as code – and ensures that the environment actually matches.

resource "google_bigquery_dataset" "sales" {
  dataset_id = "sales"
  location   = "EU"
}

Although Terraform is a desired-state system, it does not have a continuous control loop: changes require an explicit apply step, unlike Kubernetes, which continuously reconciles the actual state with the desired state. Both are declarative, however – the difference lies in the degree of autonomy.
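The two control styles can be sketched as plain Python (a conceptual illustration, not Terraform or Kubernetes code):

# One-shot reconciliation (Terraform-style): the diff is applied only when someone runs apply.
def apply(desired: dict, actual: dict) -> dict:
    for resource, spec in desired.items():
        if actual.get(resource) != spec:
            actual[resource] = spec        # create or update drifted resources
    for resource in list(actual):
        if resource not in desired:
            del actual[resource]           # destroy what is no longer declared
    return actual

# Continuous reconciliation (Kubernetes-style): the same convergence, but in a loop,
# so drift is corrected without anyone having to trigger it.
def control_loop(desired: dict, actual: dict, cycles: int = 3) -> dict:
    for _ in range(cycles):                # in a real controller this loop never ends
        actual = apply(desired, actual)
    return actual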


3. Declarative orchestration with Dagster and Hamilton

Traditional orchestration tools such as Airflow3 are process-oriented – you describe when and how things should run. Newer tools such as Dagster and Hamilton are data-oriented: they describe data relationships, not workflows.

from dagster import asset

@asset
def sales(raw_sales, customers):
    # upstream assets are declared simply by naming them as parameters
    return join_sales_with_customers(raw_sales, customers)

Dagster automatically understands that sales depends on raw_sales and customers. This makes the pipeline more self-governing and robust to changes.
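Hamilton pushes the same idea down to the function level: the function name is the output, and the parameters are its upstream dependencies. A rough sketch in that style (illustrative column names; the hamilton driver that assembles and executes the graph is omitted here):

import pandas as pd

# 'sales' is built from 'raw_sales' and 'customers' simply because the function
# is named sales and declares them as parameters.
def sales(raw_sales: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    return raw_sales.merge(customers, left_on="customer_id", right_on="id")

# 'total_sales' in turn depends on 'sales'.
def total_sales(sales: pd.DataFrame) -> pd.Series:
    return sales.groupby("customer_id")["amount"].sum()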

4. Data quality as declarative rules

Declarative principles are particularly well suited for data quality. Instead of scripts, quality requirements are defined as rules that should always be true.

checks for orders:
  - row_count > 0
  - missing_count(order_id) = 0
  - duplicate_count(order_id) = 0

These rules function as invariants for quality: if a rule is broken, the system detects it – and stops the pipeline before errors spread.

Abstractions – the core of declarative systems

Declarative systems are really about abstractions – how we describe and relate to complexity.

A good abstraction lets us “pretend” things are simpler than they actually are:

  • A file system lets us pretend a hard drive is folders and files.
  • SQL lets us pretend we are querying a logical dataset, not a physical storage system.

Abstractions are therefore not just a technique, but an interface between people and systems. The most rewarding work in data engineering is designing good abstractions: structures that hide complexity but preserve meaning. Declarative systems are precisely that – a layer of intention over a sea of details.

Model-driven data engineering

Behind the practical tools lies a deeper idea: model-driven data engineering (MDDE). Here, data relationships and transformations are described as semantic models, and the platform automatically generates code, documentation and dependencies.

The benefits are clear:

  • Consistency: one semantic source of truth.
  • Automation: pipelines and documentation are generated.
  • Transparency: the logic can be understood as relationships, not scripts.
  • Adaptability: changes in the model propagate automatically.

An example:

customer_sales = aggregate(orders, by=customer_id, sum=amount)

The platform handles SQL, dependencies and updates.
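A toy sketch of what that means (not the syntax of any real MDDE tool): the declaration is just data, and both the SQL and the lineage from customer_sales back to orders can be derived from it.

# Toy model-driven generation: the declarative spec is data, the SQL is derived from it.
spec = {"target": "customer_sales", "source": "orders", "by": "customer_id", "sum": "amount"}

def to_sql(s: dict) -> str:
    return (
        f"CREATE OR REPLACE TABLE {s['target']} AS\n"
        f"SELECT {s['by']}, SUM({s['sum']}) AS total_{s['sum']}\n"
        f"FROM {s['source']}\n"
        f"GROUP BY {s['by']}"
    )

print(to_sql(spec))  # the platform would also register customer_sales -> orders as lineage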


Why this matters

Declarative principles solve many classic DataOps challenges:

Challenge → Declarative solution

  • Manual control flow and fragile orchestration → The platform infers dependencies automatically.
  • Different definitions and duplicated transformations → A model-based core ensures a single source of truth.
  • Lack of traceability → Metadata and lineage are built in.
  • Difficult maintenance → Infrastructure and logic are defined as code.
  • Low adaptability → Changes are handled via model updates.

Papakonstantinou et al. (2023)2 describe this as a step towards “the rigor of relational algebra” – data engineering is finally approaching the same deterministic precision that relational databases originally had.

Summary

Declarative data platforms are still evolving, but the direction is clear. The major players – Databricks, Snowflake and Google Cloud – are moving towards intent-based processing, where the user defines the desired data state and the system realises it. Over time, this will change how data teams work. Roles will become more modelling-oriented, collaboration tighter, and platforms more self-governing.

We are moving from “data pipelines” to “data guarantees” – from systems that execute instructions to systems that enforce truths.

Declarative data platforms are not about less code, but about better abstractions – about describing intention instead of process.

References and footnotes


  1. Ford, N., Parsons, R., & Kua, P. (2017). Building Evolutionary Architectures: Support Constant Change. O’Reilly Media Inc.

  2. Papakonstantinou, Y., et al. (2023). Making Data Engineering Declarative. Conference on Innovative Data Systems Research (CIDR).

  3. At least before the introduction of “Assets” and “data-aware scheduling” in Airflow 3.0.


Magne Bakkeli

Magne has over 20 years of experience as an advisor, architect and project manager in data & analytics, and has a strong understanding of both business and technical challenges.