
Quality and reliability
08.02.2026 | 4 min read
Tags: #data products, #data quality, #SLO for data, #data observability
Three tiers, a handful of SLOs, and a response that makes people trust the product.
Quality is part of the product promise
When you say something is a data product, you are also saying that others can use it over time — without having to guess about freshness, changes or quality.
That is why this chapter is not about “perfect data”. It is about making the promise clear and operable:
- what does “good enough” mean for the named use?
- what is measured to detect deviations early?
- what happens when deviations occur?
“Good enough” means fit for purpose
Data quality can easily become a discussion without boundaries. In data products, the boundaries should be simple:
Quality = good enough for a named use, for a named customer group.
That makes it possible to be concrete. For many data products, three things matter most:
- Freshness: data arrives when it is needed
- Stability in semantics/keys/time logic: what you mean by “order”, “customer”, “revenue” does not change without control
- Quality signal: customers can see whether the product is “ok”, “degraded” or “stopped”
Tiers: not everything should have the same rigour
If everything has the same requirements, you typically end up with either too much governance — or too little credibility. Three tiers are usually enough to be precise without becoming heavy.
- Exploration: data is used to learn and test hypotheses. The point here is to be clear that the promise is limited: frequent changes are acceptable, and you primarily need visibility (status + basic metrics).
- Standard: data is used in reporting and ongoing decisions. Here you should be able to promise stable semantics for a defined use, and have a few tests/alarms you actually respond to.
- Business-critical: errors carry high cost or high risk. Here the promise must be sharper: clear SLOs, a defined response to deviations and controlled change (which we build on in the next chapter).
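
One way to make the tiers explicit rather than implicit is to capture them as product metadata. The sketch below does this in Python; the field names and per-tier defaults are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    EXPLORATION = "exploration"
    STANDARD = "standard"
    BUSINESS_CRITICAL = "business-critical"


@dataclass
class TierPromise:
    """What a tier implies for the product promise (illustrative, not a standard)."""
    stable_semantics: bool            # are semantics/keys/time logic change-controlled?
    slo_count: int                    # how many SLOs you commit to following up
    block_on_critical_failure: bool   # is the product stopped when a critical gate fails?


# Assumed defaults per tier; adjust to your own governance model.
TIER_PROMISES = {
    Tier.EXPLORATION: TierPromise(stable_semantics=False, slo_count=1, block_on_critical_failure=False),
    Tier.STANDARD: TierPromise(stable_semantics=True, slo_count=3, block_on_critical_failure=False),
    Tier.BUSINESS_CRITICAL: TierPromise(stable_semantics=True, slo_count=4, block_on_critical_failure=True),
}
```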
SLOs: a few promises you can actually measure and follow up
SLOs work best when you are strict on quantity. Start with 2-4 per data product.
Most data products need no more than these categories:
- Freshness (refresh)
- Availability (readability)
- Quality signal (a few indicators that match the use)
SLOs must be tied to measurable indicators. These are straightforward to get started with:
- minutes since last successful refresh (freshness)
- proportion of successful read attempts against the product surface (availability)
- proportion of null/invalid keys, duplicates on business key or invalid status values (quality signal)
The point is not to choose the “right” metric on the first attempt; the point is to choose something you actually follow up on.
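
As a minimal sketch, the three indicator types can be computed with plain Python functions. The field names (order_id, status), the set of valid status values and the assumption of a timezone-aware timestamp are all illustrative, not part of any standard.

```python
from datetime import datetime, timezone


def minutes_since_last_refresh(last_success: datetime) -> float:
    """Freshness: minutes since the last successful refresh (expects an aware timestamp)."""
    return (datetime.now(timezone.utc) - last_success).total_seconds() / 60


def read_success_rate(successful_reads: int, total_reads: int) -> float:
    """Availability: share of successful read attempts against the product surface."""
    return successful_reads / total_reads if total_reads else 1.0


def quality_indicators(records, key="order_id", valid_status=frozenset({"new", "paid", "shipped"})):
    """Quality signal: null keys, duplicates on the business key, invalid status values."""
    n = len(records) or 1
    keys = [r.get(key) for r in records]
    non_null = [k for k in keys if k is not None]
    return {
        "null_key_share": (len(keys) - len(non_null)) / n,
        "duplicate_key_share": (len(non_null) - len(set(non_null))) / n,
        "invalid_status_share": sum(r.get("status") not in valid_status for r in records) / n,
    }
```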
Response to deviations: stop, degrade or signal
Quality only becomes real when you have a clear response. In practice, three response types suffice:
- Stop: Used when the risk of misuse is high, or when errors have major consequences. The product is not published as “ok” if critical gates fail.
- Degrade: Used when the product can still be used, but with a clear limitation (for example “stale”, or a known portion of the content has deviations). Status must be visible on the product page.
- Signal: Used when risk is low, or during exploration. Deviations are made visible and notified, but without automatically blocking use. The point is that customers should not have to guess about the state.
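
As an illustration, the sketch below maps check results and tier to the status shown on the product page. The policy it encodes (exploration only signals, a failing critical gate stops a business-critical product and degrades a standard one) is one reasonable reading of the three response types, not a fixed rule.

```python
def product_status(tier: str, failed_critical: list, failed_other: list):
    """Return the published status (ok / degraded / stopped) plus the deviations to surface."""
    if tier == "exploration":
        # Signal: deviations are made visible, but use is never blocked automatically.
        return "ok", failed_critical + failed_other
    if failed_critical:
        # Stop business-critical products on critical failures; degrade standard ones.
        return ("stopped" if tier == "business-critical" else "degraded"), failed_critical + failed_other
    if failed_other:
        # Degrade: still usable, but with a visible limitation (for example "stale").
        return "degraded", failed_other
    return "ok", []


# Example: a standard product with a missed freshness SLO is published as degraded.
product_status("standard", [], ["freshness_slo"])  # -> ("degraded", ["freshness_slo"])
```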
Tests: a few that match the use beat many generic ones
Start with 3-7 validations that match the actual use of the product surface. Typically:
- mandatory fields
- unique keys (where uniqueness is expected)
- valid values (status/enum)
- referential integrity (where relevant)
- one or two business rules that frequently cause errors or misunderstandings
If you do not have a defined response to a failing test, that is a signal that the test is either poorly chosen — or that the threshold/responsibility is unclear.
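
To make that concrete, here is a minimal sketch of three such validations over a small in-memory sample. The field names (order_id, customer_id, status) and the allowed status values are assumptions; in practice the checks would run against the product surface itself.

```python
VALID_STATUS = {"new", "paid", "shipped", "cancelled"}


def mandatory_fields(records, fields=("order_id", "customer_id")):
    """Mandatory fields: no nulls in columns the named use depends on."""
    return all(r.get(f) is not None for r in records for f in fields)


def unique_business_key(records, key="order_id"):
    """Unique keys, where uniqueness is expected."""
    keys = [r.get(key) for r in records]
    return len(keys) == len(set(keys))


def valid_status_values(records):
    """Valid values for status/enum fields."""
    return all(r.get("status") in VALID_STATUS for r in records)


records = [
    {"order_id": "A-1", "customer_id": "C-9", "status": "paid"},
    {"order_id": "A-1", "customer_id": None, "status": "unknown"},  # duplicate key, missing field, bad status
]

results = {
    "mandatory_fields": mandatory_fields(records),
    "unique_business_key": unique_business_key(records),
    "valid_status_values": valid_status_values(records),
}
print(results)  # all three fail on this sample
```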

Minimum for operability: a short runbook
For data products shared across teams, a short runbook is useful. It can be simple:
- point of contact + responsible team
- what do “stopped” and “degraded” mean for this product?
- most common failure sources (upstream/transform/access)
- where status and changelog are updated
It need not be more than that. The purpose is that both the owner team and customers get a predictable handling process.
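
One way to keep such a runbook close to the product is to store it as structured metadata next to the status page. The sketch below is purely illustrative; every name and URL in it is a hypothetical placeholder.

```python
# A short runbook kept next to the product metadata; all names and URLs below
# are hypothetical placeholders, not references to a real system.
RUNBOOK = {
    "contact": "#orders-data (chat) / orders-data-team@example.internal",
    "responsible_team": "Orders Data Team",
    "status_meanings": {
        "stopped": "A critical gate failed; do not use for reporting until cleared.",
        "degraded": "Usable, but with a known limitation (for example stale or a flagged subset).",
    },
    "common_failure_sources": ["upstream delivery delayed", "transform job failed", "access change"],
    "where_status_and_changelog_live": "https://example.internal/data-products/orders",
}
```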

