
Rulebook and portfolio
08.02.2026 | 7 min read | Tags: #data products, #data mesh, #data governance, #data product owner
Keep the word 'data product' useful: product vs component, lifecycle, domains and ownership.
Rulebook: What we mean when we say “data product”
The term loses value when everything is called a product. The rulebook below is designed to keep the term “data product” practically useful: few products with high signal value, many components without false promises.
Ten rules that separate product from component
1 No governance –> no product status. If you do not have the capacity to uphold a promise over time, “product” is just an expensive label.
2 The owner must be named — and have decision-making authority. “Owner” means: who can decide on definitions, prioritisation and change — and who responds when something breaks?
3 The data product must have named customers. If the target audience is “everyone”, it is a sign that you do not know who uses it.
4 The data product must have a clear use case. Not “for analytics”, but “to make X possible for Y”.
5 The promise must be clear enough that others dare to build on it. Stating access, refresh frequency, quality signals and change practices is usually sufficient.
6 Product surface first, implementation second. What you promise is the interface towards usage. The engine room should be refactorable without everything becoming an incident.
7 Critical dependencies are part of the responsibility. If the promise depends on pipelines, reference tables or quality jobs, the product team must own the consequences, even if the implementation may reside in several places.
8 Quality means “good enough for use”. A few tests that target the actual usage beat many generic tests you never act on.
9 Usage must be visible. If you do not know what is being used, you cannot prioritise, govern or deprecate in an orderly way either.
10 Lifecycle must be defined. “Pilot”, “active”, “deprecating” and “retired” are a simple set of statuses that prevents everything from lingering forever.
“Team” and “owner” do not mean a Slack channel
Product status only makes sense when you can point to four things — briefly, concretely and without hero stories:
- Mandate: who makes decisions about definitions and change?
- Capacity: who has the capacity for incidents and requests?
- Contact point: where do customers get answers without being dependent on a specific person?
- Run cost: who owns the prioritisation of operations/compute/support? (It is sufficient to own the prioritisation. You do not need to charge back every query.)
If you do not have this, it is better to call the deliverable a component for now.
Lifecycle: make status visible
Product vs component is about what something is. Lifecycle is about how safe it is to build on right now, and what happens when it changes. Without a visible lifecycle status, the catalogue quickly becomes a list of “things that exist”, and sharing becomes person-dependent: people ask in private messages whether this is supported, whether the definitions are stable, or whether it is on its way out.
A simple status on the product page is enough to make the catalogue more truthful:
- Draft – idea/work in progress, no promises
- Pilot – a few customers, limited promise
- Active – the product promise applies
- Deprecating – replacement is defined, deadline and migration are described
- Retired / downgraded – no longer a product (component or removed)
The point is not to introduce a governance regime. The point is to make expectations explicit before someone builds two quarters’ worth of logic on something you were actually planning to discard. Fewer surprises, clearer prioritisation, tidier clean-up.
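The status list above can be sketched as a small state model. This is a minimal illustration, not a prescribed implementation: the `Lifecycle` enum mirrors the five statuses, while the transition table `ALLOWED` and the helper names are assumptions made for the example (the rulebook names the statuses, not the permitted moves between them).

```python
from enum import Enum


class Lifecycle(Enum):
    DRAFT = "draft"              # idea/work in progress, no promises
    PILOT = "pilot"              # a few customers, limited promise
    ACTIVE = "active"            # the product promise applies
    DEPRECATING = "deprecating"  # replacement, deadline and migration described
    RETIRED = "retired"          # no longer a product


# Assumed transition rules (hypothetical): an active product must pass
# through "deprecating" before it can be retired.
ALLOWED = {
    Lifecycle.DRAFT: {Lifecycle.PILOT, Lifecycle.RETIRED},
    Lifecycle.PILOT: {Lifecycle.ACTIVE, Lifecycle.RETIRED},
    Lifecycle.ACTIVE: {Lifecycle.DEPRECATING},
    Lifecycle.DEPRECATING: {Lifecycle.RETIRED},
    Lifecycle.RETIRED: set(),
}


def can_move(current: Lifecycle, target: Lifecycle) -> bool:
    """Return True if the assumed rules permit moving from current to target."""
    return target in ALLOWED[current]


def safe_to_build_on(status: Lifecycle) -> bool:
    """Only ACTIVE carries the full product promise."""
    return status is Lifecycle.ACTIVE
```

Forcing the path active, deprecating, retired is one way to encode the idea that nobody should discover a removal after the fact; a team could just as reasonably choose looser rules.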
Three tiers that make the portfolio more pragmatic
In practice, you need a language for three types of deliverables:
- Data product: managed product surface with customers, a promise and a lifecycle
- Component: building block (table, model, pipeline) without a product promise
- Experiment: temporary deliverable for learning, can be upgraded later
The litmus test: If you remove the deliverable tomorrow, do you know who would miss it? Do they know who to call?
Portfolio management: product, component or experiment?
Once you have a language, you can manage the portfolio. Two steps deliver the most impact for the least effort:
- Give product status to the few things that are actually important and shared.
- Give component status to the rest — but make ownership and purpose visible.
Data product or component? A simple decision table
| Situation | Data product when… | Component when… |
|---|---|---|
| Reuse | multiple teams build on it | one team uses it locally |
| Risk of failure | failure is costly (money/compliance/governance) | failure is mostly annoying |
| Change | change must be handled in a controlled manner | ad hoc change is acceptable |
| Customers/value | you can point to customers and purpose | “maybe someone needs this” |
| Governance capacity | you can uphold a promise | you do not have the capacity (and that is ok) |
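The decision table can be turned into a rough triage function. The sketch below is illustrative only: the `Deliverable` fields map one-to-one to the table rows, but the field names and the "two out of three" threshold are assumptions for the example, not part of the rulebook.

```python
from dataclasses import dataclass


@dataclass
class Deliverable:
    name: str
    consuming_teams: int            # Reuse
    failure_is_costly: bool         # Risk of failure
    needs_controlled_change: bool   # Change
    named_customers: list[str]      # Customers/value
    can_uphold_promise: bool        # Governance capacity


def classify(d: Deliverable) -> str:
    # Rule 1: without governance capacity there is no product status.
    if not d.can_uphold_promise:
        return "component"
    # Rule 3: a product must have named customers.
    if not d.named_customers:
        return "component"
    signals = [
        d.consuming_teams > 1,
        d.failure_is_costly,
        d.needs_controlled_change,
    ]
    # Assumed threshold: at least two of the remaining rows point to "product".
    return "data product" if sum(signals) >= 2 else "component"


core = Deliverable("customer_core", 4, True, True, ["sales", "finance"], True)
print(classify(core))  # data product
```

Treating governance capacity and named customers as hard gates, and the other rows as weighted signals, is a design choice; the table itself leaves the weighting to judgement.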
Components: minimum practice
Even without a product promise, a component should document:
- which domain it belongs to
- who owns it (team) and contact point
- one sentence about purpose
- classification (especially for sensitivity/PII)
- simple lifecycle status (active / under change / being phased out)
- lineage/dependencies (roughly)
When there are named customers who would complain if this changed without notice, you are in data product territory.
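The minimum practice lends itself to an automated check on catalogue entries. The following is a sketch under stated assumptions: the dictionary keys (`domain`, `owner_team`, and so on) are hypothetical names chosen for the example and do not refer to any particular catalogue tool.

```python
# Hypothetical field names, one per item in the minimum-practice list.
REQUIRED_FIELDS = (
    "domain",
    "owner_team",
    "contact_point",
    "purpose",
    "classification",
    "lifecycle",
    "dependencies",
)

# The simple component statuses named in the list above.
COMPONENT_STATUSES = {"active", "under change", "being phased out"}


def missing_component_metadata(entry: dict) -> list[str]:
    """Return the problems found in a catalogue entry (empty list means OK)."""
    # An empty dependency list is a valid answer, so only None and "" count as missing.
    problems = [f for f in REQUIRED_FIELDS if entry.get(f) in (None, "")]
    status = entry.get("lifecycle")
    if status and status not in COMPONENT_STATUSES:
        problems.append(f"lifecycle: unknown status {status!r}")
    return problems
```

A check like this could run when an entry is registered, so that gaps in ownership or classification surface before anyone starts depending on the component.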
Ownership and domains: who decides, and who governs?
This is where many “data product” initiatives fall short: people agree on a definition, but not on who will stand behind it when everyday reality kicks in.
What do we mean by “domain” here?
A domain is an area where someone has the mandate to define concepts and rules, because they own the process and the consequences.
This does not mean the domain map must be perfect before you start. It means you need an address for disagreement.
Data ownership: the part you cannot avoid
Two things that are often conflated:
- Business ownership of semantics and rules: who decides the definition when there is disagreement?
- Product/operational ownership of the product surface: who manages the promise to the customers (access, quality signals, change and support)?
This can be the same team or split, but the responsibility must be clear. Otherwise you get a lot of “we thought you owned it”.
Platform team vs product team
- The platform team delivers standards and “golden paths” (build, test, deploy, catalogue, access, observability).
- The product team owns semantics, product surface, prioritisation and change.
- The domain owner/business shows up when definitions need to be settled.
Scale: how many data products can you sustain?
As many as you can run with quality: each one needs customers, a promise and a team that responds.
The number of data products should not be driven by the number of tables, but by the number of product surfaces you have the capacity to keep stable over time.
- Too few –> monolith: one large deliverable nobody dares to change
- Too many –> low signal value: hard to find “the recommended one”
The latter is more common than the former. The result typically looks like this: 1,000 entries in the catalogue, 13 real products — and nobody can find them, because the other 987 components claim to be the same thing. The catalogue gives zero confidence, and people ask in private messages instead.
Choose 1-2 usage signals and start simply: consumption in BI/semantic layer, query/access logs on product surfaces, number of named consumer environments in the catalogue/README, or upstream/downstream in pipelines.
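As one example of a usage signal, query/access logs can be reduced to "how many teams other than the owner use this?". The sketch below assumes a hypothetical log format of `(deliverable, consuming_team)` pairs and a hypothetical `owners` mapping; real access logs will need their own parsing.

```python
def distinct_consumers(access_log, owners):
    """Count distinct non-owner teams per deliverable.

    access_log: iterable of (deliverable, consuming_team) pairs (assumed format).
    owners: dict mapping deliverable -> owning team (assumed format).
    """
    # Start every known deliverable at zero so unused ones stay visible.
    consumers = {deliverable: set() for deliverable in owners}
    for deliverable, team in access_log:
        if team != owners.get(deliverable):
            consumers.setdefault(deliverable, set()).add(team)
    return {deliverable: len(teams) for deliverable, teams in consumers.items()}
```

Deliverables that come back with zero external consumers are natural candidates for component or experiment status; those with several are where product status and a visible promise pay off.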
Typology: four common data product types
A typology makes it easier to choose the right level of expectations, and to avoid everything ending up in the same “generic dataset” box.
| Type | What it is | Typical customers | Typical product surface | Typical promise |
|---|---|---|---|---|
| A Master and reference data | Identity and references that must be consistent (customer, product, org, calendar) | Many domains | Table/view + history, optionally API | Stable identity + controlled change |
| B Domain foundation (events/facts) | Orders, payments, measurements, returns – with time logic | Multiple teams | Table/view, event feed, API | Semantics + keys + time logic |
| C Use-case-oriented data products | Built for a process/end result (impact, planning, risk) | Product/process teams | Dataset, metrics, feature store | Fit-for-use for one consumption surface |
| D External data products | Shared with partners/customers | External | API, export, event feed | Contract, support, strict change management |
In the next chapter we use a Type A product (Customer 360 – Core) as the running example for the business canvas and MVDP.
