Rulebook and portfolio

08.02.2026 | 8 min Read
Tags: #data products, #data mesh, #data governance

Keep the word 'data product' useful: product vs component, lifecycle, domains and ownership.

Rulebook: What we mean when we say “data product”

The term loses value when everything is called a product. You end up with a catalogue that looks impressive but does not help anyone make safe decisions.

The classic spiral looks like this:

the catalogue grows large, but little is safe to build on
people create copies “just in case”
change becomes disruptive, because nobody knows who actually owns what

The rulebook below is designed to keep the word data product practically useful: few products with high signal value, many components without false promises.

Ten rules that separate product from component

1 No governance → no product status. If you do not have the capacity to uphold a promise over time, “product” is just an expensive label.

2 The owner must be named — and have decision-making authority. “Owner” means: who can decide on definitions, prioritisation and change — and who responds when something breaks?

3 The data product must have named customers. If the target audience is “everyone”, it is often a sign that you do not know who actually uses it.

4 The data product must have a clear use case. Not “for analytics”, but “to make X possible for Y”.

5 The promise must be clear enough that others dare to build on it. Access, refresh, quality signals and change practices are often sufficient.

6 Product surface first, implementation second. What you promise is the interface towards usage. The engine room should be refactorable without everything becoming an incident.

7 Critical dependencies are part of the responsibility. If the promise depends on pipelines, reference tables or quality jobs, the product team must own the consequences — even if the implementation may reside in several places.

8 Quality means “good enough for use”. A few tests that target the actual usage beat many generic tests you never act on.

9 Usage must be visible. If you do not know what is being used, you cannot prioritise, govern or deprecate in an orderly way either.

10 Lifecycle must be defined. A short set of statuses such as “pilot”, “active”, “deprecating” and “retired” is a simple way to prevent everything from lingering forever.

Here is a small illustration you can print out and frame. And perhaps hang in the hallway at home — or at the office…

“Team” and “owner” do not mean a Slack channel

Product status only makes sense when you can point to four things — briefly, concretely and without hero stories:

Mandate: who makes decisions about definitions and change?
Capacity: who actually has time for incidents and requests?
Contact point: where do customers get answers without being dependent on a specific person?
Run cost: who owns the prioritisation of operations/compute/support? (It is sufficient to own the prioritisation. You do not need to charge back every query.)

If you do not have this, it is better to call the deliverable a component for now.

Lifecycle: make status visible in the catalogue

A simple status on the product page is often enough to make the catalogue more truthful:

Draft – idea/work in progress, no promises
Pilot – a few customers, limited promise
Active – the product promise applies
Deprecating – replacement is defined, deadline and migration are described
Retired / downgraded – no longer a product (component or removed)

The point is not to be “process-mature”. The point is to make expectations explicit before someone builds two quarters’ worth of logic on something you were actually planning to discard.
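
The statuses above can be made machine-readable, so that the catalogue rejects a status change that skips a step. The following is a minimal sketch, not a prescribed implementation. The transition map and the names `Status`, `ALLOWED`, `can_move` and `deprecation_gaps` are assumptions for illustration; the article lists the statuses but does not define which moves are allowed.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PILOT = "pilot"
    ACTIVE = "active"
    DEPRECATING = "deprecating"
    RETIRED = "retired"


# Assumed transitions (hypothetical): adjust to your own governance rules.
ALLOWED = {
    Status.DRAFT: {Status.PILOT, Status.RETIRED},
    Status.PILOT: {Status.ACTIVE, Status.RETIRED},
    Status.ACTIVE: {Status.DEPRECATING},
    Status.DEPRECATING: {Status.RETIRED},
    Status.RETIRED: set(),
}


def can_move(current: Status, target: Status) -> bool:
    """Return True if the assumed transition map permits the move."""
    return target in ALLOWED[current]


def deprecation_gaps(replacement, deadline, migration_notes):
    """List which of the three 'Deprecating' requirements are still empty."""
    required = {
        "replacement": replacement,
        "deadline": deadline,
        "migration": migration_notes,
    }
    return [name for name, value in required.items() if not value]
```

In this sketch an active product cannot jump straight to retired: it has to pass through Deprecating, which is where the replacement, deadline and migration description are checked.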

Three tiers that make the portfolio more pragmatic

In practice, you need a language for three types of deliverables:

Data product: managed product surface with customers, a promise and a lifecycle
Component: building block (table, model, pipeline) without a product promise
Experiment: temporary deliverable for learning, can be upgraded later

The litmus test is simple: If you remove the deliverable tomorrow — do you know who would miss it, and do they know who to call?

Product portfolio in practice: what is a product, what is “inside”, and what does it cost you?

Once you have a language (product/component/experiment), you can also manage the portfolio. Here are two practical steps that often deliver the most impact per calorie:

Give product status to the few things that are actually important and shared.
Give component status to the rest — but make ownership and purpose visible.

Data product or component? A simple decision table

Situation	Data product when…	Component when…
Reuse	multiple teams build on it	one team uses it locally
Risk of failure	failure is costly (money/compliance/governance)	failure is mostly annoying
Change	change must be handled in a controlled manner	change can be handled ad hoc
Customers/value	you can point to customers and purpose	“maybe someone needs this”
Governance capacity	you can uphold a promise	you do not have the capacity (and that is ok)
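
The table can also be expressed as a small function, which is useful if you want to apply the same reasoning across a whole catalogue. This is a sketch under stated assumptions: the table does not say how the rows combine, so the weighting here (governance capacity and named customers as hard gates, then at least one of the remaining rows) is an assumption, and `Deliverable` and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Deliverable:
    name: str
    used_by_multiple_teams: bool   # row: Reuse
    failure_is_costly: bool        # row: Risk of failure
    needs_controlled_change: bool  # row: Change
    has_named_customers: bool      # row: Customers/value
    can_uphold_promise: bool       # row: Governance capacity


def classify(d: Deliverable) -> str:
    # Assumption: without governance capacity and named customers,
    # nothing gets product status (rules 1 and 3 in the rulebook).
    if not (d.can_uphold_promise and d.has_named_customers):
        return "component"
    # Assumption: at least one of the remaining rows must point to "product".
    if d.used_by_multiple_teams or d.failure_is_costly or d.needs_controlled_change:
        return "data product"
    return "component"
```

For example, a shared customer table with named consumers and a team that can support it comes out as a data product, while the same table without governance capacity stays a component.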

Data that is not a product: components, domain responsibility and lightweight governance

Not everything should be managed as a data product. But everything that is used (and everything that can be misunderstood) needs a minimum of order — otherwise “self-service” becomes a social experiment.

Components are building blocks: tables, models, intermediate layers, pipelines. They can be critical, but they do not have a promise you market as a “stable product surface”.

Minimum practices for components:

which domain it belongs to
who owns it (team) and contact point
one sentence about purpose
classification (especially for sensitivity/PII)
simple lifecycle status (active / under change / being phased out)
lineage/dependencies (roughly)

When there are named customers who would complain if this changed without notice, you are in data product territory.
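
The minimum practices for components can be checked automatically if each component carries a small metadata card. The sketch below is one possible shape, not a standard: the field names and the `ComponentCard` and `missing_fields` names are assumptions for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ComponentCard:
    name: str
    domain: str = ""
    owner_team: str = ""
    contact_point: str = ""
    purpose: str = ""          # one sentence
    classification: str = ""   # e.g. sensitivity/PII level
    lifecycle: str = ""        # active / under change / being phased out
    # None means "not documented"; an empty list means "no known dependencies".
    depends_on: Optional[List[str]] = None


def missing_fields(card: ComponentCard) -> List[str]:
    """Return the names of fields that are still undocumented."""
    missing = []
    for f in fields(card):
        value = getattr(card, f.name)
        if value is None or value == "":
            missing.append(f.name)
    return missing
```

A check like this can run in CI or as a catalogue report, so that gaps show up as a list to fix rather than as a surprise during an incident.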

Ownership and domains: who decides, and who governs?

This is where many “data product” initiatives fall short: people agree on a definition, but not on who will stand behind it when everyday reality kicks in.

What do we mean by “domain” here?

A domain is an area where someone has the mandate to define concepts and rules, because they own the process and the consequences.

This does not mean the domain map must be perfect before you start. It means you need an address for disagreement.

Data ownership: the part you cannot avoid

Two things that are often conflated:

Business ownership of semantics and rules: who decides the definition when there is disagreement?
Product/operational ownership of the product surface: who manages the promise to the customers (access, quality signals, change and support)?

This can be the same team or split, but the responsibility must be clear. Otherwise you get a lot of “we thought you owned it”.

Platform team vs product team

The platform team delivers standards and “golden paths” (build, test, deploy, catalogue, access, observability).
The product team owns semantics, product surface, prioritisation and change.
The domain owner/business shows up when definitions need to be settled.

Scale: how many data products can you sustain?

As many as you can actually manage with quality — with customers, a promise and a team that responds.

The number of data products should not be driven by the number of tables, but by the number of product surfaces you have the capacity to keep stable over time.

Too few → monolith: one large deliverable nobody dares to change
Too many → low signal value: hard to find “the recommended one”

Three categories for governance

Data product: managed product surface with a promise
Component: useful building block without a product promise
Candidate: could become a data product with increased usage or higher risk

Few data components become data products

Usage must be visible

Choose 1-2 usage signals and start simply:

consumption in BI/semantic layer
query/access logs on product surfaces
number of named consumer environments in the catalogue/README
upstream/downstream in pipelines
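
As an illustration of the query/access log signal, the sketch below counts distinct consumers per product surface within a recent window and lists surfaces with no usage. The log format (tuples of surface, consumer and date), the 30-day default and the function names are assumptions; real access logs will need their own parsing.

```python
from collections import defaultdict
from datetime import date, timedelta


def usage_summary(access_log, today, window_days=30):
    """Count distinct consumers per surface within the window.

    access_log: iterable of (surface, consumer, day) tuples (hypothetical format).
    """
    cutoff = today - timedelta(days=window_days)
    consumers = defaultdict(set)
    for surface, consumer, day in access_log:
        if day >= cutoff:
            consumers[surface].add(consumer)
    return {surface: len(names) for surface, names in consumers.items()}


def unused(surfaces, summary):
    """Return catalogued surfaces with no consumers in the window."""
    return sorted(s for s in surfaces if summary.get(s, 0) == 0)
```

Distinct consumers is deliberately a coarse signal. It is enough to start a conversation about prioritisation or deprecation, which is the purpose of starting simply.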

Lifecycle: make expectations visible

Product vs component is about what something is. Lifecycle is about how safe it is to build on right now — and what happens when it changes. Without a visible lifecycle, the catalogue quickly becomes a list of “things that exist”, and sharing becomes person-dependent: people ask in private messages whether this is supported, whether the definitions are stable, or whether it is on its way out.

A simple lifecycle makes it possible to be honest without being heavy:

Candidate means “no promises”
Pilot means “a few customers and short feedback loops”
Active means “the product promise applies”
Deprecating means “we have a replacement, a deadline and a transition”

The point is not as many statuses as possible — the point is fewer surprises, clearer prioritisation and tidier clean-up.

Example of a data product lifecycle with statuses

Typology: four common data product types

A typology makes it easier to choose the right level of expectations (and to prevent everything from ending up in the same “generic dataset” box).

Type	What it is	Typical customers	Typical product surface	Typical promise
A Master and reference data	Identity and references that must be consistent (customer, product, org, calendar)	Many domains	Table/view + history, optionally API	Stable identity + controlled change
B Domain foundation (events/facts)	Orders, payments, measurements, returns – with time logic	Multiple teams	Table/view, event feed, API	Semantics + keys + time logic
C Use-case-oriented data products	Built for a process/end result (impact, planning, risk)	Product/process teams	Dataset, metrics, feature store	Fit-for-use for one consumption surface
D External data products	Shared with partners/customers	External	API, export, event feed	Contract, support, strict change management


Magne Bakkeli

Magne has over 20 years of experience as an advisor, architect and project manager in data & analytics, and has a strong understanding of both business and technical challenges.