Magne Bakkeli is co-founder and senior advisor at Glitni. He has over 25 years of experience in data platforms, data governance and data architecture, and led the Data & Analytics team at PwC Consulting for 12 years. He has built and modernised data platforms across energy, FMCG, finance and media.

Data Mesh | A Complete Guide

by Magne Bakkeli

03.05.2023 | 14 min Read
Category: Data Strategy | Tag: #data mesh

In this article, we will explore the Data Mesh concept and the four principles that underpin it. We will also take a closer look at how Data Mesh builds on ideas from software and team organization, as well as consider the benefits and criticism of the concept.

Data mesh is a new approach to data architecture and organization that aims to help organizations leverage data in a more effective and scalable way. The concept was introduced by Zhamak Dehghani in the article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” in 2019, and has since received considerable attention in the data industry. The attention has only grown since the book “Data Mesh: Delivering Data-Driven Value at Scale” was published in 2022.

Data Mesh is not a technology or a product, and it is still evolving. There is no reference architecture at the moment, and implementing it is complex and time-consuming because it requires an interplay between multiple components: ownership, data, organization and technology. Data Mesh can be seen as a target state that we strive to approach over time through deliberate actions that drive maturity.

Was that a bit vague? Read on, and hopefully it will become clearer.

What problems does Data Mesh seek to solve?

Over the past decade, especially with the rise of cloud services, many BI and analytics initiatives have struggled to achieve the benefits they were intended to deliver. Data initiatives may have taken longer or been more challenging than initially anticipated, and solutions may not have achieved the necessary quality (especially data quality). Existing problems around data trust and governance may still be causing issues, and in some cases the transition to the cloud has made them worse.

As data volumes grow, it becomes harder to store, process and analyze data effectively. Many organizations still use a centralized on-premise data warehouse that is unable to handle the increasing volume of data, leading to delays and reduced ability to perform analytics and reporting. The central development team tries its best to develop new reports, deliver data to data science (which has built its own solutions on the side), and patch everything old. And dissatisfaction grows.

At the same time, becoming data-driven is one of the top strategic goals for many organizations. They want to make better decisions based on facts and predictions, offer better and more tailored services and products, increase the quality of products and services, give employees a more enjoyable and efficient workday, and not least reduce operating costs.

Data Mesh aims to address several problems that are common in data management and analytics when data volumes and business complexity increase.

Possible causes of the problems

One of the main reasons for the challenges that data initiatives face is that they have largely been technical exercises. The focus on product, platform and feature selection has overshadowed areas such as people, processes and culture. This does not mean that these solutions necessarily have to fail, or that the technical problems are unimportant. But this is why Data Mesh as a concept now resonates so well.

There is often a knowledge gap between IT and business users; IT typically has very strong technical skills but often has less insight into which data business users use, how they use it and when – at least often less insight than they think they have. At the same time, business users often do not know the underlying data, sources and quality as well as they think they do.

Data Mesh to the rescue!

The data mesh approach recognizes that data is a central resource for the organization and that it is necessary to involve all parts of the business to leverage data resources effectively. This means that people, processes and culture must be considered just as important as technology.

The image is taken from [Martinfowler.com](https://martinfowler.com/articles/data-monolith-to-mesh.html) and illustrates Data Mesh from a bird's-eye view.

The four principles of Data Mesh

To understand data mesh, it is important to know the four fundamental principles that underpin the concept.

Principle 1: Domain-oriented and business-driven ownership

Instead of centralizing ownership of all data, the Data Mesh philosophy encourages sharing data ownership among the various domains within the organization. This can help reduce bottlenecks and improve collaboration between different teams and departments.

The image is taken from [Martinfowler.com](https://martinfowler.com/articles/data-monolith-to-mesh.html) and illustrates examples of data domains.

To achieve success in developing data and analytics solutions, collaboration between the business side and the development community is essential. The traditional responsibility model concentrates developer competence and system ownership in one place and business needs in another, which leads to responsibilities falling between two stools, especially when it comes to data ownership.

Dehghani emphasizes that ownership and development of data and analytics services should sit on the business side, so that they do not become solely an IT responsibility. The direct solution is to move responsibility for data and analytics solutions to the business side, which is called domain-oriented data ownership and architecture.

This means that the responsible leader for each domain takes ownership of business processes, needs, architecture, systems and data, and facilitates data being used by other domains in the organization. The business owner of a domain, for example the Head of Product Development in a manufacturing company, is responsible for their own goals and the framework conditions for delivering on those goals. A data team within product development would in this example be responsible for managing and maintaining their own product data (technical data, specifications, etc.), but also for ensuring that it can be shared with others.

The image is taken from [Martinfowler.com](https://martinfowler.com/articles/data-monolith-to-mesh.html) and illustrates cross-functional teams.

Principle 2: Data as a product

The data mesh concept regards data as a product, which means that data should be easily accessible, understandable and usable for those who need it. This requires a focus on user-friendliness and good data APIs.

The image is taken from [Martinfowler.com](https://martinfowler.com/articles/data-monolith-to-mesh.html) and illustrates data as a product.

The threshold for leveraging data in many organizations today is quite high, which can be attributed to several factors such as data being difficult to find, understand and access. Additionally, data quality cannot always be trusted. To lower this threshold, data must be offered as a product that is easy to find and simple to use. But what does it actually mean to offer data as a product?

First and foremost, it means taking users and their needs as the starting point. This is not a new way of thinking in the data world, as many of us spend time understanding and addressing user needs today. But the Data Mesh concept takes this a step further: A data product must contain everything necessary for it to be adopted.

This means that it is not sufficient to simply set up some data structures on a data platform and ask people to help themselves. You must also provide necessary descriptions, explanations and tools, such as code, APIs, analysis examples, etc., that can help lower the threshold for use. By offering these elements, you can make it easier for users to understand what the data contains and how it can be used.
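As a rough illustration of "everything necessary for it to be adopted", the sketch below models a data product as a small descriptor and reports which supporting elements are still missing. The class name, field names and the example endpoint are assumptions made for this example; Data Mesh does not prescribe any particular format.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product; all field names are illustrative."""
    name: str
    owner_domain: str
    description: str = ""
    schema: dict = field(default_factory=dict)          # column name -> business meaning
    access_endpoint: str = ""                           # e.g. an API path or table location
    usage_examples: list = field(default_factory=list)  # queries or code snippets

    def missing_for_adoption(self) -> list:
        """Return the names of the supporting elements that are still empty."""
        checks = {
            "description": self.description,
            "schema": self.schema,
            "access_endpoint": self.access_endpoint,
            "usage_examples": self.usage_examples,
        }
        return [item for item, value in checks.items() if not value]


# Data structures alone: nothing helps the user get started.
bare = DataProduct(name="product_specs", owner_domain="product_development")

# The same data offered as a product, with descriptions and examples.
documented = DataProduct(
    name="product_specs",
    owner_domain="product_development",
    description="Technical data and specifications per product.",
    schema={"product_id": "Unique product identifier"},
    access_endpoint="/api/product_specs",  # hypothetical path
    usage_examples=["SELECT product_id FROM product_specs LIMIT 10"],
)

print(bare.missing_for_adoption())
print(documented.missing_for_adoption())
```

The first call lists every supporting element as missing, while the second returns an empty list. A check like this could run before a data product is published, so that undocumented data structures are not presented as finished products.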

The data must also be self-explanatory and inspire trust. If users do not trust the quality, or have to read through outdated or incomplete documentation, they will not use the data. Therefore, it is important to ensure that the data is cleansed of what users perceive as noise and that it is designed in a way that clearly shows how it relates to business processes. In this way, you can increase the likelihood that users will use the data and benefit from what it has to offer.

“Data as a product” and “a data-driven end product” are two related but different concepts.

Data as a product refers to an approach where data is treated as its own product with a focus on user needs. The data is structured, documented and arranged in a way that makes it easy for users to find, understand and use. An example of this is the subject model in a data warehouse, where the data is organized and modeled based on business processes and concepts. Data as a product also includes associated descriptions, explanations and tools such as APIs, code and analysis examples, which help users utilize the data effectively.

A data-driven end product is a concrete product or result that is based on data and analytics. A data-driven end product can be a dataset, a report, a visualization, or an application that helps the organization make decisions, achieve goals or solve problems. Data-driven end products are typically the result of processing and analyzing data that is arranged as a product (i.e. data as a product).

The main difference between the two concepts lies in the focus and purpose. While “data as a product” focuses on making data accessible and easy to use for the organization’s users, “a data-driven end product” is a concrete result or tool created using that data to achieve specific goals or solve problems within the organization.

Principle 3: Federated governance model

Data products within a domain must be managed and developed by a dedicated product team that has the best knowledge of the data. Such a team is composed of developers and people with good knowledge of the data in question, meaning people who are involved in the business processes that create it.

The image is taken from [Martinfowler.com](https://martinfowler.com/articles/data-mesh-principles.html) and illustrates the governance model associated with Data Mesh.

At the same time, you cannot leave everything to the product teams. For such a system to work, it is necessary that a central function takes responsibility for defining and serving as a sounding board on standards that it is not practical for everyone to invent on their own.

Examples of this can be how metadata about data products is defined, naming conventions for data product interfaces, or classification of data products in accordance with information security and privacy.
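To make the standards mentioned above more concrete, here is a minimal sketch of how a central team might express them as an automated check that every domain team can run. The required metadata keys, the `domain.product` naming pattern and the classification levels are all assumptions chosen for illustration, not rules taken from Data Mesh itself.

```python
import re

# Hypothetical central standards; the specific rules are assumptions for illustration.
REQUIRED_METADATA = ("name", "owner_domain", "description", "classification")
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}
# Assumed naming convention: lowercase snake_case "domain.product"
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+)*\.[a-z]+(_[a-z]+)*$")


def check_standards(metadata: dict) -> list:
    """Return a list of violations of the shared standards (empty if compliant)."""
    violations = [
        f"missing metadata: {key}"
        for key in REQUIRED_METADATA
        if not metadata.get(key)
    ]
    name = metadata.get("name", "")
    if name and not NAME_PATTERN.match(name):
        violations.append(f"name '{name}' does not follow domain.product convention")
    classification = metadata.get("classification")
    if classification and classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification: {classification}")
    return violations


compliant = {
    "name": "product_development.product_specs",
    "owner_domain": "product_development",
    "description": "Technical specifications per product.",
    "classification": "internal",
}
non_compliant = {
    "name": "ProductSpecs",
    "owner_domain": "product_development",
    "classification": "secret",
}

print(check_standards(compliant))
print(check_standards(non_compliant))
```

The design choice here is that the central function defines the rules once, while each product team stays responsible for making its own data products pass them.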

The standardization work thus becomes one of the most important (if not the most important) contributions from the central team, which may reside under the Chief Data Officer.

Principle 4: Self-serve data platform

In a traditional, monolithic data platform, everything typically lives in one environment run by the central data team. In a Data Mesh, each individual domain team should instead offer its data products in its own logical area of a shared solution that is managed and operated centrally.

The image is taken from [Martinfowler.com](https://martinfowler.com/articles/data-monolith-to-mesh.html) and illustrates a shared self-serve data platform.

The alternative would be to create many competing platforms that do not collaborate well, with countless technologies and architecture patterns. It is unrealistic to expect that each individual domain will provide its own infrastructure in a decentralized model.

Emphasis is placed on having a shared platform for data integration, data storage and transformation, with support for MLOps and DataOps. Tools for analytics and visualization can be part of the shared platform or be more standalone.

In such a solution, it is important to abstract away as much complexity as possible and minimize the work of establishing a new instance, while avoiding the trap of customizing shared solutions for individual team needs.
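One way to picture this is a single provisioning function owned by the platform team: a domain team supplies only its name and receives a standard logical area in return. The sketch below assumes a hypothetical template and resource naming scheme; a real platform would create actual cloud resources rather than return names.

```python
from typing import Optional

# Hypothetical shared template, owned by the central platform team.
PLATFORM_TEMPLATE = {
    "storage": "{domain}_storage",
    "transformations": "{domain}_transform",
    "catalog_namespace": "catalog.{domain}",
}


def provision_domain_area(domain: str, overrides: Optional[dict] = None) -> dict:
    """Return resource names for a domain's logical area on the shared platform."""
    if overrides:
        # Team-specific customisation is rejected to keep the shared solution uniform.
        raise ValueError("per-domain overrides are not supported")
    if not domain.isidentifier() or not domain.islower():
        raise ValueError(f"invalid domain name: {domain!r}")
    return {
        resource: pattern.format(domain=domain)
        for resource, pattern in PLATFORM_TEMPLATE.items()
    }


print(provision_domain_area("sales"))
```

Setting up a new domain is then a one-line call, and the deliberate refusal of overrides reflects the warning above about customizing shared solutions for individual teams.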

Note that Data Mesh is not really an architecture concept, even though many believe it is. But the principles often require that technology and architecture be adapted. Such a self-serve, shared platform will for many organizations represent both new technology and a new method for how solutions are offered to end users.

Glitni strongly believes in building a great developer experience for all users of the data platform – and that it requires specialization in DataOps and MLOps to develop and manage these shared capabilities. This is precisely what the Data Mesh concept also emphasizes.

Data Mesh builds on ideas from software and team organization

Data Mesh sounds a bit familiar, you might be thinking. You may be right, because Zhamak Dehghani builds on principles from both software development and team organization. There is nothing wrong with drawing inspiration from and building on others, especially when it is done in a way where all the puzzle pieces that need to be in place are explained well.

Data Mesh draws inspiration from software development

The Data Mesh concept draws much inspiration from IT development, especially from the principles and methods behind microservices, DevOps and agile development methods:

Decentralization: Just as microservices encourage decentralization of application architecture by breaking it into smaller, independent and easily integrable components, Data Mesh emphasizes decentralizing data storage and processing. This means that responsibility for data is shared among multiple autonomous teams, rather than being centralized in a single team or system.
Ownership and accountability: Similar to DevOps and agile methods, which promote ownership and accountability among development teams for the entire lifecycle of a product, Data Mesh encourages data owners and data consumers to collaborate closely. This means that teams are both responsible for the quality and availability of the data they produce, and for how the data is used and integrated in the organization.
Cross-functional teams: Data Mesh supports the principle of cross-functional teams, which is common in agile methods. Each team has both data experts and domain experts working together to ensure that data is adapted to user needs and business objectives. This promotes efficiency and agility in data-driven processes and decision-making.
Evolutionary architecture: Data Mesh draws inspiration from the principles of evolutionary architecture, which means that the system is built to be changeable and adaptable over time. This is important for handling the ever-increasing volume of data and for accommodating changes in the organization’s needs and goals.

The ideas are therefore not entirely new, but they have been adapted and significantly further developed (especially through the book) to suit data and analytics.

Data Mesh also draws inspiration from the book “Team Topologies”

The Data Mesh concept draws inspiration from the book “Team Topologies” by Matthew Skelton and Manuel Pais, which focuses on how organizations can design teams and collaboration to achieve better results and create more effective systems:

Team interaction modes: The book “Team Topologies” introduces three team interaction modes: collaboration, X-as-a-Service and facilitating. Data Mesh uses these modes to create a collaborative environment where data owners, data consumers and domain experts work together to understand, facilitate and optimize data usage.
Clarity in areas of responsibility: “Team Topologies” emphasizes defining clear areas of responsibility for each team, which is also important in the Data Mesh concept. By assigning data responsibility to specific teams, you ensure that data owners and data consumers know who to collaborate with and what responsibilities they have in connection with data-driven processes.
Platform teams and stream-aligned teams: “Team Topologies” describes platform teams as teams that deliver services and tools to stream-aligned teams, which focus on delivering value to customers or users. Data Mesh incorporates this thinking by having separate teams to handle data infrastructure and data tools (platform teams) and teams that focus on delivering data as a product and creating value from data (stream-aligned teams).
Cognitive load: The book underscores the importance of limiting the cognitive load on team members, so they can focus on their core area. Data Mesh takes this into account by decentralizing data management and letting teams focus on their specific domain and data, which reduces complexity and cognitive load for each team.

Benefits of data mesh

There are several potential benefits of Data Mesh, provided the principles are implemented as a whole and support each other.

Scalability and flexibility: One of the biggest benefits of data mesh is the scalability and flexibility it provides. By spreading data ownership across multiple domains and teams, it can become easier to scale up or down as needed. This can also contribute to a more flexible architecture that can be adapted and expanded as the business grows and changes.
Better data utilization: Data mesh can contribute to increased data utilization by making data more accessible and user-friendly. When data is treated as a product and the organization has a platform approach, it can become easier for employees to find, understand and use the data they need in their work.
Reduced costs: By reducing dependency on centralized data resources and bottlenecks, data mesh can lead to reduced costs related to data management and analytics. This includes costs for infrastructure, maintenance, and personnel.

Criticism of the Data Mesh concept

Criticism of Data Mesh often centers on three things: 1) culture and competence, 2) technology and architecture, and 3) silos and fragmentation.

Cultural and competence changes: Implementing data mesh often requires significant cultural and competence changes in an organization, and these reach MUCH further than data alone. Ownership of a domain should indeed include processes, organization, technology/system support AND data. The data dimension cannot and should not be optimized in isolation. Additionally, it will be challenging to get all employees to understand and accept the new principles, and training and competence development may be necessary to ensure that all teams are capable of handling the new responsibilities that come with Data Mesh.
Lack of maturity in technology solutions: Although the data mesh concept has received much attention, it is still relatively new, and there are not always mature technology solutions that support all aspects of data mesh around domain ownership of data and metadata, APIs and data exchange, etc.
Risk of silos and fragmentation: Another concern related to data mesh is that the increased autonomy of the various data teams can lead to silos and fragmentation, especially if clear guidelines and processes for collaboration and data exchange between teams are not established. This can, in the worst case, counteract some of the benefits of data mesh and create new challenges.