The Data Warehouse Monolith

03.06.2024 | 4 min Read
Category: Data Strategy | Tags: #Podcast, #Architecture, #Data Platform

Many organisations are moving from traditional data warehouses to modern data platforms, and the move is a natural point at which to rethink how the data is structured. This article introduces how you can implement a domain-based data architecture.

Challenges with monolithic data warehouses

A data warehouse is a way of storing data for reporting and analytics. Bill Inmon, the father of the data warehouse, described it as a subject-oriented, integrated, time-variant and non-volatile collection of data to support management decisions. Traditionally, data warehouses have been monoliths – large, complex structures that consolidate all data into a single source.

Over time, the monolith becomes extremely complex, and the dependencies between different data areas slow down the pace of development. Many data warehouse teams spend most of their time on maintenance and bug fixing, which creates a rigid structure that does not adapt to changing needs and technology.

Why did it end up this way? Well, the centralisation of development resources is itself a key explanation. When there is more and more to maintain and manage, the development pace naturally decreases unless more resources are added. And centralised teams with limited time often do not have the capacity to support all user communities equally well.

Transition to a domain-based architecture

To address these challenges, many organisations are breaking the data warehouse monolith into smaller, manageable data domains as they move to cloud-based data platforms.

Each domain represents a specific business area, such as sales or HR, and can be developed and maintained independently of other domains. This reduces dependencies and makes it easier to understand and modify the data.

In practice, we separate the domains logically within the data, set up a separate project/repo for each domain, use role-based access control, and define interfaces, in the form of views and similar constructs, that other domains can connect to. Not least, we define ownership and areas of responsibility from both a business and a technical perspective.
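To make the idea of domain interfaces more concrete, here is a minimal Python sketch of how a domain, its owners and its published views could be described, and how cross-domain reads could be checked against that description. Every name in it (`DataDomain`, `can_read`, `find_violations`, the view and table names, the repo names) is a hypothetical chosen for illustration; on a real data platform these rules would normally be enforced with the platform's own role-based access control rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataDomain:
    """Hypothetical descriptor for one data domain (illustration only)."""

    name: str
    business_owner: str   # who is responsible from the business side
    technical_owner: str  # which team maintains the domain
    repo: str             # the domain's own project/repo
    # Views that other domains are allowed to read: the domain's interface.
    published_views: frozenset[str] = field(default_factory=frozenset)
    # Internal objects that only the owning domain may read.
    internal_tables: frozenset[str] = field(default_factory=frozenset)


def can_read(consumer: DataDomain, producer: DataDomain, obj: str) -> bool:
    """Return True if `consumer` may read `obj` from `producer`."""
    if consumer.name == producer.name:
        # A domain may read everything it owns.
        return obj in producer.published_views or obj in producer.internal_tables
    # Other domains may only go through the published interface.
    return obj in producer.published_views


def find_violations(
    domains: list[DataDomain],
    reads: list[tuple[str, str, str]],
) -> list[tuple[str, str, str]]:
    """Return the (consumer, producer, object) reads that bypass an interface."""
    by_name = {d.name: d for d in domains}
    return [
        (consumer, producer, obj)
        for consumer, producer, obj in reads
        if not can_read(by_name[consumer], by_name[producer], obj)
    ]


sales = DataDomain(
    name="sales",
    business_owner="Head of Sales",
    technical_owner="sales-data-team",
    repo="repo-sales",
    published_views=frozenset({"v_orders"}),
    internal_tables=frozenset({"stg_orders_raw"}),
)
hr = DataDomain(
    name="hr",
    business_owner="Head of HR",
    technical_owner="hr-data-team",
    repo="repo-hr",
    published_views=frozenset({"v_headcount"}),
    internal_tables=frozenset({"employee_salary"}),
)

reads = [
    ("sales", "hr", "v_headcount"),      # allowed: published view
    ("sales", "hr", "employee_salary"),  # not allowed: internal table
    ("hr", "hr", "employee_salary"),     # allowed: own domain
]
violations = find_violations([sales, hr], reads)
print(violations)
```

A check like this could run in a build pipeline to flag queries that reach into another domain's internal tables instead of its published views, which is exactly the kind of hidden dependency that makes a monolith hard to change.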

Independent data domains

Benefits of independent data domains

A modern data platform incorporates best practices from platform engineering and DevOps. It focuses not only on data storage and modelling, but also on the entire ecosystem, from infrastructure to the production deployment of reports and analytical models.

In larger organisations, a domain-based data architecture can be managed by separate teams that share a common infrastructure in the form of a data platform. It then becomes important that the data platform is managed independently of the domains, by a dedicated platform team.

This approach offers several benefits:

  • Increased flexibility: Smaller units make it easier to adapt the data domain as needed.
  • Faster development: Independent domains reduce complexity and increase the pace of development.
  • Improved organisation: Data is organised more effectively, with clear ownership and responsibility within each domain – and the team sits close to its users and their business needs.
  • Better scalability: The system scales more effectively, both technologically and organisationally.

Implementing a domain-based data architecture

The transition from a monolithic structure to a domain-based model requires careful planning and involvement of all relevant parties.

Here are some steps to get started:

  1. Create a shared goal: Formulate clear objectives and engage the organisation around the benefits of domain-based architecture.
  2. Start with one domain: Choose one domain as a pilot project. Build an MVP (Minimum Viable Product) and gather experience. Remember to establish operational guidelines at the same time!
  3. Expand gradually: Use the experience from the pilot project to expand to more domains. Continuously adjust and improve the processes.
  4. Involve all parties: Ensure that all relevant teams are involved from the start, including key users, data engineers, data scientists and managers.
  5. Communicate the experience: Share the experience from the new domain-based approach with those still working on the old data warehouse monolith. This can help facilitate a smoother transition.

With a clear goal, incremental implementation and broad involvement, the transition from a monolithic data warehouse to a domain-based data platform can contribute to increased flexibility, faster development and better utilisation of data.

Want to learn more?

This article from 2021 by Piethein Strengholt provides a practical description of how you can get started with data domains.

Also feel free to listen to the podcast “Datautforskerne”, episode 4, where Eystein Kleivenes and Magne Bakkeli discuss the data warehouse monolith. The episode is available on Spotify, Apple and Acast.

Like and subscribe!

Magne Bakkeli

Magne has over 20 years of experience as an advisor, architect and project manager in data & analytics, and has a strong understanding of both business and technical challenges.