Data Lakehouse | What is a Data Lakehouse?

28.08.2022 | 3 min Read
Tag: #data lakehouse

As a concept, a data lakehouse is the lovechild of a data lake and a data warehouse: a data lakehouse is suited for storing and processing all forms of data related to both reporting and analytics.

Data Lakehouse - the best properties of a Data Lake and a Data Warehouse

A Data Lakehouse is a data architecture that combines the advantages of traditional data warehouses and data lakes. It offers a unified platform for both raw data (structured and unstructured) and modelled data, making it possible to store large amounts of raw data (as in a data lake) while simultaneously performing complex analytical and transactional queries (as in a data warehouse).

A data lakehouse covers multiple needs simultaneously

Vendors and cloud platforms such as Databricks, Snowflake, Azure, Google Cloud and AWS all enable a comparable architecture, and implementations of what is often described as “the modern data stack” have steadily multiplied worldwide.

There will always be something new that can be called “modern”. Compared to how data warehouses and data lakes were originally built, these are perhaps the most important characteristics:

  • Separation of processing and storage
  • Scalability in terms of data volume, users and breadth of supported user stories
  • Modularisation

In addition, table formats such as Apache Hudi, Apache Iceberg and Delta Lake enable logical database operations on data lake tables. ACID support means that we can, among other things, both modify and delete data, which we must be able to do to comply with GDPR requirements.
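To make the idea of modifying and deleting data on top of immutable data lake files more concrete, here is a minimal sketch in plain Python. It is a toy model and not the API of Hudi, Iceberg or Delta Lake: the class `ToyLakeTable` and its methods `append`, `delete_where`, `read` and `vacuum` are hypothetical names invented for this illustration. The assumption is a copy-on-write design in which data files are never changed in place, and each commit publishes a new list of live files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLakeTable:
    """Toy copy-on-write table (hypothetical, not a real table format API)."""

    files: dict = field(default_factory=dict)         # file name -> tuple of rows
    log: list = field(default_factory=lambda: [()])   # version -> live file names
    next_id: int = 0

    def _write_file(self, rows):
        # Data files are immutable: every write creates a new file.
        name = f"part-{self.next_id:04d}"
        self.next_id += 1
        self.files[name] = tuple(rows)
        return name

    def _commit(self, live_files):
        # Publishing a new list of live files is the single commit step.
        self.log.append(tuple(live_files))

    def append(self, rows):
        name = self._write_file(rows)
        self._commit(self.log[-1] + (name,))

    def delete_where(self, predicate):
        live = []
        for name in self.log[-1]:
            rows = self.files[name]
            kept = [row for row in rows if not predicate(row)]
            if len(kept) == len(rows):
                live.append(name)                     # untouched file is reused
            elif kept:
                live.append(self._write_file(kept))   # rewritten without the deleted rows
        self._commit(live)

    def read(self, version=-1):
        # Readers always see one complete version, never a half-finished delete.
        return [row for name in self.log[version] for row in self.files[name]]

    def vacuum(self):
        # Physically drop files that the latest version no longer references.
        current = set(self.log[-1])
        self.files = {n: r for n, r in self.files.items() if n in current}
        self.log = [self.log[-1]]


table = ToyLakeTable()
table.append([{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}])
table.append([{"id": 3, "email": "c@example.com"}])

table.delete_where(lambda row: row["id"] == 2)        # e.g. a deletion request
current_ids = [row["id"] for row in table.read()]
previous_ids = [row["id"] for row in table.read(version=-2)]

table.vacuum()
remaining_ids = sorted(row["id"] for rows in table.files.values() for row in rows)
```

In this sketch the deleted row disappears from the current version immediately, but it is still present in the older version until `vacuum` removes the unreferenced file. For deletion requirements such as GDPR, the logical delete and the physical cleanup are therefore both worth considering.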

The common feature across implementations so far is the breadth of user stories supported by one and the same architecture. Where we previously talked about, for example, a data warehouse, we now talk about data platforms where components and services can be added and removed as needs change. Data platforms are now also mostly built on cloud services, primarily from Google Cloud, AWS or Azure. Reporting, machine learning/advanced analytics and real-time data are examples of user stories that can be supported by one and the same data platform.

Data platforms built on a data lakehouse architecture are unlikely to be the final destination this time either. New terms and concepts that are considered more modern or better than what was previously dominant will always emerge. There is much to look forward to. If you need assistance navigating the jungle of terminology, do not hesitate to contact us at Glitni!

Advantages and disadvantages of a data lakehouse

Below we summarise some important advantages and disadvantages of using a data lakehouse as the storage layer for reporting and analytics:

Advantages
  • Support for many different types of user stories, both reporting and advanced analytics. Provides processing capabilities, library support for R and Python, as well as good API capabilities
  • Reduced data redundancy because data can be stored only once – both structured and unstructured data
  • Cost-effective – utilises affordable data lake storage
Disadvantages
  • Risk that technology and architectural patterns are immature – which may mean that choices made will need to be rebuilt later
  • Implementation practices and DataOps in the field have not yet caught up with the technical possibilities that have opened up


Magne Bakkeli

Magne has over 20 years of experience as an advisor, architect and project manager in data & analytics, and has a strong understanding of both business and technical challenges.