These are 15 books I think you should check out if you work in data and analytics. You do not have to take my word for it, but the recommendations are well-intentioned. I have read many of these books, but not all of them. Here are my book recommendations for data and analytics as of July 2023.
Book recommendations: an overview of my favourites on data and analytics

Data in the world
Data organisation
Problem solving
Data governance
Data architecture
Data warehousing and modelling
Development principles

Data in the world

Maskiner som tenker

What is the book about? Wondering how the technology behind artificial intelligence works? Are you curious about the challenges it presents? Then you should put this book on your reading list. First published in 2023.
Why should you read this? This book is suitable for everyone, not just professionals in data and analytics. Inga writes entertainingly and pedagogically about a topic that can easily become dull and academic. After reading the book, you will understand much more about both how AI works and what you should consider in order to use AI correctly.
Inga Strümke (2023): Maskiner som tenker

Data organisation

Data Mesh

What is the book about? Data Mesh is a new approach to data architecture and organisation that aims to help organisations leverage data in a more effective and scalable way. Zhamak Dehghani describes the philosophy and how it can be implemented. First published in 2022.
Why should you read this? This book should be read by professionals in data and analytics, especially if you are a Chief Data Officer or in a similar role. After reading the book, you will gain an understanding of data mesh as a concept, and perhaps inspiration to implement parts of it. Product orientation, decentralisation and data platform teams are aspects I appreciate greatly. There are aspects that are more demanding, because not everything is about data, but Dehghani makes it seem that way…
Zhamak Dehghani (2022): Data Mesh

Team Topologies

What is the book about? Where Data Mesh addresses data organisation, Team Topologies is about organising teams more generally. Stream-aligned teams, enabling teams, complicated subsystem teams and platform teams, along with various interaction modes, are explained well. Read this BEFORE you read Data Mesh.
Why should you read this? This book is already a classic! It should be read by professionals in data and analytics, especially if you are a Chief Data Officer or in a similar role. After reading the book, you will gain an understanding of how data teams should work – especially that business objectives should guide the organisational structure you choose. Stream-aligned teams, enabling teams, complicated subsystem teams and platform teams provide inspiration for, for example, domain teams in data, and data platform teams.
Matthew Skelton & Manuel Pais (2019): Team Topologies

The Chief Data Officer Management Handbook

What is the book about? The Chief Data Officer Management Handbook aims to provide frameworks, practical examples and do’s and don’ts for setting up an effective data organisation. Published in 2020.
Why should you read this? It should be read by Chief Data Officers and data-interested leaders, to provide a framework for working purposefully so that the organisation can achieve increased value creation through data. Note: I have not read this one myself, but I have been looking for such a book for some years. Have you read it? Leave a comment about what you liked or did not like!
Martin Treder (2020): The Chief Data Officer Management Handbook

Product Management in Practice

What is the book about? Product Management in Practice provides useful examples and practical advice for being a good product owner. Updated in 2022.
Why should you read this? It should be read by Chief Data Officers, product owners and business analysts. The book is not technical. It provides a good overview of how to develop products that deliver value, and how to manage them in the best possible way. And why is this relevant? Because data models, machine learning algorithms and datasets can be viewed as data products. And then we should learn how to manage products.
Matt LeMay (2022): Product Management in Practice

Problem solving

The Model Thinker

What is the book about? How can you leverage model thinking and data to make better decisions? This book attempts to provide the answer. Published in 2018.
Why should you read this? It should primarily be read by business analysts and BI developers, but also others who are curious about data-driven decision-making processes. The book is not technical at all. It provides a good overview of data-driven problem solving. Note: I have only skimmed this one, but it seems good. I would appreciate your feedback if you have read it thoroughly!
Scott E. Page (2018): The Model Thinker

Data governance

Data Governance - The Definitive Guide

What is the book about? Without good data, all types of data products are underused and deliver little value. How do you get good data? Through effective data governance. As this book discusses extensively, people, processes and tools must work together to create trust in data. First published in 2021.
Why should you read this? This book should be read by Chief Data Officers, those who work in data governance, and architects in data and analytics. After reading the book, you will gain a solid overview of data governance as a discipline. The book is primarily oriented towards analytical use of data, some say, but I assume the principles are still sound. I have not read this one myself, but the reviews are positive and the book is recent. So I am taking the chance that it is good.
Evren Eryurek, Uri Gilad, Valliappa Lakshmanan, Anita Kibunguchy, Jessi Ashdown (2021): Data Governance - The Definitive Guide

The Enterprise Data Catalog

What is the book about? Many organisations have much left to do in data governance, and few have good control over their data yet. The data catalogue has received a lot of attention over the past 2-3 years. Ole Olesen-Bagneux seeks to provide a guide on how a data catalogue should be implemented to deliver value. First published in 2023.
Why should you read this? This book should be read by Chief Data Officers and everyone who works with Data Governance and who is looking for a better overview of how data catalogue capabilities should be leveraged. I have not opened this one myself yet, so please let me know if you have formed an impression!
Ole Olesen-Bagneux (2023): The Enterprise Data Catalog

Data architecture

Building Evolutionary Architectures

What is the book about? How do we build architectures that can evolve in step with business needs? Updated in 2022.
Why should you read this? This book is suitable for everyone who works with development. We particularly like the foreword by Martin Fowler: “For a long time, the software industry followed the notion that architecture was something that ought to be developed and completed before writing the first line of code. Inspired by the construction industry, it was felt that the sign of a successful software architecture was something that didn’t need to change during development, often a reaction to the high costs of scrap and rework that would occur due to a re-architecture event.”
So – be inspired to build something that can be evolved further. Save your employer millions.
Rebecca Parsons, Patrick Kua, Neal Ford, Pramod Sadalage (2022): Building Evolutionary Architectures

Data Management at Scale

What is the book about? Data architecture is changing rapidly. We have gone from data warehouses, to data lakes, to data lakehouses. We have gone from ETL to ELT, and onwards to streaming. And much more. Piethein Strengholt provides blueprints, principles and patterns. First published in 2021 and updated in 2023.
Why should you read this? This book should be read by architects in data and analytics. After reading the book, you will have a better overview of architecture patterns in data and analytics, which are more technology-agnostic than those you can read about on, for example, Microsoft’s websites. The book is a bit dry if you are not in the middle of an architecture evaluation. My tip: skim it, so you know what you can read in detail when the need arises. Because it will.
Piethein Strengholt (2023): Data Management at Scale

Fundamentals of Data Engineering

What is the book about? You guessed it: data engineering. Architecture, technologies, end-to-end flow and work processes are covered well. First published in 2022.
Why should you read this? It should be read by data engineers and BI developers who regularly work upstream in the data flow. The book provides a good overview of the landscape, but you will probably need to use Google if you are going to implement specific solutions.
Joe Reis, Matt Housley (2022): Fundamentals of Data Engineering

Designing Data-Intensive Applications

What is the book about? Designing Data-Intensive Applications has become a bestseller, and covers how to build software, including data-driven solutions, that is reliable, scalable and maintainable. Published in 2017.
Why should you read this? It should be read by data engineers and developers in general, who are building scalable data solutions. The book provides a solid overview of architecture without delving into specific code. The principles and methods described in the book are largely universal and are also applicable to data-driven solutions.
Martin Kleppmann (2017): Designing Data-Intensive Applications

Data warehousing and modelling

The Data Warehouse Lifecycle Toolkit

What is the book about? The Data Warehouse Lifecycle Toolkit provides frameworks and principles for effectively building data warehouses. Updated in 2008.
Why should you read this? It should be read by data engineers and information architects. The book provides a good overview of methods for building data warehouses and is considered a classic. The technologies have changed, but the principles still hold up well. Now that we have done it a few times, we mostly use the book as a reference.
Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy, Bob Becker (2008): The Data Warehouse Lifecycle Toolkit

The Data Warehouse Toolkit

What is the book about? Data modelling, or more specifically dimensional modelling. It is written by Ralph Kimball, one of the originators of the data warehouse concept, together with Margy Ross. This book was last updated in 2013, but the principles still apply.
Why should you read this? It should be read by data engineers and information architects. I would probably use it as a reference rather than reading it cover to cover. It is, after all, thick. So am I. But that is a different story. I do not know of any other books on this topic that are as comprehensive, and I have seen various editions of this one everywhere I have worked.
Margy Ross, Ralph Kimball (2013): The Data Warehouse Toolkit

Development principles

The Pragmatic Programmer

What is the book about? The Pragmatic Programmer: Your Journey To Mastery describes a philosophy for modern programming and development. It covers a range of topics from use cases and architecture to learning and career. Updated in 2019.
Why should you read this? It should be read by data engineers and developers in general. The book provides advice on how to become an effective developer. And why is this relevant in data and analytics? Because data platforms, data transformations and machine learning algorithms are code. Code should be developed properly – and the principles described in the book are universally applicable.
David Thomas, Andrew Hunt (2019): The Pragmatic Programmer

Feel free to send me more recommendations!