Analysing JSON in Synapse, BigQuery, Databricks and Snowflake

03.05.2022
Category: Data Engineering

A practical comparison of schema-on-read support for JSON in popular data processing engines

Whether you are building a cloud data warehouse or a data lakehouse, you are likely to end up with many JSON files in the data lake, as this format is frequently used for log data and as the output of various APIs.

Being able to analyse these semi-structured files quickly and efficiently with SQL is therefore an essential skill for anyone working with data.

However, each of the popular data processing engines implements its own SQL dialect, so the approach to querying JSON files differs from one engine to the next.
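Schema-on-read means that structure is applied to the JSON when it is queried, not when it is loaded. The engines spell nested-field access differently. Snowflake and Databricks SQL use a colon path such as `raw:customer.name`, while BigQuery and Synapse rely on functions such as `JSON_VALUE(raw, '$.customer.name')`. The underlying idea is the same, though. The following engine-agnostic Python sketch illustrates it. It is not taken from the Medium article, and the sample data and the helper names `get_path`, `read_ndjson`, `select` and `explode` are hypothetical. It parses newline-delimited JSON without declaring a schema, projects dotted paths (missing fields become `None`, much as the SQL engines return NULL), and unnests an array.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator, Optional

# Hypothetical sample: newline-delimited JSON, as often produced by logs and APIs.
SAMPLE_NDJSON = """\
{"ts": "2022-05-03T10:00:00Z", "customer": {"id": 1, "name": "Ada"}, "items": [{"sku": "A1", "qty": 2}]}
{"ts": "2022-05-03T10:05:00Z", "customer": {"id": 2}, "items": []}
{"ts": "2022-05-03T10:07:00Z", "customer": {"id": 3, "name": "Grace"}, "items": [{"sku": "B7", "qty": 1}, {"sku": "A1", "qty": 5}]}
"""


def get_path(doc: Any, path: str) -> Optional[Any]:
    """Follow a dotted path such as 'customer.name'; return None if any step is missing."""
    current = doc
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None  # missing field -> None, analogous to SQL NULL
    return current


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON document per non-empty line; no schema is declared up front."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def select(rows: Iterable[dict], *paths: str) -> list[tuple]:
    """Project the requested paths from each document, one tuple per row."""
    return [tuple(get_path(row, p) for p in paths) for row in rows]


def explode(rows: Iterable[dict], array_path: str) -> Iterator[tuple[dict, Any]]:
    """Yield (parent, element) pairs for each array element, similar to an unnest/explode."""
    for row in rows:
        for element in get_path(row, array_path) or []:
            yield row, element


if __name__ == "__main__":
    docs = list(read_ndjson(SAMPLE_NDJSON.splitlines()))
    print(select(docs, "customer.id", "customer.name"))
    # [(1, 'Ada'), (2, None), (3, 'Grace')]

    # Aggregate over the unnested array, e.g. total quantity per SKU.
    qty_by_sku: dict[str, int] = {}
    for _, item in explode(docs, "items"):
        qty_by_sku[item["sku"]] = qty_by_sku.get(item["sku"], 0) + item["qty"]
    print(qty_by_sku)
    # {'A1': 7, 'B7': 1}
```

In the SQL engines, the `explode` step corresponds to constructs such as `LATERAL FLATTEN` in Snowflake or `explode` in Databricks. The exact syntax and its ergonomics are where the dialects differ most.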

We have published an article on Medium that provides a practical comparison of how the same information can be extracted with a selection of popular tools: Synapse Serverless SQL pool, BigQuery, Databricks SQL and Snowflake.

In it, you will see that, whilst Databricks SQL and Snowflake offer modern and straightforward support for JSON, working with JSON is somewhat more cumbersome in Synapse and BigQuery (for now!).