
Azure Data Factory's orchestration problem
09.06.2023 | 2 min read | Category: Data Engineering
Azure Data Factory has an inherent limitation in how it runs activities in parallel, and Microsoft Fabric does not resolve it. The solution is to use a dedicated orchestration service such as Azure's recently introduced *Managed Airflow*.
Azure Data Factory (ADF) has a weakness in how it handles parallelisation in “ForEach” loops, which can lead to significant delays and inefficiency once a data platform reaches a certain scale. Since ADF is an integral part of Microsoft Fabric, this is an important limitation to be aware of.
The full article was published on Medium and is linked at the bottom; a brief summary follows.
In the article, I argue that a dedicated orchestration service such as Airflow, especially given Azure’s recently introduced “Managed Airflow”, can improve performance and solve this problem. Concrete examples show that running ADF and Airflow together gives runtimes that stay consistent regardless of random factors such as the order of tasks in ADF. Although the combination is slightly slower than ADF’s optimal run, it is significantly faster than ADF’s least optimal case.
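To make the effect of task order concrete, here is a toy simulation in Python. It is a sketch under an assumption that this summary does not state: that a parallel loop assigns its items to a fixed number of lanes up front (round-robin), while a dedicated orchestrator hands the next task to whichever worker becomes free. The function names, lane count, and task durations are all hypothetical and chosen only for illustration; they are not measurements from ADF or Airflow.

```python
import heapq


def static_lanes_makespan(durations, lanes):
    """Total runtime when item i is pre-assigned to lane i % lanes.

    ASSUMPTION: this round-robin pre-assignment is a simplified model,
    not a documented description of how ADF schedules ForEach items.
    """
    totals = [0] * lanes
    for i, d in enumerate(durations):
        totals[i % lanes] += d  # each lane runs its items one after another
    return max(totals)


def shared_pool_makespan(durations, workers):
    """Total runtime when each task starts as soon as any worker is free."""
    free_at = [0] * workers  # time at which each worker becomes available
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical workload: two long tasks (10) and six short ones (1).
unlucky_order = [10, 1, 10, 1, 1, 1, 1, 1]  # both long tasks land in lane 0
lucky_order = [10, 10, 1, 1, 1, 1, 1, 1]    # long tasks land in separate lanes

for name, order in [("unlucky", unlucky_order), ("lucky", lucky_order)]:
    print(
        name,
        "static lanes:", static_lanes_makespan(order, 2),
        "shared pool:", shared_pool_makespan(order, 2),
    )
# unlucky static lanes: 22 shared pool: 13
# lucky static lanes: 13 shared pool: 13
```

In this toy model the pre-assigned lanes finish in 13 or 22 time units depending only on the order of the items, while the shared pool finishes in 13 for both orderings. The model ignores startup and scheduling overhead, so it says nothing about the small slowdown relative to ADF's best case mentioned above.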
In addition to solving this specific challenge, Airflow also offers a range of connectors and expanded capabilities, such as using Python anywhere in an orchestration flow. My conclusion is that there is definitely a need for a dedicated orchestration tool such as Managed Airflow in Azure.

