Reducing Azure Data Factory Costs While Improving Performance

BACKGROUND

Our team was tasked with reviewing cost and performance within the Azure Data Factory (ADF) environment supporting the Enterprise Data Warehouse (EDW).

The original implementation centralized compute activities in ADF dataflows, with daily and hourly pipelines moving large volumes of data from raw to structured zones and into SQL. While functional, the environment showed high compute costs and long job runtimes.

In September 2023, average job durations were:

  • Daily Loads: ~551 minutes (over 9 hours)

  • Hourly Loads: ~299 minutes (almost 5 hours)

At the same time, ADF compute represented ~90% of total monthly costs, making it the primary optimization target

Challenge

The business required us to:

  • Reduce overall compute cost without increasing infrastructure spend.

  • Improve runtime performance to make data more readily available to consumers.

  • Scale effectively with anticipated data growth.

  • Minimize disruption to existing pipelines and the development team.

Constraints included:

  • No radical re-engineering of the solution.

  • Cost-sensitive adjustments recommendations had to decrease, not increase, spend.

  • Preserving referential integrity and business logic already embedded in the ETL.

Solution

We executed a targeted set of recommendations across database tuning, process optimization, and infrastructure alignment.

1. Database & Process Tuning

  • Primary Key Optimization: Enabled OPTIMIZE_FOR_SEQUENTIAL_KEY on clustered indexes. This reduced insert contention, cutting ~1 minute from a 10M-row load.

  • Foreign Key Handling: Deferred FK enforcement until after full load completion, removing constraint overhead during inserts.

  • Index Management: Retained existing disable/rebuild cycle for non-clustered indexes.

2. Large Process Optimization

We re-engineered the heaviest raw-to-structured pipelines, beginning with PL_raw_to_structured_deductible.

Key changes included:

  • Shifted compute from Azure IR to Self-Hosted IR (SHIR), reusing idle clusters and avoiding Azure IR startup costs.

  • Introduced Copy Activities with bulk insert locks and Degree of Parallelism (DOP=4).

  • Prototyped Deductible Pipeline Results:

    Runtime reduced from 28m22s → 24m56s (10% faster).

    Azure IR activity time reduced from 28m22s → 20m9s (28% less).

3. Non-Data Movement Activities

  • Moved schema swaps, truncates, and index management to SHIR where possible to further reduce Azure IR vCore usage.

4. Visibility Enhancements

  • Added performance monitoring at job and process-group levels (average runtimes, rolling 30-day views) to highlight creep and ensure early detection of bottlenecks.

5. Concurrency & Scheduling Improvements

  • Introduced wait/suppress logic for raw-to-structured transforms to prevent overruns and wasted extracts, ensuring business-day data availability.

RESULTS

  • Cost Savings: By shifting major workloads from Azure IR to SHIR and reducing compute intensity, we cut the largest cost driver (~90% of subscription spend) significantly.

  • Performance Gains: End-to-end runtime reductions of 10–28% were achieved in large pipelines, with parallelization improving throughput.

  • Reliability: Edge-case failures due to concurrent jobs were reduced, improving data availability within the same business day.

  • Scalability: The framework now supports extension of these optimizations to additional large pipelines, enabling incremental improvements without disruption.

Lessons Learned

  • Targeted tuning delivers outsized results: simple database index adjustments combined with workload reallocation yielded double-digit performance gains.

  • Self-Hosted IR is a powerful lever: when used correctly, it provides cost-effective compute and avoids the hidden costs of Azure IR vCore uptime.

  • Visibility prevents regressions: ongoing performance reporting allows the team to spot inefficiencies early.

  • Incremental adoption is key: rolling out changes pipeline-by-pipeline minimized risk while capturing savings steadily.

© 2025 Macer Consulting • All Rights Reserved.