
Data Factory in Microsoft Fabric: What It Does and When to Use It

17 June 2026 · 7 min read

Data Factory in Microsoft Fabric is the workload that moves and transforms data — the bit between "this lives in our source systems" and "this is queryable in the Lakehouse or Warehouse". If you have used Azure Data Factory (ADF) before, the name is familiar but the product is meaningfully different. If you haven't, treat it as the ETL layer of Fabric. Either way, here is what it actually does and how to use it without getting stung.

What is included

Fabric Data Factory has two main building blocks.

Pipelines are orchestration — the equivalent of ADF pipelines. You drag activities onto a canvas (copy data, run notebook, run stored procedure, wait, conditional, for each) and wire them into a sequence. Pipelines do not transform data themselves; they coordinate other things that do.

Dataflows Gen2 are transformation: the evolution of Power Query Online. You connect to a source, write M code (or click through the no-code UI), shape the data, and push the result to a destination such as a Lakehouse table or Warehouse table. Dataflows are where the actual ETL happens.

On top of those, Fabric adds Copy Job for scheduled bulk copies between systems and Mirroring for keeping a near-real-time replica of a source database (such as Snowflake, Azure SQL Database or Azure Cosmos DB) in OneLake. Mirroring is the feature that has done most to make Fabric viable for teams who used to use Synapse Link.

How is it different from Azure Data Factory?

Three differences matter in practice.

1. Compute is included in your capacity. ADF charged per pipeline activity, per data integration unit and per data flow vCore hour. Fabric Data Factory uses your existing Fabric capacity. Simpler to budget, but easier to accidentally starve other workloads.

2. The destinations are different. ADF wrote to "wherever you wanted". Fabric Data Factory is happiest writing to OneLake (Lakehouse tables or Warehouse tables). You can still write elsewhere via the copy activity, but the path of least resistance is into Fabric itself.

3. Dataflows Gen2 are first-class. In ADF, Mapping Data Flows and Power Query were second-class citizens next to pipelines. In Fabric, Dataflows Gen2 are the recommended way to do most transformation work that does not warrant a Spark notebook.

Pipelines vs Dataflows vs Notebooks — which to use?

The single most common question we get on Fabric projects. Rough rule of thumb:

Dataflow Gen2 for transformations a business analyst could read. Joining, filtering, reshaping, cleaning, small lookups. Up to a few million rows comfortably; tens of millions with care.

Spark notebook for heavy lifting — large datasets (hundreds of millions of rows), complex joins, anything that needs Python or Scala libraries, machine learning feature engineering. Engineering-grade work.

Stored procedure in Warehouse for set-based SQL transformations where the data already lives in the Warehouse and the team is comfortable in T-SQL. Often the fastest and most maintainable option once data is curated.

Pipeline to glue any of the above together, run them on a schedule, and add control-flow logic (only run step B if step A succeeded; loop over yesterday's missing partitions).
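The rule of thumb above can be written down as a small decision function. This is only a sketch: the function name, its parameters and the two row-count thresholds are our own stand-ins for "a few million" and "tens of millions", not limits published by Microsoft.

```python
# Assumed thresholds, chosen only to illustrate the rule of thumb.
DATAFLOW_COMFORT_ROWS = 5_000_000    # stand-in for "a few million rows"
DATAFLOW_CEILING_ROWS = 50_000_000   # stand-in for "tens of millions"


def suggest_tool(row_count, needs_python_libs=False,
                 data_in_warehouse=False, team_knows_tsql=False):
    """Suggest a transformation tool for one workload (hypothetical helper)."""
    # Heavy data or library-dependent logic goes to Spark.
    if needs_python_libs or row_count > DATAFLOW_CEILING_ROWS:
        return "Spark notebook"
    # Set-based SQL where the data already lives and the team is fluent.
    if data_in_warehouse and team_knows_tsql:
        return "Warehouse stored procedure"
    # Analyst-readable transformations on small to mid-sized data.
    if row_count <= DATAFLOW_COMFORT_ROWS:
        return "Dataflow Gen2"
    return "Dataflow Gen2 (with care)"


print(suggest_tool(200_000))
print(suggest_tool(300_000_000))
```

Whichever tool the function suggests, a pipeline still sits on top to schedule it and handle control flow.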

Pitfalls we have seen on real projects

Dataflows scaled past their comfort zone. Dataflows Gen2 are tempting because they are easy to build, but a 50-million-row daily refresh on a complex flow will eat capacity and run slowly. Anything past a few million rows that joins to other big tables belongs in a notebook.

No incremental refresh strategy. By default, copies and dataflows are full refreshes. On big tables this is wasteful and quickly becomes unsustainable. Plan an incremental approach (watermarks, change-data-capture, mirroring) before the table grows past 10 million rows.
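As a minimal sketch of the watermark idea, assuming the source exposes a reliable last-modified timestamp: keep the highest timestamp you have already loaded, and on each run pick up only rows newer than it. The column name `modified_at` and the helper name are hypothetical. In a real build the filter would be pushed into the source query rather than applied in memory.

```python
from datetime import datetime


def rows_since_watermark(rows, watermark, key="modified_at"):
    """Return rows changed after the watermark, plus the new watermark."""
    fresh = [row for row in rows if row[key] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((row[key] for row in fresh), default=watermark)
    return fresh, new_watermark


# Illustrative data only.
source = [
    {"id": 1, "modified_at": datetime(2026, 6, 15, 9, 0)},
    {"id": 2, "modified_at": datetime(2026, 6, 16, 9, 0)},
    {"id": 3, "modified_at": datetime(2026, 6, 17, 9, 0)},
]
last_watermark = datetime(2026, 6, 15, 12, 0)

changed, last_watermark = rows_since_watermark(source, last_watermark)
print([row["id"] for row in changed], last_watermark)
```

Store the watermark somewhere durable (a small control table is the usual choice) so a failed run can resume from the last successful load.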

Pipelines without alerting. A nightly pipeline that silently fails will eventually be discovered by an executive looking at last week's numbers. Wire pipeline failures to email or Teams from day one — it takes about ten minutes.
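Inside a pipeline you would normally hang a notification activity off the failure path. If you would rather raise the alert from a notebook, the logic is small. The sketch below assumes a hypothetical incoming-webhook URL that accepts a JSON body with a `text` field. Check the payload shape your own Teams or chat webhook expects before relying on it.

```python
import json
import urllib.request


def build_failure_alert(pipeline_name, run_id, error_message):
    """Build a plain-text alert payload (the shape is an assumption)."""
    text = f"Pipeline '{pipeline_name}' failed (run {run_id}): {error_message}"
    return {"text": text}


def send_alert(webhook_url, payload):
    """POST the payload as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_failure_alert("nightly_load", "abc123", "Source timeout")
print(payload["text"])
# send_alert("https://example.invalid/webhook", payload)  # hypothetical URL
```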

Source credentials in every dataflow. Create the connection once under Manage connections and gateways in Fabric, then reuse it. Re-typing credentials into every artefact turns rotation into a multi-day project.

A sensible default architecture

For a typical mid-sized UK organisation starting on Fabric, we usually recommend:

1. Mirror or pipeline-copy source systems into a "bronze" Lakehouse, raw and untouched.

2. Use a Dataflow Gen2 or notebook per source to produce a cleaned "silver" Lakehouse.

3. Use a Warehouse stored procedure or notebook to build the curated "gold" star schema.

4. Schedule the lot from a single parent pipeline with proper logging and failure alerts.

5. Surface the gold layer to Power BI via Direct Lake.
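The parent pipeline's job in that pattern is simple: run the layers in order, log each step, and stop as soon as one fails so that gold is never built on stale silver. The plain-Python sketch below shows that control flow. The step functions are empty placeholders and all names are hypothetical. In Fabric the same logic would be expressed with pipeline activities and their success and failure paths.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("parent_pipeline")


def run_in_order(steps):
    """Run (name, callable) steps in sequence and stop at the first failure.

    Returns the names of completed steps and the name of the failed step
    (or None if everything succeeded).
    """
    completed = []
    for name, step in steps:
        log.info("starting %s", name)
        try:
            step()
        except Exception:
            log.exception("%s failed; skipping downstream steps", name)
            return completed, name
        completed.append(name)
    return completed, None


# Placeholder steps standing in for the real copy, dataflow and procedure runs.
def load_bronze():
    pass


def build_silver():
    pass


def build_gold():
    pass


steps = [("bronze", load_bronze), ("silver", build_silver), ("gold", build_gold)]
completed, failed_step = run_in_order(steps)
```

A non-empty `failed_step` is the hook for the failure alert.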

That gets you a credible enterprise-grade pattern without over-engineering the first delivery. From there you can layer in real-time mirroring, machine learning notebooks or whatever else the business asks for.

If you are wiring up Data Factory pipelines for the first time and want a second pair of eyes, our Microsoft Fabric consultancy page explains how we usually run the first phase, and our plain-English Fabric overview sets the wider context.

Frequently asked questions

Is Data Factory in Fabric the same as Azure Data Factory?

Similar concepts but a different product. Fabric Data Factory uses Fabric capacity rather than per-activity pricing, writes naturally into OneLake, and promotes Dataflows Gen2 as a first-class transformation tool.

Should I use Dataflows or notebooks for transformations?

Dataflows Gen2 for analyst-readable transformations on small to mid-sized data. Spark notebooks for heavy engineering work, complex logic and very large datasets.

Can Fabric Data Factory connect to on-premises sources?

Yes, via the on-premises data gateway, the same component used by Power BI. Make sure to size the gateway machine appropriately if you expect parallel transfers.
