How To: Source-to-Target Mapping

Martyn Rhisiart Jones – Madrid, 2 February 2026

Source-to-Target Mapping: The Quiet Backbone of Any Serious Data Warehouse Effort

Let’s be brutally honest: most data warehouse projects don’t collapse because of bad code, slow queries, or even eye-watering cloud bills. They fail quietly because no one has defined exactly how data should move from sometimes-messy source systems to clean, elegant, and trustworthy targets. Source-to-target mapping (STM) is the unglamorous yet powerful discipline that prevents that slow death. Ignore it at your peril.

I’ve spent decades watching organisations pour millions into marvellous new platforms only to end up with dashboards that nobody trusts and reports that nobody reviews. The difference between those disasters and the rare successes almost always boils down to one thing: whether someone took the time to map every bloody field properly, with rules, rationale, and traceability.

What Source-to-Target Mapping Actually Is

Forget the buzzword salad. STM is simply the documented instruction set that says:

  • This field from Table A in the ancient ERP comes from here.
  • It means this to X and that to Y.
  • It gets cleaned like this.
  • It lands in that fact table column over there.
  • It gets integrated in these ways.
  • And here’s why we did it that way (business rule, compliance requirement, performance hack—whatever).

A decent mapping includes:

  • Source system, table, column, data type, cardinality, constraints, sample values (yes, samples, because theory dies fast when you see real data).
  • Target schema, table, column, grain.
  • Every transformation: CASE statements, lookups, aggregations, null handling, type casts.
  • Business logic: filters, derivations, slowly changing dimension behaviour.
  • Quality notes: expected null rates, uniqueness, reconciliation points.
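
To make that list concrete, here is one mapping entry written out as a commented SQL fragment. It’s a minimal sketch: the source and target names, the rule ID, and the quality threshold are all invented for illustration.

    -- STM entry (illustrative; all names and thresholds are hypothetical)
    -- Source:  erp.customer_master.cust_status   CHAR(1), observed values A / I / NULL
    -- Target:  dw.dim_customer.customer_status   VARCHAR(8), NOT NULL
    -- Rule:    A -> 'ACTIVE', I -> 'INACTIVE', NULL -> 'UNKNOWN' (business rule BR-017)
    -- Quality: expect under 2% NULL at source; reconcile distinct counts after load
    SELECT
        CASE cust_status
            WHEN 'A' THEN 'ACTIVE'
            WHEN 'I' THEN 'INACTIVE'
            ELSE 'UNKNOWN'
        END AS customer_status
    FROM erp.customer_master;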

Building a new data mart with one focused star schema? The mapping is tight and subject-specific. Expanding the enterprise layer in a DW iteration? It had better respect conformed dimensions and existing lineage, or you’ll pay later in rework and broken reports.

Why Most Teams Still Get It Wrong

Because it’s tedious. Because “we’ll sort it in the code.” Because the business analyst thinks the engineer will figure it out, and the engineer thinks the analyst already did. Result: tribal knowledge, untestable pipelines, and six months later the finance director is screaming that last quarter’s numbers are fantasy.

Proper STM forces clarity early. It becomes the single source of truth for what the data means, not just where it sits.

The Process – No-Nonsense Edition

  1. Start with Why – and never stop asking why.
    Nail the business questions first. What KPIs? What grain do they actually need? Don’t map everything—map what matters.
  2. Profile the Bloody Sources (Don’t blow the bloody doors off)
    Run queries, do data profiling, and do it like you mean it. Look at nulls, duplicates, orphans, and cardinality. Tools like Great Expectations or even plain SQL will tell you more truth in ten minutes than weeks of meetings (see the profiling sketch after this list).
  3. Design the Target (Then Map Back)
    Grain first. Then dimensions (conformed where possible). Then facts. Reverse-engineer from target to source; it’s usually cleaner (see the grain sketch after this list).
    • Data granularity is the level of detail or resolution at which data is stored, measured, and analysed. In simple terms:
      • High granularity = very detailed / fine-grained (e.g., individual transactions recorded every second for every customer click).
      • Low granularity = summarised / coarse-grained (e.g., total sales per month per product category).
  4. Map Field-by-Field
    Use a template. Column for column. Rule for rule. Include exceptions and edge cases. Version it. Don’t let anything escape.
  5. Walk It Through
    Sit business SMEs and engineers in the same room (virtual or otherwise). Walk the mappings line by line. They’ll hate you for the first hour, thank you for the next five years.
  6. Test Against It
    Every reconciliation script, every unit test, every UAT case should reference the mapping. If it doesn’t match, something’s broken (see the reconciliation sketch after this list).
  7. Keep It Alive
    Source changes? Update the map. New requirement? Extend it. Put it in Git, Confluence, or your metadata catalogue—not buried in some Excel file from 2023.
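
For step 2, here’s the sort of truth a quick profile surfaces. A minimal sketch in Postgres/Snowflake-flavoured SQL; the table, column, and key names are invented for illustration.

    -- Profile one source column: row counts, nulls, distinct values, duplicate keys
    -- (assumes customer_id is meant to be a non-null unique key)
    SELECT
        COUNT(*)                               AS total_rows,
        COUNT(cust_status)                     AS non_null_rows,
        COUNT(*) - COUNT(cust_status)          AS null_rows,
        COUNT(DISTINCT cust_status)            AS distinct_values,
        COUNT(*) - COUNT(DISTINCT customer_id) AS duplicate_keys
    FROM erp.customer_master;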
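
For step 3, the granularity contrast is easiest to see side by side. Again a sketch with hypothetical table names, not a prescribed model.

    -- High grain: one row per individual transaction
    SELECT transaction_id, customer_id, product_id, amount, txn_ts
    FROM dw.fact_sales;

    -- Low grain: the same data summarised to one row per product category per month
    SELECT p.product_category,
           DATE_TRUNC('month', f.txn_ts) AS sales_month,
           SUM(f.amount)                 AS total_sales
    FROM dw.fact_sales f
    JOIN dw.dim_product p ON p.product_id = f.product_id
    GROUP BY p.product_category, DATE_TRUNC('month', f.txn_ts);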
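
And for step 6, a reconciliation check that points straight back at the mapping. The exclude-cancelled-orders filter stands in for whatever rule your own mapping documents; all names here are illustrative.

    -- Source and target must agree on row counts and totals
    -- under the mapping's stated filter (exclude cancelled orders)
    SELECT
        (SELECT COUNT(*)         FROM erp.orders WHERE status <> 'CANCELLED') AS source_rows,
        (SELECT COUNT(*)         FROM dw.fact_orders)                         AS target_rows,
        (SELECT SUM(order_total) FROM erp.orders WHERE status <> 'CANCELLED') AS source_total,
        (SELECT SUM(amount)      FROM dw.fact_orders)                         AS target_total;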

In agile DW work, do this per increment. One subject area, one solid mapping, one delivered mart. Repeat.

Tools and Support Worth Using in 2026

  • Low-friction starters: Google Sheets or Excel templates (yes, still; the oldies are the goldies).
  • Modern ELT reality: dbt docs + YAML for transformations (the mapping lives with the code).
  • Governance layer: Collibra, Alation, or Snowflake’s own lineage views.
  • Profiling & quality: Great Expectations, Informatica, Soda, or native cloud profilers.
  • Lineage visualisation: Atlan, Manta, or open-source alternatives.

AI assistants can draft initial mappings from schema DDL these days—useful for speed, dangerous if you trust them blindly.

The Bottom Line

Source-to-target mapping isn’t sexy. It won’t get you likes on LinkedIn. But it is the difference between a data platform people actually use and another expensive science project gathering digital dust.

Do the work. Map it properly. Test against it ruthlessly. And watch how fast trust builds when the numbers finally stop lying.

If you’re about to kick off a new iteration or mart and haven’t got a decent STM process yet, stop everything else.

Start there.

The rest is just execution.

– Martyn Rhisiart Jones
Segovia, 2 February 2026