The Good Strat Masterclasses

Masterclass Mode Engaged

Preparation

Preparing for a new DW iteration (expanding an enterprise-wide system) or a data mart (a focused subset for a department such as sales, marketing, or finance) involves aligning business needs with technical execution. This ensures scalability, data quality, and ROI in modern 2026 environments (e.g., cloud lakehouses like Snowflake/Oracle/PostgreSQL, agile/ETL/ELT tools like dbt). The process is iterative, drawing on methodologies such as Kimball (bottom-up, dimensional) and Inmon (top-down, normalised), or hybrids. Below is a comprehensive breakdown covering business, design, requirements, process, and artefacts.

1. Business Preparation

Focus on strategic alignment and value justification to secure buy-in and resources.

  • Define Business Objectives and Case: Identify goals (e.g., improved forecasting, greater rigour in compliance, AI enablement, whatever that is). Quantify benefits such as cost savings or revenue impact, and conduct ROI analysis.
  • Stakeholder Identification and Engagement: Map sponsors, SMEs (e.g., finance leads), end-users. Form cross-functional teams (between business and IT).
  • Assess Current State: Audit existing data systems for gaps, quality issues, dysfunctional IT, and technical debt (e.g., brittle pipelines and shoddy implementations).
  • Prioritise Scope: Decide on the iteration focus (e.g., a single subject area, such as customer analytics) to deliver quick wins. NB: Things that look like low-hanging fruit and quick wins are frequently nothing of the sort.

2. Requirements Gathering

Elicit detailed needs to ensure the DW/data mart answers key business questions.

  • Gather Business and Functional Requirements: Use workshops, interviews, JAD sessions to define KPIs, use cases, analytical questions (e.g., “Daily sales by region?”). Create BRD or user stories.
  • Non-Functional Requirements: Specify performance (e.g., query speed), security (row-level access), scalability, data freshness.
  • Data Needs Analysis: Profile sources for volume, quality, availability. Identify gaps (e.g., missing historical data).
  • Prioritisation: Rank requirements by value (e.g., MoSCoW method: Must-have, Should-have).
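
Ranking a backlog this way is easy to make mechanical once each requirement carries a MoSCoW band and a rough value score. A minimal Python sketch (requirement names and scores are invented for illustration):

```python
# Sketch: order requirements by MoSCoW band, then by estimated business
# value within each band. All names and scores here are illustrative.

MOSCOW_ORDER = {"Must": 0, "Should": 1, "Could": 2, "Won't": 3}

requirements = [
    {"name": "Daily sales by region", "moscow": "Must", "value": 9},
    {"name": "Customer churn dashboard", "moscow": "Should", "value": 8},
    {"name": "Ad-hoc export to CSV", "moscow": "Could", "value": 3},
    {"name": "Real-time clickstream feed", "moscow": "Won't", "value": 5},
]

def prioritise(reqs):
    """MoSCoW band first, then descending value within a band."""
    return sorted(reqs, key=lambda r: (MOSCOW_ORDER[r["moscow"]], -r["value"]))

backlog = prioritise(requirements)
for r in backlog:
    print(f'{r["moscow"]:<7} {r["name"]} (value {r["value"]})')
```

The point is less the sorting than the discipline: forcing every requirement to declare a band and a value makes prioritisation arguments explicit.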

3. Design Phase

Architect the structure for efficient data flow and querying.

  • Choose Architecture: Decide on type (centralised DW, hybrid lakehouse rubbish, bottom-up data marts-first vs. top-down DW-first).
  • Data Modelling: Create conceptual (high-level entities), logical (relationships), and physical models (DDL). Use star/snowflake schemas for marts; consider Data Vault for agility.
  • ETL/ELT Design: Plan extraction, transformations (modular layers: staging, integration, marts), loading. Include governance (quality rules, lineage).
  • Tool Selection: Choose stack (e.g., dbt for transformations, Great Expectations for quality).
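
To make the star-schema idea concrete, here is the physical side of a tiny dimensional model (one fact, two dimensions), created in SQLite purely for illustration; table and column names are assumptions, and a real deployment would target the chosen warehouse platform and add partitioning/clustering.

```python
# Sketch: minimal star schema DDL -- one fact table at order-line grain,
# two conformed-style dimensions with surrogate keys.
import sqlite3

ddl = """
CREATE TABLE dim_date (
    date_key     INTEGER PRIMARY KEY,   -- e.g. 20260101
    full_date    TEXT NOT NULL,
    month_name   TEXT NOT NULL
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    customer_id  TEXT NOT NULL,         -- natural/business key
    segment      TEXT
);
CREATE TABLE fct_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date (date_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
    amount       REAL NOT NULL          -- grain: one row per order line
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['dim_customer', 'dim_date', 'fct_sales']
```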

4. Process/Steps

Follow an iterative lifecycle (agile/Scrum adapted for DW) for preparation and execution.

| Step | Description | Key Activities |
| --- | --- | --- |
| 1. Initiation | Kickoff and planning. | Create project charter, roadmap, budget/timeline. Assign roles (e.g., BA, architect). |
| 2. Analysis | Deep dive into data/sources. | Source profiling, gap analysis, domain modelling. |
| 3. Design & Prototyping | Build blueprints. | Models, mappings, proof-of-concept (e.g., sample ETL). |
| 4. Validation | Review and refine. | Stakeholder walkthroughs, feasibility checks. |
| 5. Setup for Implementation | Prep for build. | Environment setup, automation (CI/CD), testing strategy. |
| 6. Governance & Maintenance Planning | Ensure sustainability. | Define monitoring, change management, incident response. |

5. Artefacts/Deliverables

Living documents and models to guide and document the project.

  • Business/Requirements: Project charter, BRD/user stories, KPI definitions, stakeholder register, requirements traceability matrix.
  • Design: Architecture diagrams, data models (ER, dimensional bus matrix), source-to-target mappings, ETL specs.
  • Process/Support: Risk register, test plans, data dictionary/glossary, profiling reports, change logs.
  • Governance: Quality rules, lineage diagrams, security models, operations guide.

In agile setups, keep artefacts lightweight and versioned (e.g., in Jira/Confluence). Start with high-priority items to enable early iterations. If your org follows a specific method (e.g., Data Vault), adapt accordingly for best results.

Execution

Here are all the key execution steps for implementing a new data warehouse (DW) iteration or a new data mart in a modern (2026) context. This covers the full lifecycle from preparation through to ongoing operations, drawing on established approaches such as Kimball (bottom-up, iterative dimensional marts), Inmon (top-down enterprise), hybrids, and agile/ETL/ELT practices (dbt, Snowflake, Oracle, etc.).

Execution is typically iterative: deliver one subject area or increment at a time for quick value, then expand.

1. Project Initiation & Planning

  • Define business case, objectives, and success metrics (e.g., KPIs improved, ROI targets).
  • Secure sponsorship, budget, timeline, and resources.
  • Assemble cross-functional team (business SMEs, analysts, architects, engineers, governance).
  • Create project charter, high-level roadmap, and risk register.
  • Select methodology (agile sprints, Kimball lifecycle increments, or hybrid).

2. Business Requirements Definition

  • Conduct workshops, interviews, and JAD sessions with stakeholders.
  • Capture analytical questions, KPIs/measures, dimensions, hierarchies, use cases.
  • Document business requirements (BRD or prioritised user stories/backlog).
  • Define non-functional needs (performance, security, data freshness SLAs).
  • Prioritise scope for the first iteration (e.g., one business process or department).

3. Source System Analysis & Discovery

  • Inventory all relevant sources (databases, APIs, files, streams).
  • Profile data: volume, quality (nulls, duplicates, cardinality), relationships, freshness.
  • Identify gaps, data quality issues, access constraints.
  • Map high-level data flows and lineage.
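
Profiling of this kind can start very simply. A hedged Python sketch computing null rate, cardinality, and duplicate counts for one column (the sample rows are invented):

```python
# Sketch: lightweight source profiling -- null rates, duplicates, and
# cardinality for a single column of an in-memory sample.
from collections import Counter

rows = [
    {"customer_id": 1, "email": "a@example.com", "region": "EMEA"},
    {"customer_id": 2, "email": None,            "region": "EMEA"},
    {"customer_id": 3, "email": "c@example.com", "region": "APAC"},
    {"customer_id": 3, "email": "c@example.com", "region": "APAC"},  # duplicate
]

def profile(rows, column):
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "cardinality": len(set(non_null)),
        "duplicates": sum(c - 1 for c in Counter(non_null).values() if c > 1),
    }

stats = profile(rows, "email")
print(stats)  # {'null_rate': 0.25, 'cardinality': 2, 'duplicates': 1}
```

Running the same checks across every candidate source column turns "the data is probably fine" into a profiling report with numbers in it.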

4. Architecture & High-Level Design

  • Choose overall architecture (e.g., lakehouse rubbish, centralised DW and data marts, Data Vault core).
  • Define layers: raw/staging, integration, and presentation/data marts.
  • Catalogue all business terms and data.
  • Design conceptual and logical models (entities, structures, and relationships).
  • Plan tech stack (cloud platform, ETL/ELT tools like Informatica, dbt/Airflow, semantic layer).
  • Prove architecture with a quick proof-of-concept (PoC) framework (end-to-end load and query).
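
A PoC of that shape can be tiny. The sketch below lands raw rows and answers one analytical question ("daily sales by region") end to end; table and column names are assumptions, and SQLite stands in for the real platform.

```python
# Sketch: minimal end-to-end PoC -- stage raw rows, then run one
# representative analytical query against them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?)",
    [("2026-01-01", "EMEA", 100.0),
     ("2026-01-01", "EMEA", 50.0),
     ("2026-01-01", "APAC", 75.0)],
)
result = conn.execute(
    """SELECT sale_date, region, SUM(amount)
       FROM stg_sales GROUP BY sale_date, region ORDER BY region"""
).fetchall()
print(result)  # [('2026-01-01', 'APAC', 75.0), ('2026-01-01', 'EMEA', 150.0)]
```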

5. Detailed Data Modelling & Design

  • Define grain/granularity for facts.
  • Design dimensional model (Kimball: star/snowflake schemas, conformed dimensions bus matrix) or normalised/hybrid.
  • Create physical model (tables, indexes, partitions, views).
  • Develop source-to-target mappings (field-level, transformations, business rules).
  • Specify ETL/ELT patterns (incremental, full, CDC, error handling).
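
Source-to-target mappings pay off most when they are expressed as data rather than prose, because the same artefact then documents the rules and executes them. A sketch (field names and transformation rules are invented):

```python
# Sketch: a field-level source-to-target mapping as a dict of rules,
# applied uniformly to each source row.

mapping = {
    "customer_key":  lambda src: src["cust_id"],              # surrogate keys handled downstream
    "customer_name": lambda src: src["name"].strip().title(),
    "country_code":  lambda src: src["country"].upper(),
    "is_active":     lambda src: src["status"] == "A",
}

def transform(source_row):
    """Apply every mapping rule to one source row."""
    return {target: rule(source_row) for target, rule in mapping.items()}

row = transform({"cust_id": 42, "name": "  ada lovelace ",
                 "country": "gb", "status": "A"})
print(row)
```

Keeping the mapping declarative also makes it trivially diffable in version control when business rules change.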

6. Data Integration / ETL / ELT Development

  • Build pipelines: extract from sources, stage/transform (e.g., dbt models), load to targets.
  • Implement data quality rules, cleansing, enrichment.
  • Handle slowly changing dimensions, hierarchies, surrogate keys.
  • Set up orchestration (scheduling, monitoring, alerts).
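
Slowly changing dimension handling is the step most often got wrong, so here is a hedged sketch of the Type 2 pattern: expire the current row and insert a new version when a tracked attribute changes. Column names and the change-detection rule are assumptions.

```python
# Sketch: SCD Type 2 -- close the open row and append a new version
# whenever a natural key arrives with changed attributes.
from datetime import date

def apply_scd2(dimension, incoming, today):
    """dimension: list of dim rows; incoming: {natural_key: attributes}."""
    next_key = max((r["surrogate_key"] for r in dimension), default=0) + 1
    for nk, attrs in incoming.items():
        current = next((r for r in dimension
                        if r["natural_key"] == nk and r["end_date"] is None), None)
        if current and current["attrs"] == attrs:
            continue                       # no change: keep current row open
        if current:
            current["end_date"] = today    # expire the old version
        dimension.append({"surrogate_key": next_key, "natural_key": nk,
                          "attrs": attrs, "start_date": today, "end_date": None})
        next_key += 1
    return dimension

dim = [{"surrogate_key": 1, "natural_key": "C1",
        "attrs": {"segment": "SMB"}, "start_date": date(2025, 1, 1), "end_date": None}]
apply_scd2(dim, {"C1": {"segment": "Enterprise"}}, date(2026, 1, 1))
print(len(dim), dim[0]["end_date"], dim[1]["attrs"]["segment"])
```

Real pipelines would do this set-based in SQL (a merge), but the invariant is the same: exactly one open row per natural key.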

7. Testing & Validation

  • Unit test transformations and models.
  • Integration/reconciliation testing (source vs. target counts, sums, samples).
  • Data quality validation (rules, profiling).
  • Performance/load testing.
  • User acceptance testing (UAT) with business stakeholders on sample data/reports.
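
Reconciliation testing in particular rewards automation. A sketch of the counts/sums/samples idea, with a small float tolerance (all figures invented):

```python
# Sketch: source-vs-target reconciliation on row counts, a summed
# measure, and a key-membership spot check.

source = [("ord-1", 100.0), ("ord-2", 250.5), ("ord-3", 75.25)]
target = [("ord-1", 100.0), ("ord-2", 250.5), ("ord-3", 75.25)]

def reconcile(source, target, tolerance=1e-6):
    checks = {
        "row_count": len(source) == len(target),
        "sum_amount": abs(sum(a for _, a in source)
                          - sum(a for _, a in target)) <= tolerance,
        "sample_keys": {k for k, _ in source[:2]} <= {k for k, _ in target},
    }
    return checks, all(checks.values())

checks, passed = reconcile(source, target)
print(checks, passed)
```

Wired into CI, a check like this runs on every load rather than only when someone remembers to eyeball the numbers.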

8. Deployment & Go-Live

  • Deploy to production environment (migrate schemas, pipelines, data).
  • Execute initial full load (historical data) + switch to incremental.
  • Roll out access (security roles, row-level security, semantic models).
  • Cutover: switch users to new system, monitor closely.
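
The switch from full to incremental loading usually hinges on a high-water mark. A sketch under stated assumptions (the `updated_at` column is invented, and a real pipeline would persist the watermark between runs):

```python
# Sketch: incremental extraction via a high-water mark on a
# modification timestamp.

def extract_incremental(source_rows, high_water_mark):
    """Return rows changed since the last run, plus the new watermark."""
    fresh = [r for r in source_rows if r["updated_at"] > high_water_mark]
    new_mark = max((r["updated_at"] for r in fresh), default=high_water_mark)
    return fresh, new_mark

rows = [{"id": 1, "updated_at": "2026-01-01T10:00"},
        {"id": 2, "updated_at": "2026-01-02T09:30"},
        {"id": 3, "updated_at": "2026-01-03T08:15"}]

batch, mark = extract_incremental(rows, "2026-01-01T23:59")
print([r["id"] for r in batch], mark)  # [2, 3] 2026-01-03T08:15
```

Timestamp watermarks miss hard deletes and late-arriving updates, which is why CDC is the sturdier option where the source supports it.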

9. User Enablement & Adoption

  • Train end-users (analysts, execs) on tools (e.g., Power BI, Tableau).
  • Deliver initial reports/dashboards or self-service setup.
  • Provide user guides, FAQs, support channels.
  • Gather early feedback for quick fixes.

10. Operations, Monitoring & Iteration

  • Set up monitoring (load success, freshness, anomalies, costs).
  • Establish governance (change management, glossary updates, lineage tracking).
  • Run post-implementation review (lessons learned).
  • Plan next iterations: add sources, refine models, expand marts.
  • Continuous improvement: optimise queries, automate more tests, incorporate AI features.
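
The freshness part of that monitoring can be sketched in a few lines: alert when a table's latest load is older than its SLA. Table names and SLAs here are invented.

```python
# Sketch: a freshness check of the kind a monitoring job would run on
# a schedule, comparing last-load timestamps against per-table SLAs.
from datetime import datetime, timedelta

def stale_tables(last_loaded, slas, now):
    """Return tables whose latest load breaches the freshness SLA."""
    return sorted(t for t, ts in last_loaded.items() if now - ts > slas[t])

now = datetime(2026, 1, 10, 12, 0)
last_loaded = {"fct_sales": datetime(2026, 1, 10, 6, 0),
               "dim_customer": datetime(2026, 1, 8, 6, 0)}
slas = {"fct_sales": timedelta(hours=24),
        "dim_customer": timedelta(hours=24)}

alerts = stale_tables(last_loaded, slas, now)
print(alerts)  # ['dim_customer']
```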

Quick Summary Table (Typical Sequence)

| Phase | Focus | Key Output |
| --- | --- | --- |
| 1–2 | Business alignment | Charter, prioritised backlog |
| 3–4 | Discovery & architecture | Profiling reports, high-level design, PoC |
| 5 | Modelling | Dimensional/physical models, mappings |
| 6 | Build pipelines | ETL/ELT code, quality rules |
| 7 | Test rigorously | Test cases, reconciliation results |
| 8–9 | Deploy & enable | Live system, trained users |
| 10 | Operate & iterate | Monitoring dashboards, next backlog |

In agile setups, repeat steps 5–10 per sprint/increment (2–4 weeks) for one subject area, delivering incremental value fast while building toward enterprise scale. Start small, prove value, then expand, to avoid classic “big bang” failures.

Outcomes and Artefacts

In the context of implementing a new DW iteration (expanding an enterprise system) or data mart (department-focused subset), success hinges on making informed decisions, producing key artefacts, and achieving targeted outcomes. These elements ensure alignment, traceability, and value delivery in modern agile/ELT environments (e.g., Snowflake, dbt). Below is a comprehensive breakdown, drawing from best practices like Kimball, Inmon, and Data Vault methodologies.

1. Desired Decisions

These are critical choices made throughout the project to guide direction, mitigate risks, and optimise resources. They should be collaborative, data-driven, and documented for accountability.

  • Strategic Decisions:
    • Business objectives and scope: Decide on primary goals (e.g., enable real-time analytics, reduce reporting time by 50%). Prioritise subject areas (e.g., sales vs. finance first) for iterative delivery.
    • Methodology selection: Choose agile (sprints for quick marts) vs. waterfall (for regulated environments), or hybrid (Kimball for data marts + Inmon for enterprise core).
    • Architecture approach: Opt for centralised DW, distributed marts, a collection of lakehouse junk (raw and governed layers), or Data Vault for auditability/flexibility.
    • Tool stack: Select platforms (e.g., Snowflake/Oracle/PostgreSQL for storage/compute), ELT tools (dbt for transformations), quality tools (Great Expectations), and BI (Power BI/Tableau).
  • Requirements and Design Decisions:
    • Granularity/grain: Determine data detail level (e.g., transaction-level for facts to allow aggregation).
    • Conformed dimensions: Decide which dimensions (e.g., customer, date) are shared across marts for consistency.
    • Non-functional trade-offs: Balance performance (e.g., sub-second queries) vs. cost (pay-per-query cloud), data freshness (real-time vs. daily), and security (row-level vs. column-level).
    • Governance rules: Set data quality thresholds (e.g., 95% completeness), ownership, and compliance (e.g., GDPR masking or anonymisation).
  • Execution and Operational Decisions:
    • Prioritisation of backlog: Rank user stories by value/ROI and strategic importance (e.g., using MoSCoW: Must/Should/Could/Won’t).
    • Incremental vs. big-bang rollout: Decide on phased releases (e.g., one mart per quarter).
    • Risk mitigations: Approve contingency plans (e.g., fallback for source failures).
    • Go-live criteria: Define success gates (e.g., 99% reconciliation accuracy).
  • Post-Implementation Decisions:
    • Scaling: Plan expansions (e.g., add AI/ML features, if you really must).
    • Optimisation: Decide on ongoing tweaks (e.g., partition pruning for queries).

All decisions should involve stakeholders (sponsor, product owner, architect) and be logged in a decision register for traceability.

2. Artefacts

These are tangible deliverables (documents, models, code, and tools) that capture knowledge, enable execution, and support maintenance. Keep them living (versioned in Git/Jira/Confluence) and lightweight in agile setups.

  • Business and Requirements Artefacts:
    • Project charter: Outlines objectives, scope, timeline, budget.
    • Business requirements document (BRD) or user stories/backlog: Detailed KPIs, use cases, analytical questions.
    • Requirements traceability matrix: Links needs to sources/models/tests.
    • Stakeholder register and communication plan: Lists roles, expectations.
  • Design and Architecture Artefacts:
    • Architecture diagrams: High-level layers (staging to integration to data marts).
    • Data models: Conceptual (ER diagrams), logical (relationships), physical (DDL scripts, indexes).
    • Dimensional bus matrix: Conformed dimensions/facts (Kimball-specific).
    • Source-to-target mappings: Field-level transformations, rules.
    • ETL/ELT specifications: Patterns (incremental loads, CDC), error handling.
  • Development and Testing Artefacts:
    • Code repositories: Pipelines (dbt models/SQL), orchestration scripts (Airflow DAGs).
    • Data profiling reports: Source quality insights (nulls, volumes).
    • Test plans/cases: Unit, integration, UAT, reconciliation scripts.
    • Data dictionary/glossary: Field definitions, owners.
  • Governance and Operations Artefacts:
    • Risk/issue register: Identified risks, mitigations.
    • Data lineage diagrams: End-to-end flows.
    • Quality rules/tests: Automated checks (e.g., Great Expectations YAML).
    • Deployment runbook: Migration steps, rollback.
    • Operations guide: Monitoring dashboards, support processes.
    • Change logs: Version history for models/mappings.
  • User-Facing Artefacts:
    • Report/dashboard mockups or prototypes.
    • Training materials/user guides: For self-service BI.
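
Quality rules are the artefact that benefits most from being executable rather than documentary. A sketch of declarative rules with thresholds, in the spirit of the 95%-completeness example above (rule names and sample data are invented):

```python
# Sketch: declarative quality rules -- each rule names a column, a
# check function, and a pass threshold, so new rules are data, not code.

def completeness(rows, column):
    """Fraction of rows with a non-null value in the column."""
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)

rules = [{"column": "email", "check": completeness, "threshold": 0.95}]

rows = [{"email": "a@x.com"}, {"email": "b@x.com"},
        {"email": None}, {"email": "d@x.com"}]

failures = [r["column"] for r in rules
            if r["check"](rows, r["column"]) < r["threshold"]]
print(failures)  # ['email']  (0.75 completeness < 0.95 threshold)
```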

3. Outcomes

These are the measurable results and benefits realised from the project, validating its success and driving adoption. Track via KPIs defined early.

  • Business Outcomes:
    • Improved decision-making: Faster, accurate insights (e.g., reduced reporting time from days to minutes).
    • ROI realisation: Cost savings (e.g., 20% lower ETL maintenance), revenue growth (e.g., better targeting via customer mart).
    • Enhanced analytics capabilities: Self-service dashboards, AI readiness (e.g., cleaner data for ML models).
    • Compliance and risk reduction: Auditable data (e.g., full lineage for regulations and compliance).
  • Technical Outcomes:
    • Reliable data platform: High uptime (99.9%), scalable storage/compute.
    • Data quality improvements: Metrics like 98% accuracy, reduced errors.
    • Efficient operations: Automated pipelines, monitoring alerts for anomalies.
    • Extensibility: Flexible architecture for future iterations (e.g., easy addition of new sources).
  • Organisational Outcomes:
    • Increased data trust and adoption: Users rely on the DW/mart for daily decisions.
    • Knowledge transfer: Team upskilled in tools/methods.
    • Lessons learned: Post-review report for future projects.
    • Cultural shift: Towards data-driven practices, with governance embedded.

In summary, decisions provide direction, artefacts ensure execution and traceability, and outcomes confirm value. For a successful DW/data mart iteration, aim for iterative delivery: Start with a minimal viable data mart, measure outcomes early, and refine. If focusing on a specific methodology or tool, outcomes can be tailored (e.g., faster time-to-insight in Kimball vs. stronger governance in Inmon).

Closing Thoughts

The spirit of a successful data warehouse (DW) iteration team, or one building/refining data marts, is what separates teams that deliver trusted, evolving platforms from those that produce expensive, under-used warehouses. In requirements gathering, design, modelling, and delivery for DW/BI projects (whether Kimball-style dimensional marts, hybrid lakehouses, or agile ETL/ELT increments), this spirit manifests as a shared mindset and behaviours that drive outcomes far beyond technical correctness.

Here’s what that spirit looks like in practice, tied directly to requirements and DW/data mart work:

1. Ruthless Focus on Business Value First

  • The team lives by: “We exist to answer real business questions faster and more reliably than before.”
  • In requirements sessions, they obsess over why a KPI matters, not just what fields are needed. They challenge vague asks (“better visibility”) until they become crisp analytical questions (“daily cohort retention by acquisition channel”).
  • They prioritise increments that deliver measurable wins early (one high-impact mart over a “perfect” enterprise model), embracing Kimball’s bottom-up philosophy: build usable marts quickly, then connect them via conformed dimensions.
  • Spirit killer: Tech-first thinking (“let’s model everything perfectly upfront”). It’s BS on stilts, and it’s embraced almost everywhere data warehousing is being done wrong.

2. Collaborative Curiosity Over Silos

  • Business SMEs, analysts, architects, and engineers sit together (virtually or physically) as equals.
  • Requirements workshops feel like joint problem-solving, not hand-offs. The BA asks probing questions; the data engineer flags source realities early; the business stakeholder validates grain assumptions on the spot.
  • They treat “I don’t know” as progress, then profile data live to discover truths (null rates, cardinality surprises, and the results of shoddy application development and cruddy data models).
  • Spirit killer: “Throw it over the wall” mentality between business and IT.

3. Iterative Humility and Learning Velocity

  • They embrace short cycles (2–4-week sprints), each delivering a working slice: mapped sources, transformed data, a testable data mart, a validated dashboard or report.
  • After every iteration, they ask: “What did we learn? What broke trust? How do we increase the cadence next time, if possible?”
  • They accept that requirements evolve, sources change, and business priorities shift, and build flexibility (modular dbt models, living mappings) instead of rigid upfront perfection.
  • Spirit killer: Big-bang planning or fear of “incomplete” deliverables.

4. Fanatical Data Trust & Quality Obsession

  • The team’s north star (or southern star) is: “If the business can’t trust the numbers, nothing else matters.”
  • They bake quality into requirements (thresholds, reconciliation points) and enforce it via automated tests from day one.
  • They celebrate when stakeholders say, “Finally, the numbers match what I see in the source system.”
  • Spirit killer dilemma: “It’s good enough” compromises on quality or lineage, but sometimes good enough is good enough.

5. Ownership and Accountability Without Blame

  • Every team member owns end-to-end outcomes, not just “my part.”
  • When a load fails or a KPI or OKR is incorrect, the response is “How do we fix it and prevent recurrence?” rather than finger-pointing.
  • They maintain living artefacts (source-to-target mappings, bus matrix, glossary) as team assets, not individual homework.
  • Spirit killer: “That’s not my job” or unowned technical debt.

6. Pragmatic Balance of Rigour and Speed

  • They apply just-enough discipline: detailed mappings for complex transformations, lightweight user stories for simple ones.
  • In 2026’s cloud/ETL/ELT world, they favour tools that let them iterate fast (dbt tests, Snowflake cloning for experiments) while keeping governance (lineage, quality gates) non-negotiable.
  • They know when to prototype vs. over-engineer, e.g., quick PoC for grain debates before full modelling.

7. Shared Pride in Business Impact

  • Success is measured by adoption and outcomes (e.g., “Marketing doubled campaign ROI thanks to the new customer mart”), not lines of code or completed sprints.
  • They evangelise wins internally… “Look what we enabled!”… building momentum for the next iteration.
  • Spirit killer: Viewing the project as “IT delivery” instead of business enablement.

In short, the spirit of a truly successful DW/mart iteration team is business-obsessed, collaborative, humbly iterative, trust-obsessed, and impact-proud. They treat requirements not as a phase to complete, but as a living conversation that sharpens with every cycle. They build marts that solve today’s problems while staying extensible for tomorrow’s, and they do it with a quiet confidence that comes from repeatedly delivering value people actually use.

That’s the mindset that turns a data warehouse project from “another IT initiative” into a strategic asset the business fights to expand.

Summary

Requirements Focus

  • Business & Functional Requirements: Captured via workshops, interviews, JAD sessions. Define:
    • Key business questions / analytical use cases
    • KPIs, measures, dimensions, hierarchies
    • Prioritised user stories or BRD
  • Non-Functional Requirements: Performance (query speed), security (row-level, masking), scalability, data freshness SLAs, availability.
  • Data Needs & Source Analysis: Profile sources for volume, quality (nulls, duplicates, cardinality), gaps, historical coverage, and access constraints.
  • Prioritisation: Use MoSCoW (Must/Should/Could/Won’t) or value/ROI ranking; focus on high-impact scope for first iteration (one subject area/business process).
  • Traceability: Requirements traceability matrix linking needs → sources → models → tests.
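
That needs → sources → models → tests chain is worth keeping as data rather than a spreadsheet, because coverage gaps then fall out mechanically. A sketch (all entries invented):

```python
# Sketch: a requirements traceability matrix as records, queried for
# requirements that have no test coverage yet.

matrix = [
    {"requirement": "Daily sales by region", "source": "erp.orders",
     "model": "fct_sales", "tests": ["reconcile_sales_total"]},
    {"requirement": "Customer churn rate", "source": "crm.accounts",
     "model": "dim_customer", "tests": []},
]

untested = [m["requirement"] for m in matrix if not m["tests"]]
print(untested)  # ['Customer churn rate']
```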

DW / Data Mart Overall Process (Preparation + Execution)

Iterative, agile-adapted lifecycle (Kimball bottom-up dimensional marts preferred for quick value; hybrids/Data Vault/Inmon where needed).

Core Phases & Activities

  1. Business Alignment: Define objectives, ROI case, success metrics; secure sponsorship; stakeholder mapping; current-state audit (gaps, debt).
  2. Requirements & Scope: Gather crisp analytical questions + KPIs; prioritise one focused increment (avoid big-bang).
  3. Source Discovery: Inventory, profile, map lineage, identify quality issues.
  4. Architecture & Modelling: Choose layers (raw/staging through integration through to presentation/marts); dimensional design (star/snowflake, conformed dimensions, bus matrix, grain decisions); physical DDL; source-to-target mappings.
  5. ETL/ELT Build: Modular pipelines (dbt-style), incremental/CDC loads, SCD handling, quality rules, orchestration, error handling.
  6. Testing: Unit, integration, reconciliation, data quality, performance, UAT.
  7. Deployment & Enablement: Full and incremental load, go-live cutover, security setup, user training, initial reports/dashboards.
  8. Operations & Iteration: Monitoring (freshness, anomalies, costs), governance (lineage, glossary, change mgmt), lessons learned, plan next increments.

Key Artefacts (Requirements & DW-Related)

  • Project charter, stakeholder register
  • BRD / prioritised backlog / user stories
  • Requirements traceability matrix
  • Data profiling & gap reports
  • Architecture diagrams, dimensional bus matrix
  • Conceptual/logical/physical models
  • Source-to-target mappings, ETL specs
  • Data dictionary/glossary
  • Quality rules/tests, lineage diagrams
  • Test plans, risk/issue register

Desired Outcomes & Mindset

  • Outcomes: Faster/more reliable decisions, measurable ROI, high data trust/quality, self-service adoption, compliance, extensible platform.
  • Team Spirit: Ruthless business-value focus, collaborative curiosity, iterative humility, fanatical quality obsession, end-to-end ownership, pragmatic rigour, pride in real business impact.

One-line essence: Requirements must translate real business questions into prioritised, high-quality, iteratively delivered data marts/DW increments that build trust and deliver measurable value fast; never tech-first or big-bang; let the conceited dopes monopolise that nonsense.

Suggested Reading

ℹ️ Mastering Data Language for Better Architecture

