Martyn Rhisiart Jones

What to say?

I was reading an article, written by Jeff Wilts and recommended by Bill Inmon, when I got to this statement: “Teradata is a full-featured enterprise data warehouse.” For me, it went further downhill from there.

I found it disheartening and deceptive, so I decided to write an article about my thoughts on it: Understanding the Data Warehouse Dilemma – 2026/02/07 (https://goodstrat.com/2026/02/06/understanding-the-data-warehouse-dilemma-2026-02-07/).

As a result, many people approached me. They asked directly and indirectly if I would suggest ways and means to overcome or avoid those dilemmas.

This is the result.

Enjoy! But even better, let me know what you think.

My original text was prompted by what I saw as bizarre claims. It critiques common pitfalls in data warehousing. The text emphasises that many issues arise from prioritising technology, architecture, and governance over business alignment. It highlights the need for semantic clarity and practical usability. It argues that data warehousing should be seen as a “business sense-making apparatus” rather than just an IT system. Below, I’ll outline the main problems and provide actionable steps to avoid, fix, or mitigate them. These recommendations focus on shifting from tech-centric approaches to collaborative, business-oriented practices.

1. Technology Doesn’t Build Trust in Data

  • Problem: Business users mistrust data due to inconsistencies (e.g., mismatched revenue figures or shifting definitions like “active” users), not because of on-premise vs. cloud setups. Migrating to tools like Snowflake or Teradata won’t fix semantic disagreements.

What to Do:

  • Avoid: Don’t assume new tech resolves trust issues; evaluate migrations based on business outcomes, not hype. Involve business stakeholders early in tool selection to identify root causes of mistrust (e.g., via workshops to map data inconsistencies).
  • Avoid like the plague: Don’t assume new tech is genuinely new. Do not believe that claims have anything to do with reality. Beware of the charlatans, grifters and snake-oil merchants. And remember this: “Big data was mainly bullshit.”
  • Fix: Conduct semantic audits: Gather cross-functional teams (e.g., finance, marketing, operations) to define and document key terms in a shared glossary. Use tools like data catalogues (e.g., Collibra or Alation) to socialise and enforce these definitions across systems and business boundaries.
  • Mitigate: Implement data quality checks tied to business metrics, such as automated reconciliation reports that flag discrepancies between sources. Monitor adoption by tracking how often business users query the warehouse without follow-up questions.
  • Reject: Dismiss the notion that perfect data quality in all data is a desirable goal. It’s not. Good enough data quality is good enough. Don’t forget, this is a business call.
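The automated reconciliation idea above can be sketched in a few lines. This is a minimal illustration, not a prescription: the source names, metrics and the 1% tolerance are all assumptions for the example, and the tolerance in particular is exactly the kind of “good enough” threshold the business, not IT, should set.

```python
# Minimal sketch: flag discrepancies between two systems reporting the
# same business metrics. Source names, figures and the tolerance are
# illustrative assumptions.

def reconcile(metrics_a, metrics_b, tolerance=0.01):
    """Return metrics whose relative difference exceeds the tolerance,
    plus any metric present in one source but missing from the other."""
    discrepancies = {}
    for metric, value_a in metrics_a.items():
        value_b = metrics_b.get(metric)
        if value_b is None:
            discrepancies[metric] = (value_a, None)
            continue
        baseline = max(abs(value_a), abs(value_b), 1e-9)
        if abs(value_a - value_b) / baseline > tolerance:
            discrepancies[metric] = (value_a, value_b)
    return discrepancies

# Example: the finance system and the CRM disagree on revenue but
# agree (within tolerance) on active users.
finance = {"revenue": 1_000_000, "active_users": 52_000}
crm = {"revenue": 1_180_000, "active_users": 52_100}
print(reconcile(finance, crm))  # flags "revenue", not "active_users"
```

A report like this only earns trust if the flagged discrepancies are routed to the business owners of the metric, not buried in an IT ticket queue.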

2. Overly Normalised or Exhaustive Models That Go Unused

Problem: Data models are often perfect (Ed. In reality they are more often far from perfect) on paper but irrelevant to business users, acting like an unused “legal code in a dead language.”

What to Do:

  • Avoid: Do not build comprehensive data and process models upfront. This approach wastes time, effort, and patience. Start with minimal viable models focused on high-priority business questions, iterating based on continual business engagement and feedback. Stick to business-driven MVPs in everything you do. If you can’t do that, find another job where you won’t do so much damage.
  • Link what you’re doing to the conceptual data models and data catalogue for the business areas.
  • Fix: With the full cooperation and clout of business stakeholders, translate models into business-friendly formats, such as visual diagrams or simplified views (e.g., using denormalised datasets or semantic layers in tools like dbt, Spark, Collibra or Looker). Train non-data business professionals to use them through multiple hands-on sessions.
  • Mitigate: Measure model utility by metrics like query frequency or user engagement. More importantly, assess utility by the degree of need and want expressed by business stakeholders, and ensure alignment with their goals. Identify and deprecate unused parts of the models. Redirect resources to accessible interfaces, like self-service BI, end-user computing, visualisation, and analytics tools.
  • Don’t model anything that is outside the scope of a reasonable iteration. And I mean anything. Focus on the specific needs of the business. NB A reasonable iteration is one that can be delivered in less than three months. It aligns with what the business has asked for. It isn’t overburdened with superfluous and unjustifiable data detritus and bureaucratic blarney.
  • Don’t overload an iteration of a data warehouse by including anything that isn’t needed.
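The “deprecate unused parts of the model” step can start with something as simple as counting table references in the warehouse query log. A minimal sketch, assuming a plain-text log and illustrative table names (the threshold, like everything else here, is a business call):

```python
# Minimal sketch: estimate model utility by counting how often each
# known table appears in logged queries, then list deprecation
# candidates. Log format, table names and threshold are illustrative.
import re
from collections import Counter

def table_usage(query_log, known_tables):
    """Count, per table, the number of logged queries referencing it."""
    counts = Counter({t: 0 for t in known_tables})
    for query in query_log:
        for table in known_tables:
            if re.search(rf"\b{re.escape(table)}\b", query, re.IGNORECASE):
                counts[table] += 1
    return counts

def deprecation_candidates(usage, threshold=1):
    """Tables queried fewer than `threshold` times are candidates."""
    return sorted(t for t, n in usage.items() if n < threshold)

log = [
    "SELECT * FROM fact_sales JOIN dim_customer USING (customer_id)",
    "SELECT region, SUM(amount) FROM fact_sales GROUP BY region",
]
tables = ["fact_sales", "dim_customer", "dim_exhaustive_detail"]
usage = table_usage(log, tables)
print(deprecation_candidates(usage))  # → ['dim_exhaustive_detail']
```

Query counts are a proxy, not the verdict: as the bullet above says, the stronger signal is what business stakeholders actually say they need and want.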

3. Excessive Governance Leading to Unused Data

Problem: It is said that perfect governance creates an “expensive museum” where data is locked away, ignored by users. But this is more IT bullshit. There is no perfect governance, anywhere. However…

What to Do:

  • Avoid: Balance governance with agility; adopt “just enough” policies that prioritise speed to value over perfection. Set governance thresholds based on data sensitivity (e.g., lighter rules for internal analytics).
  • Fix: Shift to progressive governance: Release datasets in stages (e.g., beta versions with basic checks), gathering user feedback to refine rules. Involve business owners in governance committees to ensure policies align with real needs.
  • Mitigate: Track data utilisation rates (e.g., via warehouse logs) and tie governance efforts to ROI. If a dataset remains unused after governance, archive it and focus on high-impact areas.
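The “archive the dormant” rule above can be expressed as a simple policy check over last-query timestamps from the warehouse logs. A sketch under stated assumptions: the dataset names and the 90-day idle window are illustrative, and the window itself should be agreed with the business owners of each dataset.

```python
# Minimal sketch: list datasets with no recorded query inside the idle
# window, as candidates for archiving. Names and window are illustrative.
from datetime import date, timedelta

def dormant_datasets(last_queried, as_of, max_idle_days=90):
    """Return datasets whose last recorded query (or None, meaning
    never queried) falls outside the idle window."""
    cutoff = as_of - timedelta(days=max_idle_days)
    return sorted(
        name for name, last in last_queried.items()
        if last is None or last < cutoff
    )

last_queried = {
    "sales_daily": date(2026, 2, 1),
    "legacy_extract": date(2025, 6, 30),
    "never_used": None,
}
print(dormant_datasets(last_queried, as_of=date(2026, 2, 7)))
# → ['legacy_extract', 'never_used']
```

Archiving, not deleting, keeps the decision reversible while the governance effort is redirected to datasets people actually use.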

4. Declaring a “Single Source of Truth” Without Negotiating Meanings

Problem: Proclaiming a single source without resolving underlying disagreements leads to prolonged debates, despite fast-loading dashboards.

What to Do:

  • Avoid: Skip unilateral declarations. Instead, frame the warehouse as a “negotiated source of truths and suppositions” from the start.
  • Fix: Facilitate cross-departmental negotiations: Host “data alignment” meetings to agree on metrics and sources, documenting outcomes in a central repository. Use data lineage tools to trace and visualise how data flows and transforms.
  • Mitigate: Embed dispute resolution processes, like escalation paths for inconsistencies. Reward teams for reducing meeting time spent on data debates, perhaps through KPIs linked to decision speed.

When I hear IT folks talk about the need for a single source of truth, I instinctively know they really know jack about what they pretend to know. And they will invariably wreak more damage than cautious rank amateurs.

5. Focusing on Data Volume Over Actionable Insights

Problem: Ingesting massive volumes (e.g., 40TB/day) is prioritised over reliably answering a few critical questions, leading to underutilised storage.

What to Do:

  • Avoid: Don’t chase scale for its own sake. Define success by the number of key business questions answered accurately, not by ingestion metrics.
  • Fix: Prioritise data pipelines for high-value sources. Conduct a rigorous “question inventory” with business leaders to identify the top 10-20 queries. Then, optimise ingestion and partitioning around them. Use cost-optimised storage (e.g., cold vs. hot tiers) to avoid bloating.
  • Mitigate: Implement data pruning routines to archive or delete low-value data. Shift status reporting to outcome-based metrics, like “questions resolved per TB stored.”
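The outcome-based metric suggested above, “questions resolved per TB stored”, is trivial to compute, which is rather the point: it redirects attention from ingestion bragging rights to answered business questions. The figures below are illustrative assumptions, not benchmarks.

```python
# Minimal sketch: an outcome-based metric, "questions resolved per TB
# stored", instead of raw ingestion volume. Figures are illustrative.

def questions_per_tb(questions_resolved, bytes_stored):
    """Business questions reliably answered per terabyte stored."""
    tb = bytes_stored / 1e12
    if tb == 0:
        raise ValueError("no data stored")
    return questions_resolved / tb

# A lean warehouse answering 18 of the top-20 questions on 5 TB scores
# far better than a 400 TB lake that answers only 12 of them.
print(round(questions_per_tb(18, 5e12), 2))    # 3.6
print(round(questions_per_tb(12, 400e12), 2))  # 0.03
```

Like any single ratio, it can be gamed; it works only alongside the question inventory agreed with business leaders, not as a replacement for it.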

6. Treating Data Warehousing as an IT System Instead of a Business Sense-Making Tool and Business Process

Problem: This is an overall category error, emphasising tech purchases and purity over negotiated understanding, rewarding “cathedrals” over practical “chapels.”

What to Do:

  • Avoid: Reframe job descriptions and incentives: Hire or train data professionals with business acumen (e.g., via certifications in business analysis). Align projects to business KPIs from inception.
  • Fix: Foster hybrid roles or teams: Pair data engineers with business analysts to co-design solutions. Use agile methodologies to deliver incremental value, with a focus on commercial impact (e.g., revenue uplift from better decisions).
  • Mitigate: Change reward structures: Tie bonuses or promotions to business outcomes (e.g., user satisfaction scores or reduced data mistrust incidents) rather than architectural milestones. Regularly survey business users on the warehouse’s usefulness.
  • Don’t (never, ever, ever) let IT decide how a data warehouse iteration is to be run. If the business doesn’t have the skills, they can always hire a trusted and experienced external adviser. This adviser would be knowledgeable, like Martyn Rhisiart Jones.

General Strategies to Shift the Culture

  • Build Collaboration: Embed data professionals in business units for short to not-so-short rotations to understand real-world business needs. Don’t embed personalities that will frighten the horses, create destructive abrasion and alienate the business.
  • Measure What Matters: Use dashboards tracking business adoption, trust levels (e.g., via NPS surveys), and ROI, not just tech metrics. NB Measure what makes sense to measure.
  • Educate and Communicate: Share stories of successful “chapel” projects. These are small, focused wins. They counter the allure of grand cathedral architectures, which are all Gothic, frivolous and full of sins.
  • Vendor Selection: As hinted previously, prioritise reliable tools based on proven track records for stability. Ensure they align with your business needs. Do not focus solely on features. Research alternatives thoroughly and constantly before committing. Tools like Databricks, cat, cut, awk, and grep don’t work for me. However, they might work for you.
  • Expertise, Knowledge and Data Warehouse Smarts: Hire a trusted adviser.

Bottom line? By implementing these steps, you might move toward a super-effective data warehousing practice. This practice delivers timely, adequate, and appropriate business value. It also reduces the persistent question of “which number do I trust?”

PS When I am looking for reliable technology Databricks is not on my list.

The data warehouse, that once-grand promise of unified corporate intelligence, has too often become an expensive monument to misplaced priorities. Many organisations have rushed to embrace cloud platforms. They have implemented intricate governance frameworks. In doing so, they have established voluminous ingestion pipelines. Unfortunately, they have overlooked a fundamental truth. The warehouse is not merely an IT artefact. It is a mechanism for business sense-making.

A recurring critique highlights that trust in data erodes due to semantic drift: mismatched definitions of “active user” or “revenue” across departments. No migration to Snowflake or similar can resolve that. Technology vendors’ bold claims frequently prove illusory; one observer notes that much of the “big data” era was, in retrospect, largely hype. The antidote lies in semantic audits and shared glossaries, forged in cross-functional workshops rather than imposed top-down. Data catalogues enforce clarity, and automated reconciliations flag discrepancies tied to real business metrics. Perfection in data quality remains elusive and unnecessary; “good enough” for decision-making suffices, provided it is a business judgement.

Over-engineered models, normalised to theoretical perfection yet seldom queried, resemble legal codes in a forgotten tongue. The remedy is ruthless pragmatism. Begin with minimal viable datasets addressing high-priority questions. Iterate swiftly, ideally within three months. Deprecate the unused. Business stakeholders must drive scope. They must translate models into accessible, denormalised views using tools such as dbt or Looker. This should be supported by hands-on training.

Excessive governance risks creating locked vaults of underused data, though true “perfect” governance is mythical. A balanced, progressive approach is more effective. It involves releasing beta datasets with light controls. Refining them via feedback and involving business owners enhances agility. Track utilisation and ROI rigorously; archive the dormant.

Proclaiming a “single source of truth” without prior negotiation invites endless debate. It is far wiser to position the warehouse as a negotiated repository of truths and assumptions. This can be facilitated through alignment sessions and lineage visualisation. Embedded dispute mechanisms can then reward faster, less contentious decisions. Prioritising sheer volume, terabytes ingested daily, over actionable answers bloats costs without value.

Success should be measured by critical questions reliably answered, not scale alone. Focus pipelines on high-value sources via question inventories, prune relentlessly and shift metrics to outcomes per unit stored. At root lies a category error: treating warehousing as an IT project rather than a collaborative business process. Incentives must align with commercial impact – revenue uplift, reduced mistrust – not architectural purity. Hybrid teams, agile delivery and rotations embed data professionals in business units. This fosters mutual understanding and rewards practical “chapels” over grandiose “cathedrals”.

The path forward demands a cultural shift. Measure adoption and trust through surveys and NPS**-style feedback. Communicate focused wins. Select tools for proven utility rather than features. Where internal expertise falters, engage trusted external advisers. Done well, the warehouse ceases to prompt the weary question “which number do I trust?” and instead becomes the reliable apparatus through which businesses make coherent sense of their world.


**Net Promoter Score

