Spain, 2nd November 2025

And So It Begins

Five years ago, I wrote a piece called Bullshit at the Data Lakehouse, which caused quite a stir on social media. I’m taking this opportunity to revisit and revise it, and to offer it as a historical reference with ramifications for today and the future.

My piece began in this spirit and then progressed rapidly.

Move over, big data hubris, data mesh revisionism, and data lake stupidity: there’s a newer, thicker and far bigger arsehole on the block. And it goes by the unbelievably idiotic name of data lakehouse. It is being hailed as a new paradigm, but in reality, it is a naive, dishonest, and disruptive fraud. So what’s occurring?

The guttersnipes, hustlers and useless pundits who failed to make big data and data lakes the success of the 2010s have set their vulture-eyed sights on data warehousing. It’s not smart, it’s not funny, and it does no one any service.

Paging Mister Sam Hill!

So what in the name of Sam Hill drives these snake-oil peddlers to peddle such crass, irresponsible, and downright reckless nonsense? And why, of all sacred cows, do they keep swinging the wrecking ball at data warehousing?

The Grift Equation

Warehouses are the cash cow. Snowflake’s $2.5 B ARR didn’t materialise from altruism. Any “new paradigm” needs a fat, visible target to siphon budget from.

Incumbents are slow. Redshift, BigQuery, and Synapse move like glaciers; evangelists can outrun them on hype alone.

The word “warehouse” sounds 1990s. Lakehouse? Chef’s kiss, vibes of modernity, even if the tech is a 2015 Spark patch.

The Playbook

FUD and the data warehouse: “Rigid! Expensive! Can’t handle unstructured data!” (Ignoring that 90% of queries are still structured.)

Promise the moon: “One ring to rule them all!” (Ignore the 47-page Databricks whitepaper on why you still need a warehouse for BI.)

Close the deal: “Sign here and we’ll throw in a free POC that mysteriously requires our 800€/hr architects.”

The Real Target

They’re not attacking warehousing because it’s broken—they’re attacking it because it’s working. A stable, governed, performant data warehouse is the ultimate buzzkill for anyone trying to sell architectural churn. The lakehouse isn’t a replacement; it’s a Trojan horse for vendor lock-in, dressed in open-format cosplay. So next time a deck claims “the warehouse is dead,” follow the money. The only thing dying is your runway—while their ARR hits escape velocity.

Let Me Say That Again

Why? Because it is their unique differentiator. And snake-oil merchants are, well, snake-oil merchants. And because these pundits have placed so much faith in big data technology that they simply can’t let go. Big data technology is their comfort blanket. So, they are using a new angle to try to flog technology that very few people need. To solve the challenges that they don’t have. And using data that has no intrinsic value. Clearly, they understand the technologies even less than they know the issues and opportunities that need to be addressed. And they certainly don’t understand the data.

In any case, these individuals are likely to fail again, as they have failed miserably in the past. Because physics is physics, facts are facts, and sow’s ears will never be silk purses. No matter what the bullshit magic-quadrants, the virtuous-circles or the hype-cycles say.

Say what?

The Origin Story Nobody Asked For

Picture the scene: a parade of venture-backed evangelists, slide-deck messiahs, and LinkedIn influencers descend on yet another conference ballroom. They promise the impossible—a single system that delivers the scale and cost of a data lake plus the ACID guarantees, schema enforcement, and SQL friendliness of a warehouse. No trade-offs, no compromises, just pure architectural enlightenment. The crowd nods along, hypnotised by the portmanteau. Lake + house = innovation, apparently.

The Technical Sleight of Hand

Under the hood, a “lakehouse” is typically just a data lake with a table format (such as Delta Lake, Iceberg, or Hudi) applied on top, along with a query engine (like Spark, Trino, or Dremio) that pretends it’s a warehouse. You get:

  • Open file formats (Parquet, ORC) for storage
  • Metadata layers for transactions
  • Time-travel queries, because why not
  • Governance theatre via Unity Catalog or similar

Sounds reasonable, until you realise you’re still managing two runtimes, duplicate tooling, and fragile optimism that file-level ACID will survive production workloads. Congratulations: you’ve built a Frankenstein that lurches between lake sloppiness and warehouse rigour, satisfying neither.
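To make the “metadata layer for transactions” and “time travel” ideas concrete, here is a minimal sketch, assuming a toy append-only commit log that tracks which immutable data files are live. It is an illustration only, not how Delta Lake, Iceberg, or Hudi are actually implemented, and every name in it (ToyTableLog, commit, snapshot, the file names) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTableLog:
    """Hypothetical append-only commit log over immutable data files."""

    # Each commit records which files were added and which were removed.
    commits: list = field(default_factory=list)

    def commit(self, add=(), remove=()):
        """Append one commit and return its version number."""
        self.commits.append({"add": set(add), "remove": set(remove)})
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` to get the set of live files."""
        if version is None:
            version = len(self.commits) - 1
        if not 0 <= version < len(self.commits):
            raise ValueError(f"unknown version {version}")
        live = set()
        for entry in self.commits[: version + 1]:
            live -= entry["remove"]
            live |= entry["add"]
        return live


log = ToyTableLog()
v0 = log.commit(add=["part-000.parquet", "part-001.parquet"])
# A "compaction" rewrites two small files into one larger file.
v1 = log.commit(
    add=["part-002.parquet"],
    remove=["part-000.parquet", "part-001.parquet"],
)

current_files = log.snapshot()        # latest table state
time_travel_files = log.snapshot(v0)  # "time travel" back to the first commit
```

Readers never list the storage directly; they ask the log which files are live at a given version. The hard parts that the paragraph above complains about (concurrent writers, conflict detection, cleaning up removed files) are deliberately left out of this sketch.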

The Economic Grift

The pitch is seductive: “One platform to rule them all.” Translation: lock yourself into a cloud vendor’s managed service (Databricks, Snowflake-on-lake, etc.) and watch your egress bills balloon. The lakehouse doesn’t eliminate complexity; it hides it behind abstraction while quietly multiplying your SKU count. Your data engineers now speak fluent “compaction scheduling” and “Z-ordering mysticism” instead of, you know, shipping features.

The Cultural Damage

Worst of all, the lakehouse narrative erases decades of hard-won lessons. We already solved the separation of storage and compute (S3 + Athena). We already learned that schema-on-read is a trap for anything beyond throwaway analytics. We already watched data swamps devour budgets. But, sure, let’s rebrand the same mistakes with a cuter name and a $400 million Series F funding round.

The Reality Check

If you need a data warehouse, purchase one. If you need a data lake, build one. If you need both, orchestrate them deliberately with clear contracts—don’t cosplay unity with a buzzword. The lakehouse isn’t a paradigm shift; it’s a marketing tourniquet applied to bleeding architecture.
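As one reading of “clear contracts”, here is a minimal sketch, assuming a contract is nothing more than an agreed set of column names and types that a feed from the lake must satisfy before the warehouse loads it. The contract, the column names, and the contract_violations helper are all hypothetical and only illustrate the idea.

```python
from datetime import date

# Hypothetical contract for a feed handed from the lake to the warehouse.
ORDERS_CONTRACT = {
    "order_id": int,
    "customer_id": int,
    "order_date": date,
    "amount_eur": float,
}


def contract_violations(record, contract):
    """Return a list of human-readable violations for one record."""
    problems = []
    for column, expected in contract.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected):
            found = type(record[column]).__name__
            problems.append(f"{column}: expected {expected.__name__}, got {found}")
    for column in record:
        if column not in contract:
            problems.append(f"unexpected column: {column}")
    return problems


good = {
    "order_id": 1,
    "customer_id": 42,
    "order_date": date(2025, 11, 2),
    "amount_eur": 19.99,
}
bad = {"order_id": "1", "customer_id": 42, "amount_eur": 19.99, "notes": "free text"}

good_problems = contract_violations(good, ORDERS_CONTRACT)
bad_problems = contract_violations(bad, ORDERS_CONTRACT)
```

The point of the design is that the check is explicit and sits at the boundary: a record that breaks the agreement is rejected with a reason before it reaches the warehouse, rather than being discovered later in a report.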

So the next time a solutions architect whispers “lakehouse” in your ear, ask them to open the hood. You’ll find the same rusty engine, now with extra sparkles and a higher price tag. The only thing being housed here is your common sense—evicted without notice.

Onwards and Upwards

“Yes,” they say. “A data warehouse is all very well for structured data, but what about all of that unstructured and semi-structured data that companies have?” Dudes! We have Textual ETL for that! And guess what? In decision-making and relevant data terms, your most valuable data is still in your operational systems. If that isn’t highly structured, then heaven knows what your operational systems look like. Seriously, though, the big companies analysing unstructured data as part of their business model are very few and far between. And most others don’t need it. And you can take that to the bank.

Then again, some folk fret about data, technology and brands. As if data warehousing can’t hack data related to brands? And you’d need Spark, data streaming from social media and the Hadoop ecosphere to make that magic sauce work. But here’s the rub. All this talk about the importance of brands, online interactive advertising and understanding sentiment is Bullshit. Or as Bob Hoffman put it, “You’re passionate about BRANDS? Dude, get a f***ing girlfriend.” So, data lakehousers, you’re poor losers on that point too.

Seriously, folk. I wish I could say something positive about data lakehouses and their proponents. Still, I won’t, as I’m unable to do so.

The lakehousers’ genuinely imprudent ideas about data architecture, engineering, and management have been resurrected from the remnants of Yellow Elephant and its dodgy ecosphere. And again, the big data twits are targeting data warehousing to take it down and replace it with their poor, absurd and ultimately unimplementable vision of a data dystopia. But still, one more time, in their ignorance, arrogance and lack of depth, they are wrong.

You see, there’s a big difference between data warehousing and data lakehousing.

Data warehouses are what businesses demand, analysts formulate, architects design, engineers build, and project teams deliver. They are created in the real world in response to practical requirements and provide tangible business benefits. Data warehousing is a coherent, rational, and well-engineered approach to providing data to support decision-making.

On the other hand, data lakehouses (like the big data and data lake tripe that preceded them) are what management consultants design, build and deliver. Using PowerPoint slide decks, inflated invoices and incoherent explanations.

The concept of a data lakehouse is a vague, sloppy and incoherent construction in the minds of flimflam artists. I mean, just look at the proponents. It’s smoke, mirrors and voodoo data management concocted by the mindless purveyors of vapour-ware. It’s a whole series of pipes, promises and black boxes that all hide the “magic” and “enchantment” of the solutions. In reality, though, lakehouses are not built in the real world by anyone who truly understands the real world. And hopefully, they will die in the business netherworld of disgrace, ignorance and misery, together with the aspirations of the ignoramuses that pimped them. That is, before some moronic jackasses in the business IT world try to adopt them.

That’s the difference. Put it this way: the data-lakehouse crowd hail from the outer rings of Uranus and the red dust of Mars. At the same time, the modern data warehouse faithful are rooted in the commuter belts of New Jersey, the City of London, and the boulevards of Madrid. Real people versus the fever dreams of viral space cadets.

I suppose what I most detest about the data lakehouse proponents is that they appear to be utterly ignorant of the subject matter at hand. And even more so? Supinely and unashamedly mendacious, insincere and two-faced in how they go about schlepping their wares. 

There, I said it!

I hope this is only about “a little knowledge is a dangerous thing.” The alternative would be a damning indictment of a crucial aspect of the IT industry.  

But, I digress.

Today, we have the architectures, methods, technologies, and products to make 4th-generation data warehousing work effectively. We offer comprehensive solutions, including templates, roadmaps, and blueprints, for data integration and complete coverage of reporting and decision support. We can even support contemporary statistics and statisticians, as well as data scientists from Rubbish Shop, Poundland, and Wetherspoons, and whatever they get up to. If they have the strength, wit and knowledge to get up to anything.

So, to cut to the chase. Here’s a message to the data lakehouse and big data jesters. Your garbage didn’t work, isn’t working, and won’t work. Your meter is running out of money; you are out of dimes, and your parking ticket is being prepared. You are not convincing anyone worth convincing. You are not making a constructive, coherent or valuable contribution. Your big data bullshit keeps on coming, but there’s no one at home to take delivery.

Why?

Because data warehousing is evolving, not because of big data, Hadoop or anything that came out of that half-baked ecosphere. It’s changing due to significant advances in technologies and products that support real enterprise-class data warehousing, together with marked improvements in licensing fees, upfront costs, and costs of ownership. These developments are mitigating the impact of significant constraints, barriers, and dependencies in the real-world data warehousing environment.

So, go away, data lakehouse enthusiasts, and let the professionals, the knowledgeable and experienced adults in the room, deal with the actual data integration, architecture, and management issues. The real business challenges, real opportunities and the things that matter. And not the self-absorbed, pretentious and unservable dreck that you guys use to muddy the waters.

So, I’ll leave you with this. To paraphrase Mel Brooks, “If I were the data king, I would declare that from now on, and all through the land, data lakehouses be known as bullshit lakehouses.”

Thank you for reading.

About the Author

Martyn Jones is among the world’s foremost authorities on data integration, modelling, architecture, management and data protection and governance. In the early eighties, he defined and built some of the first Information Centres in Europe at Sperry Corporation. They were classic Inmon data warehouse architectures and achieved considerable success.

Martyn’s 2020 book, Laughing at Big Data, offers a refreshing insight into contemporary IT and data.

Martyn blogs at goodstrat.com and can be contacted at martyn.jones@goodstrat.com

