Martyn Richard Jones
San Martiño de Bandoxa
15th April 2020
Move over, big data hubris and data lake stupidity: there’s a newer, thicker and far bigger arsehole on the block. And it goes by the unbelievably idiotic name of data lakehouse. It is being hailed as a new paradigm but is, in reality, a naive, dishonest and disruptive fraud. So what’s occurring?
The gutter-snipes, hustlers and useless pundits who failed to make big data and data lakes the success of the 2010s have set their vulture-eyed sights on data warehousing. It’s not smart, it is not funny, and it does no one any service.
So, what in the name of Sam Hill makes these snake-oil merchants engage in this crass, irresponsible and reckless nonsense? And why do they insist on targeting data warehousing?
Why? Because it is their unique differentiator. And snake-oil merchants are, well, snake-oil merchants. And because these pundits have placed so much faith in big data technology that they simply can’t let go. Big data technology is their comfort blanket. So, they are using a new angle to try and flog technology very few people need. To solve the challenges that they don’t have. And using data that has no intrinsic value. Clearly, they understand the technologies even less than they understand the issues and opportunities that need to be addressed. And they certainly don’t understand the data.
Anyway, these jokers will be sure to fail again as they have failed so miserably in the past. Because physics is physics and facts are facts and sows’ ears will never be silk purses. No matter what the bullshit magic-quadrants, the virtuous-circles or the hype-cycles say.
“Yes,” they say. “A data warehouse is all very well for structured data, but what about all of that unstructured data and semi-structured data that companies have?” Dudes! We have Textual ETL for that! And guess what, in decision making and relevant data terms? Your most valuable data is still in your operational systems. If that isn’t highly structured, then heaven knows what your operational systems look like. But, seriously, the big guys analyzing unstructured data as part of their business model are very few and far between. And most others don’t need it. And you can take that to the bank.
Then again, some folk fret about data, technology and brands. As if data warehousing can’t hack data related to brands? As if you’d need Spark, data streaming from social media and the Hadoop ecosphere to make that magic sauce work? But here’s the rub. All this talk about the importance of brands, online interactive advertising and understanding sentiment is bullshit. Or, as Bob Hoffman put it, “You’re passionate about BRANDS? Dude, get a f***ing girlfriend.” So, data lakehousers, you’re poor losers on that point too.
Seriously, folk. I wish I could have a good word to say about data lakehouses and their proponents, but I don’t.
The lakehousers’ genuinely imprudent ideas about data architecture, engineering and management have been resurrected from the remnants of the yellow elephant and its dodgy ecosphere. And again the big data twits are targeting data warehousing to take it down and replace it with their poor, absurd and ultimately unimplementable vision of a data dystopia. But still, one more time, in their ignorance, arrogance and lack of depth, they are wrong.
You see, there’s a big difference between data warehousing and data lakehousing.
Data warehouses are what business demands, analysts formulate, architects design, engineers build and project teams deliver. They are made in the real world, as a response to practical requirements and they provide tangible business benefits. It’s a coherent, rational and well-engineered approach to providing data to support decision making.
On the other hand, data lakehouses (like the big data and data lake tripe that preceded them) are what management consultants design, build and deliver. Using PowerPoint slide decks, inflated invoices and incoherent explanations.
The concept of a data lakehouse is a vague, sloppy and incoherent construction in the minds of flimflam artists. In essence, I mean, just look at the proponents. It’s smoke, mirrors and voodoo data management concocted by the mindless purveyors of vapour-ware. It’s a whole series of pipes, promises and black boxes that all hide the “magic” and “enchantment” of the solutions. But, in reality, they are not built in the real world by anyone who knows the real world. And hopefully, they will die in the business netherworld of disgrace, ignorance and misery. Together with the aspirations of the ignoramuses that pimped them. That is, before some moronic jackasses in the business IT world try to adopt them.
That’s the difference. Put it this way, data lakehouse users are from Uranus and Mars, and contemporary data warehouse users are from New Jersey and Chicago. Real people versus the fantasies of viral space cadets.
I guess what I most detest about the data lakehouse folk is that they appear to be utterly ignorant of the subject matters at hand. And even more so? Supinely and unashamedly mendacious, insincere and duplicitous in how they go about schlepping their wares.
There, I said it!
I hope this is only about “a little knowledge is a dangerous thing.” The alternative would be a damning indictment of an essential part of the IT industry.
But, I digress.
Today we have the architectures, methods, technologies and products to make 4th generation data warehousing work and work very well. We have sound solution templates, roadmaps and blueprints for data integration and full coverage of reporting and decision support. We can even support contemporary statistics and statisticians, as well as Rubbish Shop, Poundland and Wetherspoon’s data scientists and whatever they get up to. If they have the strength, wit and knowledge to get up to anything.
So, to cut to the chase. Here’s a message to the data lakehouse and big data jesters. Your garbage didn’t work, isn’t working, and won’t work. Your meter is running on empty, you are out of dimes and your parking ticket is being prepared. You are not convincing anyone worth convincing. You are not making a constructive, coherent or valuable contribution. Your big data bullshit keeps on coming, but there’s no one at home to take delivery.
Why?
Because data warehousing is evolving, and not because of big data, Hadoop or anything that came out of that half-baked ecosphere. It’s changing because of seriously significant advances in the technologies and products available to support real enterprise-class data warehousing. Together with marked improvements in the licensing fees, up-front costs and costs of ownership. And these developments are removing the impact of significant constraints, barriers and dependencies in the real world of data warehousing.
So, go away, data lakehouse fools, and let the professional, knowledgeable and experienced adults in the room deal with the actual data integration, architecture and management issues. The real business challenges, real opportunities and the things that matter. And not the self-absorbed, pretentious and unservable dreck that you guys use to muddy the waters.
So, I’ll leave you with this. To paraphrase Mel Brooks, “If I were the data king, I would declare that from now on, and all through the land, data lakehouses be known as bullshit lakehouses.”
Thank you for reading.
About the Author
Martyn Jones is among the world’s foremost authorities on data integration, modelling, architecture, management and privacy. In the early eighties, he defined and built some of the first Information Centres in Europe at Sperry Corporation. They were classic Inmon data warehouse architectures and met with a lot of success.
Martyn’s 2020 book, Laughing@BigData, offers a refreshing insight into contemporary IT and data.
Martyn blogs at goodstrat.com and can be contacted at martyn.jones@goodstrat.com
Martyn,
Interesting opinion and observations. Like you, I dislike “flavor-du-jour” technologies that don’t offer anything new and effectively just recreate an already-existing wheel without much tangible benefit. Too often, these “solutions” are delivered by charlatans who promise the world and deliver far less. Having said that, the world of technology continues to evolve, and things like mobile devices, social websites and commercial content delivery networks ensure that we are all contributing to the growing data volumes that surround us.
In my view, the rise of data lakes has come about as these data volumes have exploded and much of the data we now care to analyze and work with is near-real-time (NRT) and often in unstructured or semi-structured forms. To be sure, there are RDBMS solutions for these challenges, to some degree, but with some of these solutions users receive less than they require, or the costs associated with the delivery are prohibitive.
Can you articulate how modern data problems should be solved? For data scientists interested in evaluating large volumes of IoT exhaust or some other data source producing many terabytes of data annually, what do you see as a proper solution?
Finally, I enjoyed this brief essay posted on Databricks’ blog about a month ago (very timely) discussing some of the details of the lakehouse concept (in their view). You may find some value in their perspectives as well.
https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html
Thanks for your time.
: )
Loren
I’m kinda guessing Martyn is aware of Databricks; they are the first Google hit for Data Lakehouse and are clearly trying hard to push this bullshit new term.
Great article Martyn, 10 years of snake-oil merchants booking meetings with those above me trying to pitch ‘Big Data’ and yellow elephants caused me endless stress (HP Haven anyone?)… Can’t believe there is a risk of another cycle of that bullshit, Amazon (We’re customer first blah blah) being the biggest of them all with all their AWS Lake Formation shite. (Codename for build this big complex thing that provides fuck-all value)
Hi Loren,
Many thanks for your comment.
To answer your question “Can you articulate how modern data problems should be solved?” I would ask what is the problem, what are you trying to achieve and what business want is driving it? What is the answer to “to what ends?”
As for the Databricks piece? I read it a couple of weeks ago, and I do have some issues with some of the things that they state, including the conflation of data warehousing and technology, their misrepresentation of 1990s (they put 1980s) enterprise data warehousing and their simplistic, vague and ambiguous diagram of 2020s Lakehouse (data suppliers –> a black box –> data consumers).
I met Paco Nathan back in the day at a Big Data Conference in Madrid when he was working for Databricks. And although he has now moved on, I was reluctant to name Databricks in my piece.
If you have any further questions, don’t hesitate to get in touch. My email is martyn.jones@goodstrat.com
Best regards,
Martyn
Martyn,
I appreciate your concern and invective regarding technology of the day trends and unrealistic claims of capabilities and problems solved.
From a practical perspective, I’ve noticed that Data Warehouses do quite well with enterprise data – information that is mature and known to be of value – especially data from core systems that you cite as being used in important business decisions.
What is your view on speculative data? Oftentimes data science teams need to acquire large volumes of data (with many hundreds or thousands of fields) to evaluate whether it has any value at all. Time frames are often tight and the team cannot wait months. But there is the tempting question as to whether other groups in the company would find value in this information. How do you see modern data warehousing addressing this question?
Thank you for your thoughts.
Sincerely, Neal
Hello Neal,
Thanks for your reply.
I will be addressing this and other questions in my next book on data that I plan to publish in autumn/the fall.
Best regards,
Martyn
Thoroughly enjoyed reading your rant. To answer your rhetorical question, I think for companies of a certain size, the raccoon data management strategy (pretty! shiny!) works. Since things never evolve above the initial chaos, anything – quite literally, anything – is seen as potential improvement.
Data Lakehouses absolutely fall into that pretty/shiny view – and while those of us who have been doing this for a few years understand that this is really a Data Cesspoolhouse with some fancy shutters and colorful paint, we know that it is whitewalls on a garbage truck.
In my mind, it comes down to discipline – if you don’t have any, you’re never going to be able to buy your way to success (though Deloitte doesn’t seem to be going out of business any time soon). Discipline starts at home, and a little can go a long way – you just have to find the right kind of sponsor and work your way out.
This could well serve as a discussion around some of that needed discipline – thank you for your work!
CH
Bill Inmon seems to not think it’s bullshit:
https://databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html