Martyn Richard Jones

San Martiño de Bandoxa

I wrote a polemical piece on the concept of the Data Lakehouse to promote my new Amazon ebook, Laughing@BigData. Although it was well received, I have since reflected, taking the time to revisit my position and to flesh out and contrast my positive and negative views.

Call it a COVID-19 bonus of sorts.

Yesterday evening I was driving down to the nearest garage to get a six-pack of the local ale (Estrella 1906) when it suddenly dawned on me. An epiphany! Hallelujah! It was as if I had been given a great and insightful gift by the absolutely-fabulous witches of ancient Celtic Galicia.

And what was that gift? Without further ado I present you with: The Three Magic Squares of the Data Lakehouse.

Figure: The Three Magic Squares of the Data Lakehouse

So what story is this diagram telling us? What’s the narrative? What’s the low-down?

There are three major and necessary features of the Data Lakehouse get-and-put architecture. Let me explain:

  1. Data “out there,” and “in here.” – This magic square is the data we either have in the business, on-prem or in the cloud (public, private or hybrid), or data that we can get from a third-party data supplier, such as a telco, bank, consumer association or retailer. We can pick-n-mix this data later on.

Takeaway: here is where we find the data.

  2. Data we have and know that we have. – This magic square is the data that we have direct access to, usually all in-house or on an external cloud. It is also the data that we know we have.

Takeaway: here is where we have the data.

  3. What we do with data to make more data. – This magic square is us using the data that we have and know about to generate more data that we can use, store or discard.

Takeaway: here is what we do with the data to make more data.

What you’ll notice about this fascinating and innovatively new architecture is that data flows in both directions. Full duplex and either synchronously or asynchronously. That is, away from the source of the data. And back to the source of the data. It’s quite amazing in its simplicity, elegance and deeper meaning.
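
For the programmers in the audience, here is a minimal sketch of the get-and-put flow described above. It assumes that plain in-memory dictionaries stand in for each of the three magic squares. The class and method names (Lakehouse, get, derive, put) are hypothetical, invented purely for illustration, and do not come from any real product or library.

```python
from dataclasses import dataclass, field


@dataclass
class Lakehouse:
    """The three magic squares, under illustrative (hypothetical) names."""

    out_there: dict = field(default_factory=dict)  # square 1: where we find the data
    in_here: dict = field(default_factory=dict)    # square 2: data we have and know we have
    derived: dict = field(default_factory=dict)    # square 3: data we made from data

    def get(self, key):
        """Flow away from the source: copy a dataset from square 1 into square 2."""
        self.in_here[key] = list(self.out_there[key])

    def derive(self, key, new_key, fn):
        """Use data we have (square 2) to make more data (square 3)."""
        self.derived[new_key] = fn(self.in_here[key])

    def put(self, new_key):
        """Flow back to the source: publish a derived dataset into square 1."""
        self.out_there[new_key] = list(self.derived[new_key])


# Usage: get, make more data, put it back.
lakehouse = Lakehouse(out_there={"sales": [3, 1, 2]})
lakehouse.get("sales")
lakehouse.derive("sales", "sales_sorted", sorted)
lakehouse.put("sales_sorted")
print(lakehouse.out_there["sales_sorted"])
```

That the whole thing fits in a couple of dozen lines is rather the point.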

And, to be honest, this is probably only the second time I have seen such an amazing paradigm.

The first time I came across such a model was when I was working for Sperry Corporation and studying the entire fecking history of computing. The unique difference of course is that these days we are encouraged to use some half-baked and half-chewed assortment of dime store (that’s like Todo-a-cien or Poundland) techno-choccies. The sort of thing schlepped by folk with a passing knowledge of data to folk with a passing knowledge of business.

But never mind. The more things change the more they stay the same. And we still treat lessons-learned like anathema.

Make of that what you will.

Thank you for reading.

About the Author

Martyn Jones is among the world’s foremost authorities on data (including data integration, modelling, architecture, management and privacy). In the early eighties, he defined and built some of the first Information Centres in Europe at Sperry Corporation. They were classic Inmon data warehouse architectures and met with a lot of success.

Martyn’s 2020 book, Laughing@BigData, offers a refreshing insight into contemporary IT and data.

Martyn blogs at and can be contacted at

His new ebook is titled Laughing@BigData (Kindle Edition) and you can take a look inside for free – it is now available at the following Amazon country sites:

United Kingdom: