To begin at the beginning

I am quite a fan of many aspects that sit under the Data Mesh umbrella. However, when it comes to a proper fact-based understanding and analysis of the history, place and architecture (business, data and technical) of Data Warehousing, the leading exponents of data mesh have it woefully wrong.

Therefore, the purpose of this blog article is to set the record straight.

The data warehouse as a place to copy OLTP exhaust data to?

The way many data mesh, data lake, and data fabric talking heads view modern data warehousing, as just a place to dump and report on data, is quite problematic in its naivety, lack of rigor, and frivolity.

To be brief, this assertion about data as exhaust is a misreading of the past and demonstrates an ignorance of data, its management and its architecture.

So, to be clear, data didn’t get invented around the time of the birth of Windows 8, Hadoop, TikTok or the iPad. Data has been around for a very long time, since before computer systems were even a twinkle in the eyes of the pioneering mums and dads, but admittedly well after the dinosaurs went AWOL.

What some folk need to clue into is the fact that data warehousing has understandably borrowed from many areas of data management (digital and non-digital), including:

  • The subject orientation of data
  • Distributed data processing
  • Time slicing, time-variance and time series as well as time-invariant data
  • Iterative development and delivery
  • Information Centre architectures
  • Database analysis and design
  • Data migration tools and techniques
  • Function decomposition and business data domains
  • Joint application development / rapid application development
  • Reusable designs
  • Timebox methodologies
  • Decision Support Systems / Executive Information Systems
  • End User Computing
  • Entity relationship modelling / dimensional modelling
  • Relational database management systems
  • MPP, SMP and hybrid SMP platforms

In addition, there is a longer list of notable contributors.

Hello, ma and pa!

Whilst Data Warehousing has borrowed from things everywhere, Bill Inmon, the “father of Data Warehousing”, defines it as being “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.”

Subject Oriented: The data in the Data Warehouse is organised conceptually (the big canvas), logically (detailing the big picture) and physically (detailing how it is implemented) by subjects / data-domains of interest to the business, such as customer, product and sales. 

The thing to remember about subject-areas / data-domains is that they are not created ad-hoc by IT according to the sentiments of the time, e.g. during requirements gathering, but through a deeper understanding of the business, its processes, and its pertinent business subject areas.

Integrated: All data entering the data warehouse is subject to normalisation and integration rules and constraints to ensure that the data stored is consistently and contextually unambiguous.

Time Variant:  Time variance gives us the ability to view and contrast data from multiple viewpoints over time. It is an essential element in the organisation of data within the data warehouse and dependent data marts.

Non-Volatile:  The data warehouse represents structured and consistent snapshots of business data over time. Once a data snapshot is established, it is rarely if ever modified.

Management Decision Making: This is the principal focus of Data Warehousing, although Data Warehouses have secondary uses, such as complementing operational reporting and analysis.

Demand driven: adding data to the data warehouse is based on business demand for that data and NOTHING else. Preemptive loading of data, just in case, should be avoided like the plague.
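
As a rough illustration of how three of these properties (integrated, time-variant, non-volatile) fit together, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the two source systems, the status codes, the `CustomerSubjectArea` class and the in-memory storage. It is not Inmon's method or any product's API, just one way to picture the ideas.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical integration rule: each source system encodes customer status
# differently, so values are conformed to one vocabulary on the way in.
STATUS_RULES = {
    "crm": {"A": "active", "I": "inactive"},
    "billing": {"1": "active", "0": "inactive"},
}


@dataclass(frozen=True)  # frozen: a stored snapshot row cannot be mutated
class CustomerSnapshot:
    customer_id: int
    status: str
    as_of: date


class CustomerSubjectArea:
    """Append-only store for one subject area (customer)."""

    def __init__(self):
        self._rows = []

    def load(self, source, customer_id, raw_status, as_of):
        # Integrated: an unknown source or code raises KeyError instead of
        # letting an ambiguous value into the store.
        status = STATUS_RULES[source][raw_status]
        # Non-volatile: rows are only ever appended, never updated in place.
        self._rows.append(CustomerSnapshot(customer_id, status, as_of))

    def status_as_of(self, customer_id, when):
        # Time-variant: return the latest snapshot on or before `when`.
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.as_of <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.as_of).status


warehouse = CustomerSubjectArea()
warehouse.load("crm", 42, "A", date(2021, 1, 1))
warehouse.load("billing", 42, "0", date(2021, 6, 1))

print(warehouse.status_as_of(42, date(2021, 3, 1)))  # active
print(warehouse.status_as_of(42, date(2021, 9, 1)))  # inactive
```

The second load does not overwrite the first. Both snapshots survive, which is what lets the same question be answered differently for March and for September.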

Conclusion

So Data Warehousing is about far more than dumping operational data elsewhere and letting people stick reporting and BI tools on top of it.

Relational database management systems were first used for OLTP?

If anyone claims that relational database systems were first used with operational applications, treat everything they say with extreme caution.

The first RDBMS products got used for reporting needs, such as reporting on data in databases designed using dimensional modelling. There was a good reason for this: none of the early implementations even came with a usable audit trail facility. So, not at all OLTP friendly at the beginning.

Data Warehousing necessarily means monolithic databases?

Another false meme doing the rounds is that Data Warehousing necessarily means monolithic databases. This is not in fact what data warehouses have been for many businesses.

Part of the problem with this claim is that even the term monolithic is being corrupted to mean “any architecture that I don’t like”.

Having worked with massive clusters of computing power, with tons of nodes and disks and memory and ultra-fast mesh communications backplanes, I instinctively know this monolithic labeling is dubious. In addition, we have had the ability to isolate data at various levels of abstraction. Indeed, we can organise singular and interdependent subject/data domain areas using database technology that has been available for decades. So distributed data storage and compute is nothing new and certainly nothing new to data warehousing.

So, the notion that “data mesh is an emerging architecture that establishes an alternative, de-centralized pattern to the data warehouse,” is basically bullshit born of ignorance and immaturity.

Data Warehousing necessarily means monolithic and siloed teams?

This isn’t a problem with data warehousing; it is a problem of how IT companies and their customers reframed the idea of data warehousing development and infrastructure.

Before IT got its grubby mitts on data warehousing, the initiatives in this solution space were highly dynamic and focused, and consisted of small multi-disciplinary teams working in close conjunction with the business. The data warehouses would be populated and the data marts would be built, on demand, in chunks that were small enough to be doable in an iteration and large enough to have business significance.

Don’t lay the blame for the demise of this approach on data warehousing; blame it on the vendors and the big “systems integration” providers focusing on revenue funnels.

Data Warehousing technology is just about databases?

Another rather silly assertion is that data warehousing technology is just about databases. Yes, an industry expert on data lakes actually said this in a podcast on Software Engineering Radio. That ignores the fact that data warehousing technology includes:

  • Communications and connection technology
  • Security technology
  • Extract, Load and Transform / Extract, Transform and Load
  • Data cleansing / data quality tools
  • Meta-data management and cataloguing (data governance)
  • Scheduling and keep-alive technology
  • MPP, SMP and Hybrid SMP
  • Database management systems
  • Distributed file systems
  • Business intelligence, advanced analytics tools, visualization tools, dashboard builders, and so on.
  • Etc. Etc. Etc.

Data warehouse databases must be fully normalized?

This is yet another problematic meme that the data mesh folk are spreading. Data warehouse data models were typically modeled using a flexible class of third-normal-form modeling. Data marts, on the other hand, were (and still are) modeled as dimensional schemas. There was never a movement to have everything in the DW model in 4th, 5th, or 6th normal form. The claim of full normalization is simply untrue, and it does not even hold for Data Vault 2.0 models. As for the mention of DKNF (domain-key normal form), clearly, people are just lazily making stuff up about data warehousing rather than doing a bit of research.
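
To make the contrast between the two modeling styles concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables, columns and rows are all hypothetical and invented for the example; a real warehouse and mart would be far richer. It only shows the shape of the idea: a roughly third-normal-form warehouse layer, with a dimensional mart derived from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Warehouse layer: roughly third normal form, one table per entity.
    CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                           region_id INTEGER);
    CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                           sale_date TEXT, amount REAL);

    INSERT INTO region   VALUES (1, 'North'), (2, 'South');
    INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Bolt', 2);
    INSERT INTO sale     VALUES (100, 10, '2021-01-05', 250.0),
                                (101, 10, '2021-02-10', 100.0),
                                (102, 11, '2021-01-20', 75.0);

    -- Mart layer: a dimensional (star) schema derived from the warehouse.
    -- The customer dimension flattens customer and region into one table.
    CREATE TABLE dim_customer AS
        SELECT c.customer_id, c.name, r.region_name
        FROM customer c JOIN region r ON r.region_id = c.region_id;

    CREATE TABLE fact_sales AS
        SELECT sale_id, customer_id, sale_date, amount FROM sale;
""")

# Reporting queries run against the mart, not the normalised tables.
rows = conn.execute("""
    SELECT d.region_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
    GROUP BY d.region_name
    ORDER BY d.region_name
""").fetchall()

print(rows)  # [('North', 350.0), ('South', 75.0)]
```

Neither layer here is anywhere near 4th, 5th or 6th normal form, and the mart is deliberately denormalized so that a report needs one join rather than two.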

Central ownership of data?

Yet another ill-informed claim from the data mesh folk. They really need to brush up on their history, or correct their total lack of a historical perspective.

For one, there was never a generalized centralized ownership of data even if there was central custodianship of data.

Data warehouses get queried directly?

Data mesh folk believe that data warehouses get queried directly.

What in the name of Sam Hill do they think Ralph Kimball and others were doing with all of those dimensional models and data marts? Trolling Bill Inmon? Digging for nuggets of gold? Making the partridges dizzy?

Data warehousing is out of date?

According to some data-mesh folk, data warehousing is “a data management construct that dates back to the 1980s”, as if that were somehow a bad thing. For me, that’s a specious argument that actually treats people like idiots.

It’s like someone saying to Isaac Newton, “so, Sir Isaac, you don’t still believe in that old gravity nonsense do you?”

What they also don’t realise is that their beloved distributed computing and data paradigm-shifter is older than the oldest data warehouse.

That’s it folks!

To wrap up: there are many appealing aspects of data mesh, probably because I have come across a lot of it before. In addition, its proponents are in the main reasonable, considerate and informed people, unlike some of the wide-boys of big data. However, what rankles are the gratuitous pot-shots being taken at data warehousing by people who are basing their assumptions and declarations on vague, inaccurate and weak evidence. We saw it with Big Data and Hadoop, then data lakes and lakehouses/outhouses, and now with data mesh.

It is like the fake news of data: irritating, time-wasting and unnecessary. I do, however, hope that the data mesh folk rectify their view of data warehousing.

Many thanks for reading.

Martyn Jones

Cambriano Energy