The data warehouse is the place to copy exhaust data to

Martyn Jones, Bilbao, 2nd October 2024

Narrator: Not all the data we hold relates strictly to business domains such as products, organisation structure and corporate real estate. A lot of the data we collect is simply about monitoring IT, applications, networking, security and governance, to name just a few.

Dud: What’s all this data in my exhaust? Is my data back-end going well, or do I have a mechanical data governance issue? All this OLTP exhaust data is so tedious and tiring.

Pete: How often have I told you not to use sunflower oil in your car? It’s a false economy, Dud. And the taxman will have ye.

Narrator: The way many data mesh, data lake and data fabric talking-heads view modern data warehousing as just a place to dump and report on data is quite problematic in its naivety, lack of rigour and petulant frivolity.

To be brief, this assertion about data as exhaust fumes is a misreading of the past. It demonstrates an ignorance of data, its management, and its architecture.

So, to be clear, data wasn’t invented around the time of the birth of Windows 8, Hadoop, TikTok or the iPad. Data has existed for a long time, since before computer systems were even a twinkle in the eyes of their pioneering mums and dads, though admittedly well after the dinosaurs went AWOL.

What some folk need to clue into is the fact that data warehousing has understandably borrowed from many areas of data management (digital and non-digital), including:

  • The subject orientation of data. We’ll take a more detailed look at this further on.
  • Distributed data processing
  • Time slicing, time-variance, and time series, as well as time-invariant data
  • Iterative development and delivery
  • Information Centre architectures
  • Database analysis and design
  • Data migration tools and techniques
  • Function decomposition and business data domains
  • Joint application development and rapid application development
  • Reusable designs
  • Timebox methodologies
  • Decision Support Systems and Executive Information Systems
  • End User Computing
  • Entity relationship modelling and dimensional modelling
  • Relational database management systems
  • MPP, SMP and hybrid SMP platforms

In addition, there is a longer list of notable technological contributors.

Pete: Why did data become exhaust data as soon as we went digital? Well, it didn’t. It’s just that the silly, the provocative and the vacuous all trump facts, reason and proportionality these days.

Dud: You’re not wrong there, Pete.

Pete: As I told you, Dud. Let’s reflect for a while.

Hello, ma and pa!

Narrator: Whilst Data Warehousing has borrowed from things everywhere, Bill Inmon, the “father of Data Warehousing”, defines it as being “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.”

Subject Oriented: The data in the Data Warehouse is organised conceptually (the big canvas), logically (detailing the big picture) and physically (detailing how it is implemented) by subjects/data domains of interest to the business, such as customer, product and sales. 

The thing to remember about subject areas/data domains is that they are not created ad-hoc by IT according to the sentiments of the time, e.g., during requirements gathering, but through a deeper understanding of the business, its processes and its pertinent business subject areas.

Integrated: All data entering the data warehouse is subject to normalisation and integration rules and constraints to ensure that the data stored is consistently and contextually unambiguous.
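
To make the integration point concrete, here is a minimal Python sketch. It assumes two hypothetical source systems, "crm" and "billing", that encode the same country differently; the system names, the mapping table and the integrate function are all invented for illustration and are not taken from any particular warehouse product.

```python
# Hypothetical source systems and code mappings, invented for illustration.
COUNTRY_RULES = {
    "crm": {"UK": "GB", "ES": "ES"},
    "billing": {"GBR": "GB", "ESP": "ES"},
}


def integrate(source: str, record: dict) -> dict:
    """Return a copy of the record with its country mapped to the warehouse code."""
    rules = COUNTRY_RULES[source]
    raw = str(record["country"]).strip().upper()
    if raw not in rules:
        # Reject rather than guess: ambiguous values must not enter the warehouse.
        raise ValueError(f"no integration rule for {raw!r} from {source!r}")
    return {**record, "country": rules[raw], "source_system": source}


crm_row = integrate("crm", {"customer_id": 1, "country": "uk"})
billing_row = integrate("billing", {"customer_id": 2, "country": "GBR "})
print(crm_row["country"], billing_row["country"])
```

Both rows come out with the same country code, whichever system they started in. A real warehouse would hold such rules as governed reference data rather than in a hard-coded dictionary.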

Time Variant: Time variance allows us to view and contrast data from multiple viewpoints over time. It is an essential element in data organisation within the data warehouse and dependent data marts.

Non-Volatile: The data warehouse represents structured and consistent snapshots of business data over time. Once a data snapshot is established, it is rarely, if ever, modified.
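
Time variance and non-volatility can be sketched together as an append-only history with an "as of" lookup. The Python below is only an illustration under simplifying assumptions: one attribute per business key, snapshots arriving in date order, and a hypothetical SnapshotStore class that is not part of any real warehouse tool.

```python
from bisect import bisect_right
from datetime import date


class SnapshotStore:
    """Append-only history of one attribute per business key (illustrative only)."""

    def __init__(self):
        # key -> list of (effective_date, value), kept in ascending date order
        self._history = {}

    def append(self, key, effective, value):
        rows = self._history.setdefault(key, [])
        if rows and effective <= rows[-1][0]:
            # Non-volatile: established snapshots are never overwritten or back-dated.
            raise ValueError("snapshots are append-only")
        rows.append((effective, value))

    def as_of(self, key, when):
        """Time-variant lookup: the value in force on the given date, or None."""
        rows = self._history.get(key, [])
        dates = [effective for effective, _ in rows]
        idx = bisect_right(dates, when)
        return rows[idx - 1][1] if idx else None


store = SnapshotStore()
store.append("C1", date(2024, 1, 1), "Bilbao")
store.append("C1", date(2024, 6, 1), "Madrid")
print(store.as_of("C1", date(2024, 3, 15)))
```

The lookup for mid-March returns the January snapshot, because the June one was not yet in force. Changing the customer's city adds a new row; the old one stays exactly as it was loaded.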

Management Decision Making: This is the principal focus of Data Warehousing, although Data Warehouses have secondary uses, such as complementing operational reporting and analysis.

Demand-driven: Adding data to the data warehouse is based on business demand for that data and NOTHING else. Pre-emptive “just in case” data loading should be avoided like the plague, or COVID-19.

Initial Conclusion

Narrator: Data Warehousing is about more than dumping operational data elsewhere and letting people stick reporting, dashboard, analytics and AI tools on top of it.

Dud: I’m glad we cleared up that problem, Pete. It’s been keeping me awake at night.

Pete: You may well be glad, Dud. It has been an enigma, wrapped in a riddle and hidden in plain sight of the day. It is a challenging monument to mystery and suspense. The journey has been extended, arduous and thrilling, but ultimately worth it.

