
Let’s get this baby off the ground

This weekend I read a piece on the Information Management website by Steve Miller with the title of Big Data vs. the Data Warehouse. It’s an old piece, from March 2014.

It was in response to a piece penned by Bill Inmon, titled Big Data or Data Warehouse? Turbocharge Your Porsche – Buy an Elephant, in which he singled out for criticism the ad campaign of a big-data and Hadoop promoter.

My initial thought was, what a great comeback to the ‘Inmon camp’, it’s about time we had a workout.

It’s great. Because it is so overwhelmingly tendentious and skewed I was sorely tempted to read it back to front, from bottom to top.

I am a bit surprised that no one took the bait and challenged the piece. But to be fair, the commenting system on the site didn’t work for me, which probably meant that it didn’t work for other people either.

So the ‘prejudiced and controversial’ never got directly addressed.

Which is a pity.

Although I can imagine why Bill might have given this one a pass.

So, what to say?

When you make or promote advertising claims that are clearly hyperbole, disinformation and bullshit, it is only fair that you get called on it.

When one frivolously conflates and mixes up the terms data warehousing, Hadoop and big data, then one is just begging for a rebuttal.

I know we live in a predominantly post-modern world where anything can mean anything, but it is a generalised malaise that undermines any rational, informed and professional approaches to information management.

It’s like what Chris Rock said one time. Well I won’t repeat what he said for now, so as not to offend the sensitive.

Some readers may want to stop reading this piece at this point.

You can build a data store in Excel, VB script and PowerPoint if you want. It doesn’t mean it’s a Data Warehouse and it certainly doesn’t mean it’s a good idea.

Much the same way that Chris stated that “You can drive a car with your feet if you want to; it don’t mean it’s a good fucking idea!”

It’s curious how some people just hate the idea of admitting that the best way to go about data warehousing is pretty much along the lines of what Bill Inmon was saying, decades ago.

And time and time again we get stuck with propositions that aren’t going anywhere any good, or anytime soon.

Sure, you can dress in a clown’s outfit and walk into town on your hands, and sure, you can collect a bunch of data together and call it data warehousing – even if it isn’t.

How much more of your organisation’s assets, resources and opportunities are you willing to fritter away with that ole bullshit?

That is, until the epiphany happens, again.

Then eventually people reluctantly come back to the idea that it may be necessary to have a well-engineered Data Warehouse in order to do effective data warehousing.

So, let’s check the market for spin, disinformation and hyperbole.

What I notice is that many fickle pundits will jump from one ‘greatest thing since sliced bread’ to another.

I remember when one particular idiot used to run around actively encouraging people to build a surfeit of Data Marts. No need for a Data Warehouse. They’d sold their integrity in exchange for ‘collaborations’ with certain vendors. Vendors who just didn’t have the technological capabilities to support Enterprise Data Warehousing.

So they sidelined Data Warehousing, and just did what they could, within the limitations of their products.

Many people believed the shtick, and many organisations soon ended up with a collection of unmanageable, costly, highly-constrained and badly engineered silos of data.

It was a mess, a mess that was promoted by people who claimed that they knew what they were talking about.

And they didn’t, or they lied… whatever… Being a whizz with SQL doesn’t make one a sound data architect.

Many years ago I was building Information Centres for UK and Irish based clients of Sperry Univac. Real practical hands-on project work of building and delivering the precursor to Data Warehousing.

When IBM got wind of it, they jumped on the bandwagon. And the first thing they did was to write papers, as if they had come up with the idea in the first place, which was never certain.

It wasn’t the fault of IBM. A couple of opportunists in that organisation saw the chance to hitch a ride on a new trend, even though they didn’t actually originate it or contribute anything original or creative to it.

Data Warehousing has been plagued with products and services that have only a passing acquaintance with sound data warehousing principles.

Worse still, data warehousing is plagued with non-practicing ‘practitioners’ who will endorse products and services that by necessity require Data Warehousing to become things that it should never be.

Sure, there are a million ways to make a buck, but this is really sacrificing the golden calf for immediate pecuniary satisfaction.

So let’s take a look at some of the tendentious and polemic claims:

There is a hybrid Inmon-Kimball approach?

Actually there isn’t. The Kimball approach is as close as dammit to the Inmon approach, albeit with a different modelling approach to the data warehouse, and a different business approach to the requirements gathering. It’s the Inmon approach, with a bit of go-fast styling. A bit like Starsky and Hutch’s ride.

Attacking the bizarre claims of vendors is not the way to go

As anyone with even an ounce of professional self-respect, standards and ethics should appreciate, calling bullshit on bullshit is exactly the way to go. The more we cultivate best principles, rational appreciation and a healthy disrespect for charlatans, chancers and wide-boys, the better the IT industry will be.

The Unknown Knowns

Donald Rumsfeld was much celebrated for his “known unknowns” and “unknown unknowns”, which did make logical sense, even if they were tied to thoroughly strategy-free strategies.

Just to recap, here’s a quote from the aforementioned piece:

To suggest that big data cannot encroach on traditional data warehousing is quite naïve, just as was the claim by network and hierarchical database vendors 30 years ago that new relational technologies could never meet business processing needs like the entrenched. What do you hear about 1985 database stalwarts IMS and IDMS today?

What to say?

30 years ago, commercial relational database technologies were still in their infancy.

In 1984, for example, Oracle was primarily used for reporting and analysis. Dumping data off mainframes and onto Unix boxes or DEC boxes, as is – same or similar database models, for reporting and analysis.

There was no way that RDBMS could even think of competing with well-established DBMS for safe, fast and high-volume transaction throughput. Heck! Oracle 3 didn’t really have a reliable audit trail facility.

Why? Because it wasn’t needed for reporting apps.

Even in the 2000s, RDBMS on Unix/Linux boxes still couldn’t outrun IMS on proprietary platforms, not by a long shot.

Sure, people have been using RDBMS for quite some time for transaction processing, but it’s not really its forte, and it was never designed to be.

RDBMS made application development easier, far less complex, far less an engineer’s task, but it is far from being the optimal solution when high-volume, high-throughput and absolute transaction guarantees are necessary.

Many of Europe’s big banks and financial institutions still use IMS for their core banking, even though they will use RDBMS for Data Warehousing and peripheral systems.

What does that tell us?

Facebook, Yahoo and Netflix

The idea that these guys matured and progressed from Data Warehousing to Hadoop and Big Data is absolute nonsense.

What these guys had was a problem that is best resolved using big data and Hadoop. This doesn’t mean that Data Warehousing outgrew its usefulness, but that a Data Warehouse was never the right tool for what they wanted to do in the first place.

It was never about moving on up, but moving to more appropriate tools.

Can you imagine any of these big guys using anything other than Data Warehousing and BI for their key financial reporting, analysis and visualisation? Would they use Hadoop, deploy big data, or do the right thing, right when it comes to strategically important financials?

Data Warehouse Failures

Data warehousing projects that fail, fail because of people.

Data Warehouse projects fail when people believe that they don’t need any of that theoretical nonsense, as they know intuitively what Data Warehousing is all about.

The sort of intuitive stupidity that leads to the conflation of Data Warehousing, Big Data, Hadoop and “Just stick all that data over there, mate!”.


You can put together all the tiny scraps of bad advertising and worse propaganda about Data Warehousing, Hadoop and Big Data, and it still only makes a thick whale omelette of disinformation.

In my humble opinion, Bill Inmon was absolutely right to call BS on the Data Warehouse vs Big Data conflation.

As he told me once: every year, the idea that you can do Data Warehousing without a Data Warehouse does the rounds, just like the flu.

Every year we just have to keep our chin up, and fight the good fight.

Because that’s what knowledgeable, experienced and ethical professionals with standards do.