Martyn Jones

Madrid 25th January 2017


HERE’S A THOUGHT! In the past age of big iron, the only reason for not having a proliferation of duplicated databases was the prohibitively high cost of disk storage. Now, we can let our database collections grow ‘like Topsy’. This is the Brave New World of data… apparently…

Let’s, for one moment, be the fly-guy on the corporate wall. What are we picking up on the radar there, Enda? Oh, yes…

“So, Nigel, what corporate databases will we be accessing and affecting?”

“Well, Bob, nothing that already exists, we don’t need to. We’ll just create a new database for every new application that comes through the epic-story-tittle-tattle pipeline. After all, when you guys were young, around the time the major dinosaurs were evacuated from the planet, disk storage was really, I mean really expensive, and you had to do all sorts of weird shit, like create byzantine codes and Aramaic keys and reverential integrity and all that jazz. But now it’s as cheap as chips. So, heck, if keeping multiple copies of data is good for Hadoop, and who am I to question that revolutionary-evolutionary strat, then it’s good for us too!”

“You have no use of a centralised customer database? Aren’t you worried about side-effects and unintended consequences?”

“Don’t be daft, Bob. That’s why we have a Chief Data Officer, Data Integration software licences up the wazoo and the Kattupalli Village Centre for Advanced Strategic Scientification and Datarisation.”

“I certainly admire your courage, strength and indefatigability, Nige.”

“I’m extreme envelope-pushing agile, Bob. That’s where it’s at.”

“Okay then, Nigel, but remember these words of wisdom: never eat more than you can lift.”

HERE’S A THOUGHT! Forest Jamonson was employed by the Central Statistical Office to analyse data quality in wartime military operational databases and to come up with a way to better protect the decisions that were affected by bad data quality.

After many months of wearisome and convoluted research, Forest was able to identify the erroneous data and then correct it. Happy days were here again.

Unfortunately, not all of the data could be corrected, as this would have proved prohibitively expensive, so just the ‘bad’ data was corrected. The ‘good’ data was left ‘as is’.

The Ministry of Defence were now in a position to respond to any threat on any of the country’s flanks. At least, that’s what they thought.

Then the dreaded day arrived. Things went tits-up. Even though the data was ‘perfect’.

Why hadn’t the data helped?

It wasn’t bad data.

After all, it had been scrubbed and scrubbed until it was ‘right’.

Even the data quality of the not-so-flawed data had improved.

It was the rest of the data. The data that wasn’t flagged as dodgy or erroneous or bollox, because it was consistent and believable and nice, and proved itself to be correct in the world in which it came to life and existed.

But, it’s funny how the analysis of social media streams doesn’t always tell us what we want to know. Data quality or no data quality.

And that’s how the Empire was lost. Through a high-minded belief in the value of Big Data flotsam and jetsam. Or, maybe not.

Many thanks for reading.