If in doubt, blame ‘data quality’

Narrator: It doesn’t matter what you are trying to do, or even whether you know what you are trying to do: if in doubt, blame the quality of the data. There was a time when data quality was a massive impediment to business data integration.

Data quality assurance was a glint in IT’s eye. A far-away place that nobody knew how to get to.

Dud: Do you remember Executive Information Systems, Pete?

Pete: Do forest-dwelling bears do data analytics in the Vatican, Dud?

Dud: Well, you know. The tools were quite flashy for the time. This was the late eighties and early nineties.

Pete: Quite right, Dud. However, some critical problems contributed to their relatively scant success.

Dud: And what were they, Pete?

Pete: Data availability and data quality. They were challenging problems to overcome.

Dud: I’m all ears.

Pete: When I first entered the computing lark, data was a significant issue because of storage constraints. Back in the day, removable disks and what we called disk drives were far from being commodity items, and an eighty-megabyte disk was a significant cost. So, we did clever things with data coding to reduce the storage footprint of substantial applications. We used “0” instead of “male”, “1” instead of “female”, and nothing instead of something we could discard. We used “r” instead of “red”, “g” instead of “green” and “w” instead of “white”. We used “USD” for “US dollars”, “GBP” for “British pounds”, and “DM” for “Deutschmarks”. We even used shortened dates and times.
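(For the curious: a minimal, purely illustrative Python sketch of the sort of coding Pete describes. The code tables, field names and record here are hypothetical, not drawn from any real system.)

```python
# Hypothetical code tables of the kind Pete describes: short codes stored
# on disk, full values only reconstructed when somebody needs to read them.
GENDER = {"0": "male", "1": "female"}
COLOUR = {"r": "red", "g": "green", "w": "white"}
CURRENCY = {"USD": "US dollars", "GBP": "British pounds", "DM": "Deutschmarks"}

# A coded record as it might have been stored (layout invented for illustration).
coded = {"gender": "1", "colour": "g", "currency": "GBP", "dob": "620411"}

# Decoding back to human-readable values.
decoded = {
    "gender": GENDER[coded["gender"]],
    "colour": COLOUR[coded["colour"]],
    "currency": CURRENCY[coded["currency"]],
    "dob": coded["dob"],  # shortened date: cheap to store, ambiguous later
}
print(decoded)
```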

Dud: Wasn’t that problematic?

Pete: I should say so, Dud. First, we didn’t stick to the same abbreviations on all systems. The definitions very rarely agreed, and the codes certainly didn’t. It was all over the shop, which was okay until we started wanting to integrate data from two or more application platforms. Then things became wickedly fickle.

On top of that, Dud, in most places, whenever the business bought or developed a new application, it would get a new box to run it on. It was mental, Dud.
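(Again purely illustrative, in Python: two hypothetical systems coding the same customer differently, and the kind of hand-built mapping that reconciliation ended up needing. None of the codes or identifiers come from a real application.)

```python
# Two hypothetical systems describing the same customer with different codes.
system_a = {"cust_1": {"gender": "0", "currency": "DM"}}    # "0" means male
system_b = {"cust_1": {"gender": "M", "currency": "DEM"}}   # "M" means male

a, b = system_a["cust_1"], system_b["cust_1"]
print(a == b)  # False: the facts agree, but the codes do not

# Reconciliation needs an explicit cross-system mapping, built by hand.
GENDER_A_TO_B = {"0": "M", "1": "F"}
CURRENCY_A_TO_B = {"DM": "DEM", "GBP": "GBP", "USD": "USD"}

normalised_a = {
    "gender": GENDER_A_TO_B[a["gender"]],
    "currency": CURRENCY_A_TO_B[a["currency"]],
}
print(normalised_a == b)  # True once the codes are aligned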

Dud: So, massive data mismatches all over the shop.

Pete: You can say that again. Integrating and reconciling data was a nightmare that frequently produced woefully inadequate and misleading results. But all was not lost. Some intelligent people saw the need for tools that could clean up, consolidate and integrate data, and a market opened up for technology that could go some way towards achieving that. Basically, we had tools for cleaning up, matching and integrating data; tools for removing duplicates, identifying outliers and anomalies, and balancing the data books. Some products were good, some were less good, but the technology worked and was shown to be helpful. It was welcomed by believers in data governance and quality assurance everywhere.
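(One last illustrative Python sketch, showing in miniature the chores Pete lists: standardising codes, removing duplicates, and flagging outliers. The records and the outlier rule are invented for illustration; the real tools were, and are, considerably more sophisticated.)

```python
from statistics import median

# Invented records: inconsistent colour codes, a duplicate, and a suspect amount.
records = [
    {"id": 1, "colour": "r",   "amount": 100.0},
    {"id": 1, "colour": "red", "amount": 100.0},    # duplicate under a different code
    {"id": 2, "colour": "g",   "amount": 95.0},
    {"id": 3, "colour": "w",   "amount": 9500.0},   # suspiciously large
]

COLOUR = {"r": "red", "g": "green", "w": "white"}

# Standardise the codes, then de-duplicate on (id, colour).
seen, cleaned = set(), []
for rec in records:
    rec = {**rec, "colour": COLOUR.get(rec["colour"], rec["colour"])}
    key = (rec["id"], rec["colour"])
    if key not in seen:
        seen.add(key)
        cleaned.append(rec)

# Flag amounts wildly out of line with the rest (a crude, illustrative rule).
typical = median(r["amount"] for r in cleaned)
outliers = [r for r in cleaned if r["amount"] > 3 * typical]

print(cleaned)
print(outliers)
```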

Dud: Hooray! So, a win!

Pete: Well…

Dud: Then what?

Pete: Several tools that complemented ETL technology came on the scene. People would buy them, have them in their software libraries, and sometimes even try to install them. This was sold as “doing something about data quality.”

Dud: So, progress at last.

Pete: Not really, Dud. It was activity without purpose, acquisitions without goals and gestures without value.

Dud: Meaning?

Pete: Very few teams had the will to deploy and use these technologies, so data quality for things like data warehousing and analytics didn’t improve. In fact, it got worse: the widening gap between hyperbole and reality further alienated the business from IT. Worse still, Dud, many never built a data quality step into their data pipelines, and many more never grasped why a data quality pipeline mattered. Others were clueless about data quality, the many facets of its technology, and how to put together even a semblance of criteria and a list of tangible priorities.

You see, Dud, many people claim that data quality is essential, but they prevaricate when it actually comes to doing something about it.

It’s like this, Dud. How many scrum boards or project plans mention data quality and set aside time and resources to address the issues? My guess is “very few, if any”, Dud.

Dud: Well, IT has been bellyaching about data quality for decades, but when push came to shove, they did very little about it, right, Pete?

Pete: Exactly, Dud. Although, to be fair, IT departments in industry, not the IT industry, are like boys in short trousers, however much money they might have. They are basically well-intentioned, have limited experience, and are clueless, Dud.

Dud: The more things change, the more they stay the same.

Pete: If in doubt, blame the data quality, Dud.

Narrator: In conclusion: blaming data for all the ills of data-centric business analytics and reporting is like an incompetent writer complaining that dictionaries render their work meaningless, that pens have a mind of their own, and that computers are for wimps.

As the saying goes, “A stupid man’s report of what a clever man says can never be accurate because he unconsciously translates what he hears into something he can understand.”

You can thank Bertrand Russell for that gem. He could have been talking about the data mesh massive.