Martyn Richard Jones, Galicia, 14th May 2025.

I’m feeling generous, so I would like to introduce you to my top eight list of foolish things that people say about data.

I think that data does have a role to play in some businesses. I also believe that some of the basic distributed file store and text search technologies used with data can be usefully employed in non-traditional indexing, counting and correlation. However, there is an awful lot of nonsense said and written about data.

So, onwards and upwards.

Data Is Like Currency

If data is currency, and for most of us it isn’t, then much of it seems more like the hyperinflationary money of the Weimar Republic than something you would take to the bank in your wallet or try to use to buy the weekly groceries. “Oh, I’ve forgotten my hand money and my bank card, can I pay with data, please?” “No, sod off!”

Data might have value, no doubt some of it does – it can’t all be dross, can it? But that doesn’t make it a solid financial asset class; that’s just dopey thinking. Okay, okay, it’s a dopey world, but that’s no excuse.

The value of data providers, such as those companies supplying financial market and instrument data, is in the service of providing accurate, appropriate and timely data. Data has no significantly greater intrinsic economic value or exchange liquidity than pints of beer or glasses of wine. It can have time and place utility, of course, as a product or as a service, but it is not like a central-bank backed currency – not even close.

Data Contains Gold

Call me an old cynic, if you must, but I don’t believe for one moment that we have now learned how to turn lead into gold, or for that matter, data into golden nuggets.

You see, if data contains gold, and it doesn’t, then it would be more like FeS2 than anything else. Or, to use the vernacular, it would be more like fool’s gold than Welsh gold. Which, considering the quantity of hype surrounding data, is the most appropriate analogy.

Data Is the New Oil

Stick it in your car and see how far you can get on it. This myth is so irritatingly erroneous, ill-informed and foolish that it feels more than right to treat it with the utmost contempt. People who utter this phrase with a straight face, as if they were talking about valuable assets you could take to the bank, have no shame. Let’s get this straight: alternative energy sources are the new oil, not data.

Data Is for Everybody

If you want to see what is essential for everyone, then start with Maslow’s hierarchy of needs, a pyramid of fundamental human motivators that curiously hasn’t been expanded to include data. Maybe because it’s not an essential human need.

I would also like to mention that several data projects have nothing to do with data, bricking it, or Hadoop at all.

Data Is Solving Fundamental World Problems

Doing professional taxi-drivers out of a living income is not solving world problems, no matter how many times one chants “data Uber alle”, mate.

We know what the fundamental world problems are, and we know, more or less, how to solve them. In this respect, we don’t need data to tell us which way is up. Consider this:

Maybe data can inform us that:

  • One child dies every four seconds.
  • Fourteen children die every minute.
  • A 2011 Libya conflict-scale death toll every day.
  • A 2010 Haiti earthquake occurring every ten days.
  • A 2004 Asian Tsunami occurring every 11 days.
  • An Iraq-scale death toll every 19 to 46 days.
  • Just under 7.6 million children dying every year.
  • Some 92 million children dying between 2000 and 2010.
  • “The silent killers are poverty, hunger, easily preventable diseases and illnesses, and other related causes. Despite the scale of this daily/ongoing catastrophe, it rarely manages to achieve, much less sustain, prime-time, headline coverage.” Source: Global Issues
  • Have you seen the state of Gaza recently?

Just how much data does it take to convince you that there are genocides in progress? Quick and slow.

Maybe if we knew all of this (and we do), then perhaps we could do something about tackling these types of problems.

Maybe we should also be less eager to instrumentalise real suffering to flog ageing technology and price-gouging services. And perhaps we should do something about the problems of those less fortunate than us, using the practical solutions that lie within our reach.

Data Is New

Data is characterised by its volumes, velocities and varieties: how much of it there is, how fast it is generated and communicated, and how many types it comes in.

Data volumes have always been growing. There has never been a time since the invention of the computer when the volumes of data decreased. Data has been generated at ever faster rates, a trend that doesn’t look like stopping anytime soon. Also, the varieties in format and content of data objects have been growing since the early eighties – that much I can testify.

Therefore I suggest that we all make a New Year’s data resolution, or whatever month or day it happens to be. From now on, we should only refer to the application and benefits of data and data analytics in verifiable terms, to encourage legality, decency and honesty.

The thing is, nothing of significance is entirely new. Not that this matters. But data has always been about volumes, types and velocity, and those factors didn’t suddenly become relevant at the turn of the millennium. Neither is the technology new: most of the technology labelled as data technology is a collection and configuration of techniques, methods and ideas that are decades old, and sometimes centuries old.

Data Will Replace the Data Warehouse

To be precise, this is about variations on the theme of distributed file systems and text search and count (e.g. Hadoop, Data Lake Outhouse, Space Cadet Data Capsule) replacing data warehousing.

Two axioms can be applied when considering Hadoop and data warehousing:

  1. If you can replace your data warehouse technology with Hadoop, then your data warehouse is not a data warehouse in any Inmon (or even Kimball) way.
  2. If you structure your data warehouse data-objects in third-normal-form and your data marts using primarily dimensional modelling, then the most suitable delivery technology is relational. Data technology will not be able to compete with relational technology on that ground.
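For readers who have not met dimensional modelling, here is a minimal sketch of the kind of data mart the second axiom has in mind, using Python's built-in sqlite3 module as a stand-in relational engine. The table names (dim_date, dim_product, fact_sales), the columns and the sample rows are all hypothetical, invented purely for illustration. It shows one fact table joined to two dimension tables and aggregated, which is the bread-and-butter query a relational engine is built to handle.

```python
import sqlite3

# Hypothetical star schema: one fact table and two dimension tables.
# All names and rows here are illustrative assumptions, not real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        year     INTEGER NOT NULL,
        month    INTEGER NOT NULL
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        category    TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        amount      REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20250101, 2025, 1), (20250201, 2025, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "beer"), (2, "wine")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(20250101, 1, 10.0), (20250101, 2, 25.0),
                  (20250201, 1, 15.0), (20250201, 1, 5.0)])


def sales_by_category_and_month(connection):
    """Join the fact table to both dimensions and sum the sales amounts."""
    return connection.execute("""
        SELECT p.category, d.year, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


result = sales_by_category_and_month(conn)
for row in result:
    print(row)
```

Nothing here depends on SQLite in particular; it is used only because it ships with Python. The point is that keys, joins and grouped aggregates are native to the relational model, whereas a distributed file store has to have them bolted on.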

In short, using Hadoop as part of the technology stack for an analytics data store makes absolute sense, but trying to shoehorn a mature strategic and tactical decision support platform into Hadoop is not a good look.

While we are on the subject, I once thought that even staging data with Hadoop was overkill and that it added another point of failure to the data warehouse process. These days, I believe that if you are dealing with high-volume streaming delivery of data, it’s an option to be taken into account – that is, Hadoop or one of the follow-on data frameworks.

As for doing ETL with Hadoop and Java? I really wouldn’t recommend it!

Data Is a Panacea

There are legitimate and ethical proponents of data, and indeed some applications seem, at least superficially, to make some good sense. However, there are an awful lot of unscrupulous or wilfully uninformed data punters around, who are, in my view, giving the impression that there are no limits to what is achievable with data. As if data were the ultimate tautological universal-panacea.

Of course, it’s not true. It never was, it isn’t now, and it’s unlikely that it ever will be. But this observation has become a cliché these days, accurate as it is.

In Short

It has been shown that data technology is not merely useful to some companies but essential to them. However, not everyone will have the same or similar business models and drivers as Google, Twitter, Facebook, Netflix and Amazon. Most other businesses are not like these internet advertising companies in any significant way when it comes to the data they collect and the ways that they can apply it. That is, very few other companies are basing their business models on advertising revenue streams, online product delivery, data brokerage and search.

That stated, efficient distributed file stores, text search and word counting will have their place; it’s just that not everyone has this place.