Tags
awareness, Behavioural Economics, Big Data, BS, crap, data analytics, deceit, enterprise data warehousing, history, hustlers, IT business, lies, Organisational Autism, Pimps, spin
What does Big Data have to do with Robitussin?
I will explain.
Robitussin is a legal pharmaceutical product commonly associated with coughs, colds and flu combinations.
It also features in comedian Chris Rock’s running gags about poverty, limited access to medicines and how some people got by.
In Chris’s childhood story the cure for every ailment in his neighbourhood, apart from imminent death or death itself, was Robitussin.
So it was used to fix almost every ailment: asthma, cancer, a broken leg, and so on and so forth.
Whatever you had, Robitussin was the answer.
But, this wasn’t about a company like Pfizer making claims for their products.
I’m sure they would never dream of being anything other than totally ethical, decent and honest.
Chris Rock was telling a story, for comedic effect.
People who pimp big data in return for industry favours are not in the business of comedy.
They are in the business of hustling vapourware, half-truths and blatant bullshit.
Big data pimps claim it’s all new.
But it isn’t.
And they are not presenting fact.
There is nothing about Big Data that we haven’t seen before.
Big data pimps claim that methods and technologies used with Big Data are new.
But they aren’t.
Are Big Data pimps aware of the fact that they are frequently inaccurate in what they claim?
I was once told never to underestimate humankind’s ability to push the frontiers of stupidity.
But surely that can’t be the case.
I have no problem with most of the technology and methods that have been repackaged under the Big Data umbrella.
The thing is, none of them are new.
And a lot of the claims that are now made for Big Data have been roundly disproven over the last three decades and more.
But still the assertions come in thick and strong.
Like a never-ending waft of the acrid stench from a nearby silage pit.
Want to do your job better? Big Data
Need insight into your organisation? Big Data
Need to know your customer? Big Data
Need to stick the beer next to the diapers? Big Data
Need to be pragmatic? Big Data
Need to process data in a parallel distributed processing environment? Big Data
Need to do search, sort and present? Big Data
Need to do computing with data? Big Data
Need to fight terrorism? Big Data
Need to cure AIDS? Big Data
Need to control Ebola? Big Data
There is an endless stream of Big Data bullshit.
And every day there is more.
I read a book on Big Data last week.
I won’t state the exact title, to protect the shameless and guilty who were responsible for its making.
But, it was something along the lines of ‘Big Data for Fuckwits’.
A complete and utter piece of perfidious, deceptive and artless shite.
Sure, that kind of thing drags the profession through the mud again.
But this nonsense is also protected under freedom of expression legislation.
So what can be done?
This blog piece was not meant to be a technical critique of the artless claims that are made in the name of Big Data.
Although there is certainly no shortage of artless claims made to support the ‘new’ technology of Big Data.
For example, one could call people on a wide range of claims made to bolster the idea that Big Data is new.
Or even better, to correct blatant misrepresentation of Big Data, the history of IT and the evolution of database technologies.
Some Big Data pimps claim that databases evolved from the simple use of flat files and went directly to relational technology with no intervening developments.
This is bullshit.
It wilfully ignores a whole swath of database technologies, some of which are still in use in major organisations to support their core business IT applications.
Some Big Data pimps claim that Big Data processing was never done before.
This is also bullshit.
Data analytics has been carried out on Very Large Data Bases (VLDB) since the file size capacity of open operating systems grew exponentially and the price tag of the hardware it ran on plummeted.
Big Data pimps talk about Real-time Data Streams and Complex Even Processing as if they were new, and could only be contemplated if Big Data is an integral part of the mix.
This is worse than bullshit. It’s a damn fib.
Before Big Data ‘arrived’ on the scene, we could already do these things, and more. The only thing that was stopping some organisations from doing so was cost.
Another thing that Big Data pimps do is to regurgitate the ‘360 degree view of the business’ claim. We had this with MIS, then with Information Centres, Enterprise Data Warehousing and now Big Data.
This claim is so old and misleading that it should really be euthanized.
Don’t get me wrong, Enterprise Data Warehouses, done right, can deliver interesting benefits and valuable insight. But claiming that EDW would drive a 360 degree view of the organisation was no better than claiming that Big Data will deliver that total view. Bullshit!
But, for me, perhaps the biggest piece of boloney spouted by Big Data pimps is this: “Big Data can spot hidden patterns in petabytes of information”.
This is AAA grade bullshit.
As anyone who knows anything about trying to identify hidden patterns in data will know, there comes a point at which any increase in the volume of data used in certain kinds of data analytics actually diminishes the probability of identifying any hidden patterns in that data.
I could go on about the nonsense claims of the Big Data hustlers. But it should be blatantly obvious what is going on.
How many more examples do people need for them to at least consider that they may be getting hustled by industry pimps?
Big data does not mean more and better insight, and it may frequently mean only one thing. That you have more valueless data to store.
People worry about leaving their domestic appliances on in standby mode, because of that little bit of energy that the standby light uses. So just imagine what is happening with vast amounts of valueless “Big data” kept on disks and in storage, 24x7x52, for years on end. How many standby lights does that all represent?
So, the only thing that the accumulation of very large quantities of valueless data is doing is this: indirectly producing more greenhouse gases and warming the planet.
The mindless application of big data dogma is not contributing to the Climate Change battle nor is it contributing insight or financial advantage.
So, beware of the pimps, hustlers and snake-oil merchants. Big Data is not a hi-tech turbo-charged Robitussin.
As always, please share your questions, views and criticisms on this piece using the comment box below. I frequently write about strategy, organisational, leadership and information technology topics, trends and tendencies. You are more than welcome to keep up with my posts by clicking the ‘Follow’ link and perhaps even send me a LinkedIn invite. Also feel free to connect via Twitter, Facebook and the Cambriano Energy website.
For more on the topic, check out my other recent posts:
- Why Destructive Eagerness? The Data Warehouse Example
- Big Data and the Vs
- Did Big Data Kill the Statistician?
- Infotrends 2015: 21 Directions in Information Management
- On not knowing Climate Change
- Big Data Robitussin – Big Data: Read all about it!
- Absolute certainty…
- Mugged in Data Hell
File under: Good Strat, Good Strategy, Martyn Richard Jones, Martyn Jones, Cambriano Energy, Iniciativa Consulting, Iniciativa para Data Warehouse, Tiki Taka Pro
As always: Most entertaining to read.
You may be right in Your observations, but why do You answer the vague and unsubstantiated Big Data claims with equally unsubstantiated claims?
Surely it would be an answer in itself to take one of the more sober claims from Your list and methodically refute it. But You did not.
Also, it seems You are focusing on the “big” quality in big data, not on the volatile nature of big data nor on the accepted lack of precision or inability to reproduce exact results. These qualities tie directly into the methods applied to volumes so vast and so volatile that normal iterative data processing will be inadequate.
Now an auditor may argue that any result which cannot be repeated with the same method giving the same result must be deemed untrustworthy. However, that is exactly the way we all make more than nine out of ten decisions. Most of the information available to us in any normal situation only exists at the time we make the decision; at the same time our brain processes terabytes of information and quickly comes up with a useful answer. Somehow most of us get safely from A to B every day. Why is that so?
This, I have been told, is because our brain reorders all information into known patterns (“known”, not “predesigned”) at lightning speed and then acts only upon the information that does not fit a known pattern. E.g.: the pedestrian in the middle of the road doesn’t belong in our known chauffeur’s pattern. Response: take evasive action.
Now You could drive down that particular road every day for the rest of Your life and chances are that the same situation would never occur again. That means that in this situation we have an excessive amount of data which is very volatile and adds up to unrepeatable patterns that we are expected to act upon. This is the essence of big data.
Traditionally, data processing defines patterns and relationships prior to loading data. Data that does not fit will be rejected and, at best, put on an error list. So unless someone had contemplated in advance the possibility of a pedestrian in the middle of the road, that person would simply never be presented to the decision-maker (a.k.a. the chauffeur).
Traditional processing of structured data also differs from the processing of big data. In traditional processing You first define Your rules, then translate these into code and finally execute that code multiple times. Big data by nature does not work that way. With big data You first load big chunks of data into memory, then execute code that looks for specific patterns, structural patterns or pattern deviations. That means the answer You get may not be the answer You were looking for. Please note: we have now left the simple task of browsing for keywords as performed through search engines.
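To make the contrast concrete, here is a minimal sketch; the record layout, field names and values are invented purely for illustration:

```python
# A minimal sketch contrasting the two styles. Everything here is made up
# for the example: the field names, the records and the expected structure.
from collections import Counter

# Traditional, schema-first: rules are defined up front, and any record
# that does not fit the expected structure is rejected onto an error list.
EXPECTED_FIELDS = {"timestamp", "sensor_id", "value"}

def schema_first(records):
    accepted, errors = [], []
    for rec in records:
        (accepted if EXPECTED_FIELDS <= rec.keys() else errors).append(rec)
    return accepted, errors

# Load first, look afterwards: pull a chunk of data into memory, work out
# what the dominant structure is, and surface whatever deviates from it.
def load_then_look(records):
    shapes = Counter(frozenset(rec) for rec in records)
    common_shape, _ = shapes.most_common(1)[0]
    deviations = [rec for rec in records if frozenset(rec) != common_shape]
    return common_shape, deviations

data = [
    {"timestamp": 1, "sensor_id": "a", "value": 0.4},
    {"timestamp": 2, "sensor_id": "b", "value": 0.5},
    {"timestamp": 3, "pedestrian": True},  # does not fit the predefined pattern
]
print(schema_first(data))    # the odd record ends up on the error list
print(load_then_look(data))  # the odd record is surfaced as a deviation
```

The point is not the code itself but the order of operations: in the second style nothing is rejected up front, and the structure is only discovered after the data has been loaded.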
… So in answer to Your point, Martyn: YES. Big data does involve new methods and does support better decisions. However, if we simply see BIG DATA as large volumes of data, then we have missed an important aspect of big data.
By the way… Your remark about accumulating big data is somewhat misleading as well. The nature and magnitude of big data simply undermines the concept of storing big data. Once You have processed the data You let go of it, and if You need to iterate the process You simply recollect the necessary data, and quite possibly, this will not be the same data.
An important point that You did not make (as You classified big data as large volumes of data) is: If You use big data as decision support, big data cannot exist alone. We need the structured data as well. This is why big data must be seen as supplementary to the existing data warehouse systems.
The structured data will act as the “standards” which express desirable patterns. That way we get a template to push our big data through. All that fits within the template is, in principle, of no interest; it simply confirms our existing knowledge and spawns no new decisions. It is when a significant portion of the information does not fit our model that we need to make new decisions and act upon them (otherwise we will run the pedestrian over).
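To sketch that last idea in code as well (again, the reference ranges and field names are simply made up for the example):

```python
# A small sketch of using structured reference data as the "standard" and
# letting through only what breaks it. Ranges and field names are invented.
TEMPLATE = {
    "speed_kmh": (0, 130),
    "lane_offset_m": (-1.5, 1.5),
}

def fits_template(observation):
    """True if every field we know about sits inside its expected range."""
    return all(
        lo <= observation[field] <= hi
        for field, (lo, hi) in TEMPLATE.items()
        if field in observation
    )

def needs_decision(stream):
    """Yield only the observations that do not fit the template."""
    for obs in stream:
        if not fits_template(obs):
            yield obs

observations = [
    {"speed_kmh": 87, "lane_offset_m": 0.2},  # fits the known pattern: no action
    {"speed_kmh": 92, "lane_offset_m": 3.1},  # the pedestrian-in-the-road moment
]
print(list(needs_decision(observations)))
```

Everything that fits the template is, as I said, of no interest; it is the remainder that demands a decision.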
I hope my comments make sense. Otherwise: blame it on my poor English 😉
Hi Bjorn,
Many thanks for the comment.
It’s always a pleasure to read your views.
The piece is deliberately polemic.
I wrote it for a blog, in a blogging style.
Something more exhaustive would take me too much time.
Unfortunately.
So why do I counter unsubstantiated claims with other unsubstantiated claims?
Simply stated. As provocation.
An incitement used to start a discussion.
A dialogue about what Big Data really is about.
And substantiation is part of that.
Take for example the notion that Big Data is new because it allows for the analysis of ‘larger’ data sets.
In the late eighties I was involved in a project at Unisys that did precisely that.
We experimented with techniques for identifying hidden patterns in large data sets.
This data came in files, big flat files.
And we applied various data analytic techniques to project, reduce, search, sort and classify the data.
We built Adaptive Neural Network modules to detect patterns, and we combined this approach with traditional statistical methods and other elements of AI.
On small sets of data the patterns were clearer.
On very large datasets the patterns disappeared.
The fact that we were running the tests on massively parallel architectures was not a factor.
I was at a conference in the early nineties and asked my opposite number at IBM Research how they were getting around the fact that very large data sets resulted in poor to negligible pattern recognition.
She said they weren’t; that was where they were stuck as well.
She said it was a bit like a public opinion poll.
You ask a controlled and scientific sample of 1,000 people about their voting intention, and the result of that poll would generally be in line with the actual voting on Election Day.
The more the sample size is increased, the more volatility there is in the prediction of the outcomes.
The lesson I drew from my involvement in this line of R&D was that more data doesn’t necessarily lead to better answers or indeed to any answers.
Which pretty much echoed the complaint of the early eighties about ‘being drowned in data and starved of information’.
Over a decade later I was working on a project for a bank. It was an Enterprise Data Warehouse project, but I also had the business users of SAS in my sphere of influence and oversight.
Notionally these business SAS users were the main customers of the Enterprise Data Warehouse.
In reality this was not the case.
Because a lot of their work was ad-hoc.
So on many occasions they side-stepped the whole EDW process and ‘did their own thing’.
Doing their own thing involved creating massive sets of data, stored in files, containing both simple structured and complex structured data, which they then used SAS to analyse.
Why did they just use flat files? I was told that it delivered unparalleled performance for what they wanted to do.
Which made some sense, because back in the ANN R&D days we also preferred to use flat files, and for performance reasons.
I asked the senior SAS expert at the bank where they had picked up this approach.
They told me that they had been using this approach for a number of years, and that it was a common method amongst business-side SAS programmers/users, especially in banking and telecoms.
This was about a decade ago.
Now to Your point about the focus on the “big” quality in big data, rather than on the volatile nature of big data, the accepted lack of precision or the inability to reproduce exact results.
Big data is defined as “any kind of data source that has at least three shared characteristics”:
• Extremely large Volumes of data
• Extremely high Velocity of data
• Extremely wide Variety of data
It’s the good Big Data folk who put extremely large volumes of data at the top of the characteristics of big data.
That said, veracity was also mentioned as being the fourth characteristic.
On the comment re. Search engines.
A senior software engineer at Univac in Frankfurt developed a textual database management system and an indexing and search system for large volumes of text taken from academic theses, technical manuals and business documents.
Many years ago I came across that application documentation in the library at Sperry, which prompted my curiosity.
This software engineer was also working with a professor at the University of Göttingen. Their disciplines covered areas that proved useful when designing a sophisticated textual DBMS and indexing and search engine.
Everyone wanted it; the only problem was that you needed at least US$6 million to get started.
Some of the large newspaper groups and a few government related organisations bought it.
But it had a limited audience.
Later I heard that it was reborn in spirit if not in code and is now widely available, but still very much a niche product.
Of course, the focus on the ‘intelligent’ processing of ‘unstructured’ data (something a little more sophisticated than a google search) also goes back to the initial years of the rise of Knowledge Management (and much further back). It was also rebadged in the guise of Web Harvesting, some years later.
That said. What I wanted to focus on in this blog piece is the veracity of claims made by certain Big Data advocates for big data:
# That it’s new. Because it isn’t. Not even the automated integration or mapping of data is new. Popularised? Yes. Improved? Quite possibly. New? No.
# That it can drive benefits. Because that depends on many factors.
# That its application is unlimited – just like Chris Rock’s gag about Robitussin. Because of course it isn’t. There is even web content linking Big Data to all sorts of things, from fighting terrorism to containing Ebola.
So, to conclude with some more polemic.
At the end of the day, it’s mainly about search, and what we are constantly striving towards are more sophisticated ways of implementing grep and awk.
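To illustrate the quip (the log lines and fields below are invented, and this is only a sketch of the kind of search-and-summarise work I mean), much of what gets rebadged as analytics still boils down to something like this:

```python
# Roughly what grep and awk have always done: filter lines by a pattern,
# pull out a field, and aggregate. The log lines here are invented examples.
import re
from collections import Counter

log_lines = [
    "2015-01-07 10:01 ERROR payments timeout",
    "2015-01-07 10:02 INFO  payments ok",
    "2015-01-07 10:04 ERROR billing timeout",
    "2015-01-07 10:05 ERROR payments refused",
]

totals = Counter()
for line in log_lines:
    if re.search(r"ERROR", line):    # the grep part
        subsystem = line.split()[3]  # the awk '{ print $4 }' part
        totals[subsystem] += 1       # the sort | uniq -c part

for subsystem, count in totals.most_common():
    print(subsystem, count)
```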
Cheers,
Martyn
P.S. I also smile at the use of ‘veracity’ instead of ‘quality’ in some of the best-selling Big Data literature.