Big Data is all pervasive, all seeing and all knowing.
Everyone is doing Big Data, and if they aren’t then they will. It’s inevitable.
Big Data will revolutionise the worlds of data, decision making and business.
Am I right, or am I right?
Why buy when you can get it for free?
Back at you! Here is the fifth fantastic delivery of an amazing and fabulous selection of free and widely available business analytics learning content, which has been prepared… just for you.
Nauseated by the non-stop crap, railroading and bullying tactics from a reduced group of snotty little techno bastards? Disgusted by the crass propaganda, crude instrumentalisation and fetid boloney from the likes of Bernie, Vinnie, Spats and an attendant entourage of snake-oil merchants and brain-dead sycophants? Sick and tired of the amazing, incredible and fabulous velocities, varieties and volumes of Big Data bullshit washing the decks of the SS LinkedIn? Well, be sick and tired no longer. Here is the antidote!
Some interesting Big Data facts to think about this weekend.
I. More Big Data bullshit has been created in the last couple of years, than in the entire history of humankind.
II. Big Data bullshit will grow faster than ever before, in spite of what Gartner say to the contrary.
III. By 2021, if the mega-trending nonsense continues unabated, there will be 40 megabytes of Big Data bullshit created for every living woman, man and child, every sixty seconds.
IV. Also, in 2021 the accumulated digital universe of Big Data bullshit will grow from 8 spartabytes to 22 marrsabytes.
V. Every second people are thinking about creating new Big Data bullshit. For example, 20 million search queries alone (per minute) are generated with the sole intent of creating even more Big Data bullshit. This is set to grow to over 100 thousand brazilian bullshit queries per year by 2020.
VI. Every minute an estimated 280 hours of Big Data oriented porn is uploaded to the ‘next greatest thing since sliced bread and butter pudding’ network.
VII. By 2017 over 1 trillion Big Data bullshitters will be connected via Facebook.
VIII. Facebook usage by Big Data bullshitters will make the current social media scene look like a walk in the bullring.
IX. In 2015, an astounding 1 million trolleyloads of photos were uploaded to the web every single hour of the day. By 2017, nearly 80% of photos taken will include a cameo by one or more smartass Big Data bullshit artists.
X. This year, over 4 billion smartass Big Data bullshitters will be shipped – all packed with communication devices capable of collecting and communicating all kinds of Big Data bullshit, not to mention the Big Data bullshit the amazing Big Data babblers create themselves.
XI. By 2020, we will have over 8 billion Big Data idiot savants (overtaking sentient and rational human beings).
XII. Within five years there will be over 5 billion Big Data smartasses connected in the world, all developed to collect, analyze and share Big Data bullshit.
XIII. By 2020, at least a third of all Big Data bullshit will pass through the bullshit cloud (a network of Big Data bullshit servers connected over the Big Data bullshit Internet).
XIV. Distributed Big Data bullshitting (performing Big Data bullshitting tasks using a network of computers in the cloud) is very real. Google uses it every day to involve about 10 Big Data bullshitters in answering a single search query, which takes no more than 0.2 weeks to complete.
XV. The Hadoop Bullshit Ecosystem (open bullshit software for distributed bullshitting) market is forecast to grow at a compound annual growth rate of 299,258%, surpassing $111 billion by 2021.
XVI. Estimates suggest that by better integrating Big Data bullshit, we could save as much as $300Bn a year on smoking, drinking and having a wild time. That’s equal to reducing costs by $1000000 a year for every person on earth.
XVII. The White House, who first recognized Big Data as the bullshit it is, has already invested more than $200 in big data bullshit projects.
XVIII. For an archetypal Fortune 1000 company, just a 10% increase in data accessibility will result in more than $650 billion additional net income.
XIX. Retailers who leverage the full power of big data could increase their operating margins by as much as 36,660%
XX. 173% of organizations have already invested or plan to invest in big data bullshit by 2099.
Many thanks for reading. Think about it. I hope you get the message.
I have written at length about the fundamental contradictions of Big Data, but what I have omitted in the past is quite possibly the biggest contradiction of all. Probably because it has more to do with how Big Data is continually hyped, rather than having anything to do with Big Data as a bag of technologies – which has a whole assortment of problems in its own right.
Last time I spoke with you about the contradictions of Big Data it was about the three Vs of volume, variety and velocity. In general, it was a view that was well received, even if not widely understood. Which of course is close enough for government work. But, get ready for “something completely different”.
If on the one hand some folk can claim that Big Data provides fact based insights and reliable forecasts of future habits, trends and preferences, then why is it so difficult to produce and socialise – yes, I like to use that term – Big Data success stories?
In short, I think we have arrived at the stage in Big Data’s cycle where it is reasonable to ask pundits to either put up or shut up.
So, why aren’t the current Big Data success fables accompanied by facts, such as names of those involved (at least businesses), the sponsors, the suppliers, the purpose of the exercise, the desired outcomes, the data used, how it is processed, what the results were, and what tangible benefits, if any, were accrued or are accruable? If that is not enough, then let people mention the technology used, the products purchased or licensed and the methodology followed.
In short, what I would like to know is why the evangelists of Big Data are telling us that bigger data is better, that more variety leads to greater insight, and that velocity is king. Why are we told that Big Data almost assuredly results in better decisions, by people who are coy, shy or secretive about almost all facts and data coming out of Big Data projects?
I have been reminded, time and time again, that there are Big Data success stories out there, and I have even been told that this information would be fully shared with me once it was agreed with the ‘clients’ that it was okay to do so. Okay, that’s fine, I know Big Data is a roaring success story (at least in people’s minds), and I also know that it takes some time to make things up – some people are just not creative. Sure, I was told about these ‘successes’ some time ago, and you know, I’m not expecting anything that’s worth shaking a stick at, either now or later, but I’m still waiting, boys. Notwithstanding, you will still be called out as vacuous bullshitters when the time comes.
“But” I hear you cry “there is a wealth of success stories in the presses”.
Well, no, and you would be wrong, gullible and foolish to think so. That is your problem, but unfortunately it is also mine, because this is my profession that you are playing fast and loose with.
The fact is that there is a “wealth” of content that people try and pass off as legitimate Big Data success stories, but they aren’t in fact success stories, in any way, shape or form.
The thing is, people may read the blog title and even the standfirst, but will be less inclined to actually read the article, so what remains is the impression that there are ‘loads of Big Data success stories’. But if people actually read the articles and were intelligent enough to understand them, then they would realise that inevitably there is a massive mismatch between the title of these pieces and the content. Indeed, if these pieces were actually pieces of advertising, rather than blog comments, they would be denounced in some jurisdictions for not fulfilling the advertising criteria of legal, decent and honest.
There is one more thing that Big Data evangelists (or any self-styled pundit, guru or expert for that matter) should understand, internalise and remember. If you say that you have a Big Data success story, with all the details, and that isn’t in fact the case, and it isn’t even remotely a success story or even true, then you are simply lying, and that’s deceit, it’s unprofessional and it’s unethical, and you are a scoundrel. So live with it or fix it, the choice is yours.
Many thanks for reading.
Many people come up to me in the street and ask me what Big Data is all about. It has happened to me so many times in the past that I am convinced that it might just happen to you as well. I know this sort of thing, I read the Big Data tealeaves. Nothing gets past me.
The first time a complete stranger came up to me in public and said “Hello, will you tell me what this Big Data lark is all about then?” I was lost for words, you just ask my Aunt Dolly, he can vouch for that, no problem. Later that day I read a book – it was my dad’s book – and I then decided to adopt a strategy.
Therefore, in the spirit of springtime goodwill to all men and women, I have put together this blog piece in the hope that it will enlighten, help and entertain.
What is big data?
Big Data can be characterised by the 10 Vs – yes, 10, not 4. Which, in my book, is more than enough to bring up-to-speed the average Big Data John or Jane that one meets on the street, and who naturally wish to be informed of such matters.
In layperson’s terms this is a series of landmarks and pointers in the analytics space used to frame and guide the didactic aspects of Big Data.
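The fundamental Vs of the Big Data canon are these: Vagueness, Volume, Variety, Virility, Velocity, Vendible, Vaticination, Voracity, Veracity and Vanity.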
So, let me now explain what each of these characteristics means, for those who might know and for those who might want to know.
Vagueness – This is perhaps the trickiest of questions to address, given the vast panorama that is cast before this incredibly complex yet easily graspable concept. But let me state this, and let there be no mistake about it. At this point in time, what makes Big Data vague is also what makes Big Data specific, explicit and certain. That is to say, in order to ‘come to an understanding’ of Big Data, it is necessary to completely embrace the dialectic of knowing the unknowable. So belief is an absolutely essential element – belief and data, that is.
Volume – If there ever was a time to “pump up the volume”, we have it here with Big Data.
Big, voluminous, gorgeously rotund and infinite. Big Data is called Big Data because there is a lovely, roly-poly, likeable never-ending load of it. Its volumes can be measured in zettabytes, which, you can be assured, is a helluva lot of data.
Variety – As they might say down my way, “variety is the spice of life, innit”. This is what makes Big Data so special. So appealing.
Because before Big Data there was absolutely no variety in anything, at all. We lived in a bland world, bereft of detail, nuance and diversity. Nothing could be measured, analysed or explained, because we lacked Big Data. We were ignorant. So ignorant and stupid that we couldn’t see the sense of putting the diapers next to the beer, or of offering three for the price of two.
Fortunately, today this is no longer the case if we don’t want it to be, and thanks to Big Data we have a veritable sensorial explosion. No longer is IT just a couple of symbols scribbled in crayon on someone’s school notebook.
Virility – Move over Smart Data, the new kid on the block is Big Data.
If Big Data were described in the manner of a religious text, it would be accompanied by a never ending narrative of begets.
So, what does that mean?
Simply stated, Big Data creates itself, in and of itself. The more Big Data you have, the more Big Data gets created. It’s like a self-fulfilling prophecy in 360 degree, high-definition, poly-faceted and all-encompassing knowing. The sort of thing that governments would pay an arm and a leg to get their mitts on.
Velocity – Velocity is of the essence. Velocity kills the competition. More velocity, less haste.
We demand that service is ‘velocious’. ‘Everything’ must be ‘now’, or it’s too late.
This means we need to be able to handle Big Data at velocity – at the speed of need.
Charles Babbage once stated (or maybe it was more than once) that “whenever the work is itself light, it becomes necessary, in order to economize time, to increase the velocity.”
But remember, we are dealing with mega-velocity here, so don’t drink and drive the Big Data Steamship, Star-ship or Mustang.
Vendible – If you can sell it, and sell it as Big Data, then it ‘is’ Big Data. If you can’t, then it’s not. The saleability of Big Data proves its existence.
So, what are the vendible aspects of Big Data?
Let’s leave that easy question for another day. But for now I can confidently state that it is used to mobilise armies of commentators, industry analysts, publicists, punters, writers, bloggers, gurus, futurologists, conference organisers, conference speakers, educators, customer relationship managers, salespeople, marketers and admen.
Vaticination – Edmund Burke is down on record as stating that “you can never plan the future by the past”. Now Burke may have been a clever person when it came to many things, but he wasn’t exactly a whiz when it came to Big Data.
There are people in the world who are in no doubt that Big Data provides the sort of visionary and predictive powers only previously obtainable through ritual sacrifice, magic potions and the casting of spells. Others are highly critical of the understatement implicit in this belief.
For many, Big Data will make the Oracle of Delphi look like a mere call centre.
This is why the power of vaticination plays a characteristically important role in the world of Big Data.
Voracity – This is based on the quasi-rationalist argument that Big Data is big and it has an omnipresent and insatiable self-fulfilling desire.
Big Data comes with an attendant requirement for hardware, even if it is a whole load of consumer hardware tacked together in a magnificent and miraculous mesh of magic.
Big Data can be characterised by voracity, but this comes hand in hand with the ‘ventripotent’ IT industry.
Veracity – The eminence of the data being captured for Big Data handling can vary significantly. The quality or lack of quality of the data naturally has the potential to impact the accuracy of analysis using that data.
Before Big Data arrived on the scene we knew nothing about Data Quality or data verification. This is why ETL and Data Cleansing tools lacked the power to effectively quality check and verify data, to ensure that any erroneous or anomalous data was rejected or flagged.
But now, with the sophistication of tools such as ‘grep’ and ‘awk’ at our disposal, we have the power in our hands to ensure nothing ‘dodgy’ gets into the analytical mix.
Vanity – In my opinion, to fully grasp the underlying and profound meaning of Big Data, it is essential for us to understand the difference between vanity and conceit. Max Counsell claimed that “Vanity is the flatterer of the soul”. Goethe characterised vanity as being “a desire for personal glory”. After an incident with an Anarchist (presumably a Big Data Anarchist), Blackadder remarked to Baldrick that “The criminal’s vanity always makes them make one tiny but fatal mistake. Theirs was to have their entire conspiracy printed and published in plain manuscript”.
So that ends the brief rundown of the defining characteristics of Big Data.
So, to summarise. That, which has passed before, necessarily divulges both the upside and downside of Big Data. By reaching out, opening up the kimono and relating the 10 Vs we are disclosing that which cannot be disclosed, exhibiting the absence of essential essence, and thereby opening up the entire field, discipline, profession, science and art to examination, questioning and ridicule.
Many thanks for reading.
You are the boss. You are the leader, coach and manager, and there are some things that you have just got to learn, like it or not. One of these skills is to be able to identify when someone has quit. “How dare they?” I hear you ask.
The first time I quit a job and didn’t tell anybody was when I was in the RAF working as a fighter pilot in World War 2, and I accidentally bombed Newport in South Wales, and was given a stern talking to for my troubles. Well, I didn’t actually quit and I was never in the armed forces and I was born into the era of the Beat Generation, but that’s by the by, it’s just there for effect, to create some artificial empathy between me and those who have actually quit a job and not told anyone about it. Myself, I would never do such a thing. Although to be fair, Newport has looked like it has been freshly bombed with dark green, brown and grey shades of poster paints and self-raising flour, since forever.
In this country, it is good to kill an admiral from time to time, to encourage the others – Voltaire
My gran used to tell me that honesty pays. Of course, she never really understood banking or IT, probably because she didn’t want to know anything about them, and she never lived to witness the amazing hype circuses, the spin doctors’ spiel or the focus-group dog-and-pony show of the 21st century. Indeed, if honesty were a guaranteed payer my gran would have amassed more wealth than even Warren Buffett himself.
If my gran lived today, she might reflect on what Big Data might be about – maybe she would even consider it benignly, as a sort of shelter for fallen men of once uncertain virtue. We will never know. So onwards and upwards.
The Harvard Business Review contemplated honesty in somewhat different terms:
“Honesty is, in fact, primarily a moral choice. Businesspeople do tell themselves that, in the long run, they will do well by doing good. But there is little factual or logical basis for this conviction. Without values, without a basic preference for right over wrong, trust based on such self-delusion would crumble in the face of temptation.”
In a marvellous book, A Few Good Men from Univac, David E. Lundstrom narrates the story of Sperry Univac in the 1960s, one of the truly great innovators in the first forty years of IT, and includes an allegory taken from the engineering front-line. I will recount it here, edited to highlight the zeitgeist, for your entertainment and, as Voltaire put it, “to encourage the others”:
In the beginning was the Big Data Plan.
And then came the Big Data Assumptions.
And the Assumptions were without form.
And the Plan was without substance.
And darkness was upon the face of the Workers.
And they spoke amongst themselves, saying: “It is a crock of shit, and it stinketh.”
And the workers went unto their Supervisors and said: “It is a pail of dung, and none may abide the odor thereof.”
And the Supervisors went unto their Managers, saying: “It is a container of excrement, and it is very strong, such that none may abide by it.”
And the Managers went unto their Directors, saying: “It is a vessel of fertilizer, and none may abide its strength.”
And the Directors spoke amongst themselves, saying to one another: “It contains that which aids plant growth, and it is very powerful.”
And the Vice Presidents went unto the President, saying unto him: “This new plan will actively promote the growth and vigor of the company, with powerful effects.”
And the President looked upon the Big Data Plan, and saw that it was good.
“But”, I hear you say, “why fight it, why not take advantage of the Big Data zeitgeist?”, “Why not cash in on the grand bonanza Big Data bandwagon?” or “Monetise the three famous Vs of Big Data?”
Well, it had crossed my mind, briefly, and (outside of the USA) we’ve all done stuff we have not entirely believed in, so the temptation to cash in is present, capisci? This paraphrasing of a piece from My Blue Heaven might give you a better idea:
One of my best friends makes his living as a completely phony Big Data Scientist. For two hundred bucks he can make you a Data Scientist or a Big Data guru. Some guys give you an education but this guy gives you immediate access to high paying jobs, sex that would make the 256 trillion Shades of Blah blush and a life in the City, the Big Apple or a small town in Germany.
Moreover, for an extra 250 bucks (limited time offer) you can also become a certified Big Data Neuro Trainer, which will allow you to do unto others what has been done unto you.
I also considered Big Data Brokerage, Big Data Certification and Big Data Independent Trading (New York – Paris – Peckham). The opportunities are immense.
However, what happens when the Big Data well runs dry, and I (and many others) get tarnished with the mark of Big Data and become pariahs by complicity, collusion or simple association?
That question I will leave for another day. But just consider the following.
All right, I admit, I am a big long-time fan of comic genius Mel Brooks, who has a knack of capturing deep insight from the human condition, especially when the human condition is off guard and shallow. In that vein, this is how I like to think the dialogue from the Dole Office scene from History of the World, Part I would have gone, if he were to write it today:
Dole Office Clerk: Occupation?
Data Magnus Comicus: Stand-up Big Data scientist.
Dole Office Clerk: What?
Data Magnus Comicus: Stand-up Big Data scientist. I coalesce the vaporous datas of the human interaction with the social-media networking, Internet of Everything, and always-connected experience into a… viable, analytical and meaningful predictive-comprehension.
Dole Office Clerk: Oh, a Big Data bullshit artist!
Data Magnus Comicus: *Grumble*…
Dole Office Clerk: Did you bullshit Big Data last week?
Data Magnus Comicus: No.
Dole Office Clerk: Did you try to bullshit Big Data last week?
Data Magnus Comicus: Yes!
Finally, I leave you with some wise words from Israeli-American professor of psychology and behavioural economics, Dan Ariely:
“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”
Many thanks for reading.
Hold this thought: To paraphrase the great Bob Hoffman, just when you think that if the Big Data babblers were to generate one more ounce of bull**** the entire f****** solar system would explode, what do they do? Exceed expectations.
I am a mild mannered person, but if there is one thing that irks me, it is when I hear variations on the theme of “Data Warehousing is Big Data”, “Big data is in many ways an evolution of data warehousing” and “with Big Data you no longer need a Data Warehouse”.
Big Data is not Data Warehousing, it is not the evolution of Data Warehousing and it is not a sensible and coherent alternative to Data Warehousing. No matter what certain vendors will put in their marketing brochures or stick up their noses.
In spite of all of the high-visibility screw-ups that have carried the name of Data Warehousing, even when they were not Data Warehouse projects at all, the definition, strategy, benefits and success stories of data warehousing are known, they are in the public domain and they are tangible.
Data Warehousing is a practical, rational and coherent way of providing information needed for strategic and tactical option-formulation and decision-making.
Data Warehousing is a strategy driven, business oriented and technology based business process.
We stock Data Warehouses with data that, in one way or another, comes from internal and optional external sources, and from structured and optional unstructured data. The process of getting data from a data source to the target Data Warehouse involves extraction, scrubbing, transformation and loading, ETL for short.
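The extract, scrub, transform and load steps just described can be sketched in a few lines of Python. This is only an illustration, not how any particular ETL product works: the CSV layout, the `sales` table, the column names and the scrubbing rule (reject any row whose amount is not numeric) are all assumptions invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an operational system.
SOURCE_CSV = """order_id,customer,amount
1001, Acme Ltd ,250.00
1002,Globex,not-a-number
1003,initech,99.50
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: split rows into clean and rejected using a simple quality rule."""
    clean, rejected = [], []
    for row in rows:
        try:
            float(row["amount"])
            clean.append(row)
        except ValueError:
            rejected.append(row)
    return clean, rejected


def transform(rows):
    """Transform: apply consistent types and formatting."""
    return [
        (int(r["order_id"]), r["customer"].strip().upper(), float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """Load: write the transformed records to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run all four steps and return the rejected rows for follow-up."""
    clean, rejected = scrub(extract(text))
    load(conn, transform(clean))
    return rejected
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two valid orders and hands back the rejected one, so that dodgy data is flagged rather than silently dropped.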
Subject Oriented: Operational databases, such as order processing and payroll databases and ERP databases, are organized around business processes or functional areas. These databases grew out of the applications they served. Thus, the data was relative to the order processing application or the payroll application. Data on a particular subject, such as products or employees, was maintained separately (and usually inconsistently) in a number of different databases. In contrast, a data warehouse is organized around subjects. This subject orientation presents the data in a much easier-to-understand format for end users and non-IT business analysts.
Integrated: Integration of data within a warehouse is accomplished by making the data consistent in format, naming and other aspects. Operational databases, for historic reasons, often have major inconsistencies in data representation. For example, a set of operational databases may represent “male” and “female” by using codes such as “m” and “f”, by “1” and “2”, or by “b” and “g”. Often, the inconsistencies are more complex and subtle. In a Data Warehouse, on the other hand, data is always maintained in a consistent fashion.
Time Variant: Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values. Furthermore, they generally maintain this information for no more than a year (and often much less). In contrast, data warehouses contain data that is generally loaded from the operational databases daily, weekly, or monthly, which is then typically maintained for a period of 3 to 10 years. This is a major difference between the two types of environments.
Historical information is of high importance to decision makers, who often want to understand trends and relationships between data. For example, the product manager for a Liquefied Natural Gas soda drink may want to see the relationship between coupon promotions and sales. This is information that is almost impossible – and certainly in most cases not cost effective – to determine with an operational database.
Non-Volatile: Non-volatility means that after the data warehouse is loaded there are no changes, inserts, or deletes performed against the informational database. The Data Warehouse is, of course, first loaded with cleaned, integrated and transformed data that originated in the operational databases.
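Two of these characteristics, integration and time variance combined with non-volatility, lend themselves to a small sketch. Everything named here is hypothetical: the source system names, the code mappings and the `load_snapshot` helper are assumptions for illustration, and a real warehouse would do this in its ETL layer and database rather than in a Python list.

```python
from datetime import date

# Hypothetical per-source code mappings onto one warehouse standard.
GENDER_MAPS = {
    "orders": {"m": "M", "f": "F"},
    "payroll": {"1": "M", "2": "F"},
    "crm": {"b": "M", "g": "F"},
}


def integrate(source, record):
    """Return a copy of the record with the gender code in the warehouse standard."""
    code = str(record["gender"]).strip().lower()
    standard = GENDER_MAPS[source].get(code, "UNKNOWN")
    return {**record, "gender": standard, "source": source}


def load_snapshot(warehouse, source, records, as_of):
    """Append integrated records stamped with a load date; existing rows are never changed."""
    for record in records:
        warehouse.append({**integrate(source, record), "as_of": as_of})
    return warehouse


warehouse = []
load_snapshot(warehouse, "payroll", [{"employee": "E1", "gender": 2}], date(2015, 1, 31))
load_snapshot(warehouse, "crm", [{"employee": "E2", "gender": "g"}], date(2015, 2, 28))
```

Both rows end up with the same standard code despite arriving in different source formats, and each later load adds history instead of overwriting it.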
We build Data Warehouses iteratively, a piece or two at a time, and each iteration is primarily a result of business requirements, and not technological considerations.
Each iteration of a Data Warehouse is well bound and understood – small enough to be deliverable in a short iteration, and large enough to be significant.
Conversely, Big Data is characterised as being about:
Massive volumes: so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it, and
High variety: not only structured data, but also the whole range of digital data, and
High velocity: the speed at which data is generated, transmitted and received.
These are known as the three Vs of Big Data, and they are subject to significant and debilitating contradictions, even amongst the gurus of Big Data (as I have commented elsewhere: Contradictions of Big Data).
From time to time, Big Data pundits slam Data Warehousing for not being able to cope with the Big Data type hacking that they are apparently used to carrying out, but this is a mistake of those who fail to recognise a false Data Warehouse when they see one.
So let’s call these false flag Data Warehouse projects something else, such as Data Doghouses.
“Data Doghouse, meet Pig Data.”
Failed or failing Data Doghouses fail for the same reasons that Big Data projects will frequently fail. Both will almost invariably fail to deliver artefacts on time and to expectations; there will be failures to deliver value or even simply to return a break even in costs versus benefits; and of course, there will be failures to deliver any recognisable insight.
Failure happens in Data Doghousing (and quite possibly in Big Data as well) because there is a lack of coherent and cohesive arguments for embarking on such endeavours in the first place; a lack of real business drivers; and, a lack of sense and sensibility.
There is also a willing tendency to ignore the advice of people who warn against joining in the Big Data hubris. Why do so many ignore the ulterior motives of interested parties who are solely engaged in riding on the faddish Big Data bandwagon to maximise the revenue they can milk off punters? Why do we entertain pundits and charlatans who ‘big up’ Big Data whilst simultaneously cultivating an ignorance of data architecture, data management and business realities?
Some people say that the main difference between Big Data and Data Warehousing is that Big Data is technology, and Data Warehousing is architecture.
Now, whilst I totally respect the views of the father of Data Warehousing himself, Bill Inmon, I also think that he was being far too kind to the Big Data technology camp. However, of course, that is Bill’s choice.
Let me put it this way, if Oracle gave me the code for Oracle 3, I could add 256 bit support, parallel processing and give it an interface makeover, and it would be 1000 times better than any Big Data technology currently in the market (and that version of Oracle is from about 1983).
Therefore, Data Warehousing has no serious competing paragon. Data Warehousing is a real architecture, it has real process methodologies, it is tried and proven, it has success stories that are no secrets, and these stories include details of data, applications and the names of the companies and people involved, and we can point at tangible benefits realised. It’s clear, it’s simple and it’s transparent.
Just like Big Data, right?
See what I mean?
Therefore, the next time someone says to you that Big Data will replace Data Warehousing or that Data Warehousing is Big Data, or any variations on that sort of ‘stupidity’ theme, you can now tell them to take a hike, in the confidence that you are on the side of reason.
Many thanks for reading.
Aligning Big Data: http://www.linkedin.com/pulse/aligning-big-data-martyn-jones
Big Data and the Analytics Data Store: http://www.linkedin.com/pulse/big-data-analytics-store-martyn-jones
A Modern Manager’s Guide to Big Data: http://www.linkedin.com/pulse/managers-guide-big-data-context-martyn-jones
Accommodating Big Data
We’ve been told that Big Data is the greatest thing since sliced bread, and that its major characteristics are massive volumes (so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it), high variety (not only structured data, but also the whole range of digital data), and high velocity (the speed at which data is generated and transmitted). Also, from time to time, much to the chagrin of some Big Data disciples, a whole slew of new identifying Vs are produced, touted and then dismissed (check out my LinkedIn Pulse article on Big Data and the Vs).
So, beware. Things in Big Data may not be as they seem.
I have been waging an uphill battle against the nonsensical and unsubstantiated idea that more data is better data, but now this view is getting some additional support, and from some surprising corners.
In a recent blog piece on IBM’s Big Data and Analytics Hub (Big data: Think Smarter, not bigger), Bernard Marr wrote that “the truth is, it isn’t how big your data is, it’s what you do with it that matters!”
Elsewhere, SAS echoed similar sentiments on their web site: “The real issue is not that you are acquiring large amounts of data. It’s what you do with the data that counts.”
Can we call that ‘strike one’ for Big Data Vs?
It is claimed that 20% of digital data is structured, but that claim rests on the problematic suggestion that structured data is uniquely relational. It is also claimed that unstructured data includes CSV files and XML data, and that this makes up far more than the 20% of the data generated. But this definition is simply wrong.
If anything, CSV data is structured, and XML data is highly structured, and it’s typically regular ASCII data. So it does not add variety, even though it is not structured in the ways that some people might expect, especially if that someone lacks the required knowledge and experience. Simply stated, CSV data is structured, it’s just that it lacks rich metadata, but that doesn’t make it unstructured.
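To make the point concrete, here is a minimal sketch in Python using only the standard library. The sample records, field names and the two helper functions are invented purely for illustration; they do not come from any real system. It simply shows that the same records, whether delivered as CSV or as XML, parse straight into the same named fields.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = "customer_id,region,amount\n1001,North,250.00\n1002,South,99.50\n"
XML_TEXT = (
    "<orders>"
    "<order id='1001'><region>North</region><amount>250.00</amount></order>"
    "<order id='1002'><region>South</region><amount>99.50</amount></order>"
    "</orders>"
)


def csv_rows(text):
    """Return each CSV record as a dict keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def xml_rows(text):
    """Return each <order> element as a dict built from its id attribute and child elements."""
    root = ET.fromstring(text)
    rows = []
    for order in root.findall("order"):
        row = {"customer_id": order.get("id")}
        for child in order:
            row[child.tag] = child.text
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(csv_rows(CSV_TEXT))
    print(xml_rows(XML_TEXT))
```

Neither format carries rich metadata (every value arrives as a plain string), but both have rows and named fields, which is structure by any reasonable definition.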
“But”, I hear you say, “what about all the non-textual data such as multi-media, and what about the masses of unstructured textual data?”
Take it from me, most businesses will not be basing their business strategies on the analysis of a glut of selfies, home videos of cute kittens, or the complete works of William Shakespeare or Dan Brown. Almost all business analysis will continue to be carried out on structured data obtained primarily from internal operational systems and external structured data providers.
Strike two! Third time lucky?
So, if we accept that Big Data isn’t really about the data volumes or data variety, that leaves us with velocity, right? Well no, because if it isn’t about record-breaking VLDB or significant data variety, then for most commercial businesses the management of data velocity becomes either less of an issue or no issue at all. The fact that some software vendors and IT service suppliers set up this ‘straw man’ argument and then knock it down with the ‘amazing powers’ of their products and services is quite another matter.
Strike three, and counting.
We have been told time and time again that the major difference between a data scientist and a professional statistician is that the ‘scientists’ know how to cope very well with massive volumes, varieties and velocities of data. Now it turns out that this is also questionable.
According to Bob Violino writing in Information Management (Messy Big Data Overwhelms Data Scientists – 20 February 2015) “Data scientists see messy, disorganized data as a major hurdle preventing them from doing what they find most interesting in their jobs”. So, when it comes to data quality and structure the ‘scientists’ don’t really have an advantage over professional statisticians.
Last year Thomas C. Redman, writing in the Harvard Business Review (Data’s Credibility Problem), noted that when Big Data is unreliable “managers quickly lose faith” and “fall back on their intuition to make decisions, steer their companies, and implement strategy”, and when this happens there is a propensity to reject potentially “important, counterintuitive implications that emerge from big data analyses.”
Data science and Big Data analytics are the new kids on the block, aren’t they?
Well, here are some real life scenarios.
A major banking equipment supplier: A lot of banking equipment is hybrid analogue-digital; a simple example would be a photocopier or a physical document processing device. One major supplier decided to capture the sensor data produced by their devices in order to predict failures and problems. Predictive preventive maintenance rules are created and corroborated using the data generated by sensors on each customer device, and these rules then get incorporated into the devices’ logic.
A major IT vendor: What happens when you create an intersection and convergence between technologies, techniques and methods from the areas of mainstream IT, data architecture and management, statistics (quantitative and qualitative analytics) and data visualisation, artificial intelligence/machine learning and knowledge management? This is precisely what one of the main European IT vendors did, and the idea proved to be quite attractive to customers, prospects and investors.
A major integrated circuit supplier: The testing of ICs at the ‘fabs’ (manufacturing plants) generates a serious amount of data. This data is used to detect errors in the IC manufacturing process. It is captured and analysed in as near real-time as possible, which is necessary because over-running the production of faulty ICs is costly. To get around this problem the company uses a combination of fast data capture, transformation and loading of data into a data analytics area to ensure early and precise problem detection.
All Big Data Analytics success stories?
The first happened in 1989, the second in 1993 and the third in 2001. Yes, Big Data and Big Data analytics are sort of newish.
What is science?
According to Vasant Dhar of the Stern School of Business (Data Science and Prediction), Jeff Leek (The key word in “Data Science” is not Data, it is Science), and repeated on Wikipedia, “In general terms, data science is the extraction of knowledge from data”. Well, excuse me if I beg to differ. I have seen data scientists at work, and the word science doesn’t actually jump out and grab you. It’s difficult to make the connection, just as it is to accurately connect some popular science magazines with fundamental scientific research.
If a professional and qualified statistician wants to label themselves a data scientist then I have no issue with that, it’s their problem, but I am not willing to lend credibility to the term ‘data scientist’ when it is merely an interesting job title, with at most a tenuous connection to the actual role, and one that is liberally applied, with the almost customary largesse of IT, to creative code hackers and business-averse dabblers in data.
As Hazelcast VP Miko Matsumura suggested in Data Science is Dead “… put “Data Scientist” on your resume. It may get you additional calls from recruiters, and maybe even a spiffy new job, where you’ll be the King or Queen of a rotting whale-carcass of data” and ” Don’t be the data scientist tasked with the crime-scene cleanup of most companies’ “Big Data”—be the developer, programmer, or entrepreneur who can think, code, and create the future.”
DATA: “Data is a super-class of a modern representation of an arcane symbology.” – Anon
If I had a dollar for every time I heard someone claim that data has intrinsic positive value then I would be as wealthy as Warren Buffett.
If I have said it once, I have said it a hundred times. In order for data to be more than an operational necessity it requires context.
Providing valid data with valid context turns that data into information.
Data can be relevant and data can be irrelevant. That relevance or irrelevance of data may be permanent or temporary, continuous or episodic, qualitative or quantitative.
Some data is meaningless, and there are cases whereby nobody can remember why it was collected or what purpose it serves.
Taking all this into account we can ask the deadly pragmatic question: what value does this data have? Which is sometimes answered with a pertinent ‘no value whatsoever’.
It is said that Big Data is changing the world, but for all intents and purposes, and shamed by previous Big Data excesses, some people are rapidly changing the definitions and parameters of Big Data in order to position it as more tangible and down-to-earth, whilst moving it away from its position as an overhyped and dead-ended liability.
Big Data is a dopey term, applied necessarily ambiguously to a surfeit of tenuously connected vagaries, and its time has come and gone. So, let’s drop the Big Data moniker, and embrace the fact that data is data, and long live ‘All Data’, yes, all digital data. Let’s consider all data for what it’s worth to the business, and not for what some chatterers reckon its value is – having as they do, little or no insight into the businesses to which they refer, or into the data that these businesses possess.
So, when push comes to shove, is Big Data really about high volumes, high velocity and high variety, or is it in fact about much noise, too much pomposity and abundant similarity leading to unnecessary high anxiety?
Thanks very much for reading.
Big Data is now an inhospitable and unhealthy land inhabited by those who, through accident or design, deceive naïve and sentimental bystanders and those who are willingly misled.
When all of this Big Data malarkey started it was sort of funny, humorous and occasionally witty, especially in the affected, bizarre and frequently uninhibited ways that freshly-minted self-appointed gurus and experts would “big it up”.
Doctor Freud would have had a field day with all of that, being as it was, and for that matter still is, a postmodern mishmash of Riefenstahl, Freddie Mercury and Monty Python on steroids. However, after that extended, operatic and high-camp hiatus it all went downhill.
The Big Data scene is fast becoming an outrageous and brash festival of deception, disinformation and obliviousness. Which is a pity, because it does the industry no good whatsoever.
It is telling that Big Data evangelists, gurus and assorted sycophants cannot even define Big Data adequately, never mind discuss (or for that matter, point at) tangible success stories, without falling into contradictions on all of the key defining characteristics of volume, variety and velocity, and resorting to crude debating devices to avoid or finesse the concerns and the questions.
Almost every morning I check out the industry news, and almost invariably, it comes with new mind-boggling examples of Big Data nonsense.
However, it isn’t always nonsense for nonsense’s sake, there are agendas, there are rational explanations why Big Data has become at the same time, one of the most hyped up fads in the history of IT, and one that its supporters find so difficult to actually explain and justify, in any reasonable sort of way.
Therefore, when it comes to Big Data, beyond the surfeit of platitudes, clichés, bluff and bluster, the only things in play are the interests of industry, the patrons, the courtesans and their entourage of the innocent and the beguiled.
One of the biggest deceptions in Big Data is in the misleadingly named ‘success stories’. The thing is that most of these success stories that I have ever read have been:
However, it doesn’t stop there.
One of the clearest examples of the questionable nature of Big Data evangelism is when it is used to piggyback Big Data hype on simple, tangible and immediately recognisable artefacts or applications that have little in common with Big Data.
This is an extreme illustration, but it works like this: “iPhones are commercially successful, iPhones are part of Big Data, and therefore Big Data is commercially successful.”
As if the mere conjuring up of association, affinity and proximity will convince people of the great and growing value of Big Data.
What I am also referring to are publicity pieces that may as well have been titled:
Do you recognise similarities?
It’s no big deal, just the use of unreliable, misleading and inappropriate fallacies, dressed up as cute, plausible and accessible collateral. People may think that such things are clever and witty, but they aren’t; they are just misleading.
Let’s continue with something simple.
Evasion is, in ethics, an act that deceives by stating a true statement that is immaterial or leads to a false deduction. For example, citing events, persons or anecdotes from the history of IT to justify the supposed or imaginary value of Big Data. This is close to the notion of a non sequitur, which of course is an argument, the conclusions from which do not follow from its premise. It falls short of being full-on sophistry, purely because the simplistic, puerile and superficial arguments put forward in favour of Big Data do not match those of the true sophist who seeks to reason with clever but fallacious and deceptive arguments. Too many of the Big Data arguments are fallacious and deceptive, but no one, equipped with a reasonable capacity for critical thinking, should take such ‘arguments’ as valid.
Hold this thought: Big Data hype is a viper’s nest of logical fallacies, white lies and disinformation.
Just when I think things could not get any weirder, they do, and the Big Data ceiling of hyperbole rises even higher, up to the rarer atmosphere of extreme tendentiousness.
There is a growing mass of Big Data hoop-la, hyperbole and flim flam that exceeds all previous bounds of overstatement, solecism and confabulation. This is where the real volumes, varieties and velocities are in Big Data; in hokie.
We live, as Oscar Wilde said in his day, in an age of surfaces. Yes, superficiality, puerility and short-termism are the competing orders of the day. However, I am still amazed – and maybe wrongly so – by what ostensibly professional, experienced and knowledgeable people are willing, able and prepared to accept, especially when it comes to Big Data flim flam sauce.
Here are some examples of the nonsense about Big Data that is taken as gospel by ‘adults’:
Data Warehousing is part of Big Data: No comment.
Big Data will replace Enterprise Data Warehousing: People can’t even explain the features and benefits of Big Data. I try to make it as easy as possible: ‘if you can’t say it, point to it’. But, seriously, people can’t even relate tangible and credible Big Data success stories, never mind show how it will replace Enterprise Data Warehousing, whether that’s the Inmon or Kimball flavour, take your pick.
Everyone and every organisation can benefit from Big Data: If people can’t explain this, and they don’t in terms of tangible benefits, then the claim should remain questionable.
Data Scientists will replace Statisticians: Why is that so? It is claimed that Data Scientists are uniquely equipped to handle massive volumes, varieties and velocities of data – well, as it turns out, this isn’t certain either.
Big Data is in its infancy: I think we may be confusing infancy with lack of real traction, and of time and place utility.
You cannot be serious: Just what are people talking about here? I have read vague, naïve and ill-informed pieces about data management, data architecture, data warehousing, reporting, business intelligence and a plethora of etcetera that have been passed off as observations and commentary on Big Data. So, what makes people recycle hackneyed, misleading and badly conceptualised ‘content’?
In the commentary on one of Bernard Marr’s pieces on LinkedIn (a professional networking site) I observed that no one can adequately explain what Big Data is without falling into contradictions and fancies, and no one seems to be capable or willing to provide tangible success stories.
Bernard responded to this comment by pointing out “the reason for that is that Big Data means different things to different people.”
Fair enough. It’s an explanation.
That said, I have always had more than a tenuous dislike of postmodern thinking, in fact most things ‘postmodern’. Call me old fashioned, jaded or cynical, but to me, the idea that everything can mean anything is an aberration that I prefer to leave to others.
I am at a loss to explain why so many reasonable people are willing to embrace the hype surrounding Big Data and Big Data Analytics, including the attendant surfeit of nonsense, incongruences and contradictions. From my perspective, it defies reason and good sense.
Therefore, I will just end again with a fabulous quote from Ben Goldacre:
“You cannot reason people out of a position that they did not reason themselves into”.
Many thanks for reading.