Martyn Richard Jones

Bandoxa 2020

Many people come up to me in the street and ask me what big-data is all about. It has happened to me so many times in the past that I am convinced that it might just happen to you as well. I know this sort of thing; I read the big-data tea leaves. Nothing gets past me.

The first time a complete stranger came up to me in public and said: “Hello, will you tell me what this big-data lark is all about then?” I was lost for words, and you just ask my Aunt Dolly, she can vouch for that, no problem. Later that day, I read a book – it was my dad’s book, with lots of pages and words – and I then decided to adopt a strategy for explaining big-data.

Therefore, in the spirit of springtime and goodwill to all men and women, I have put together this blog piece in the hope that it will enlighten, help and entertain.

My first question is this: what is big-data?

Big data can be characterised by the twelve V words – yes, twelve, not four or three, but twelve. That, in my book, is more than enough to bring up-to-speed the average big-data John or Jane that one meets on the street, and who naturally wish to be informed of such matters.

In layperson’s terms, this is a series of landmarks and pointers in the analytics space used to frame and guide the didactic aspects of big-data.

The twelve fundamental V-word characteristics of the big-data canon are:

  1. Vagueness.
  2. Volume.
  3. Variety.
  4. Virility.
  5. Velocity.
  6. Vendibility.
  7. Vaticination.
  8. Voracity.
  9. Vanity.
  10. Vintage.
  11. Vulgarity.
  12. Virtuousness.

What do those characteristic tenets mean? Let’s take a look at them one by one.

Vagueness: This is perhaps the trickiest of questions to address, given the vast panorama that is cast before this incredibly complex yet easily graspable concept. But let me state this, and let there be no mistake about it. The question is as cunning as a cunning fox, and the answers, even more so. At this point, what makes big-data vague is also what makes big-data specific, explicit and certain. That is to say, to ‘come to an understanding’ of big-data, it is necessary to embrace the dialectic of knowing the unknowable completely. So belief is an essential element – belief and a lot of data, that is.

Volume: If there ever was a time to ‘pump up the volume’, we have it here with big-data.

Big, voluminous, gorgeously rotund and infinite. Big data is called big-data because there is a lovely, roly-poly, likeable never-ending load of it. Its volumes can be measured in zettabytes, which, you can be assured, is a helluva lot of data.
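For anyone wondering just how much a helluva lot is, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1000; the names `UNITS`, `bytes_in` and `humanise` are made up for this illustration and come from no particular library.

```python
# Decimal (SI) byte units; each step up the list is a factor of 1000.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the given SI unit."""
    return 1000 ** UNITS.index(unit)


def humanise(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop once the value fits, or when we run out of bigger units.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


# How many one-terabyte drives would it take to hold a single zettabyte?
drives_per_zettabyte = bytes_in("ZB") // bytes_in("TB")
```

Under these assumptions one zettabyte is 10^21 bytes, so `drives_per_zettabyte` works out to a billion one-terabyte drives. Roly-poly indeed.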

Variety: As they might say down my way, “variety is the spice of life, innit?” This spice is what makes the subject of big-data so exclusive and so appealing.

Because before big-data, there was no variety in anything, at all. We lived in a bland world, bereft of detail, nuance and diversity. Nothing could be measured, analysed or explained, because we lacked big-data. We were ignorant, so ignorant and stupid that we couldn’t see the sense of putting the diapers next to the beer, of offering three for the price of two or of giving a 50% discount on the second of two identical items.

Fortunately, today this is no longer the case if we don’t want it to be, and thanks to big-data, we have a veritable sensorial explosion. No longer is IT just a couple of symbols scribbled in crayon on someone’s school notebook.

Virility: Move over smart data; the new kid on the block is big-data.

Who would have imagined? Fourteen begets in the Bible, and how many precipitations in big-data? Bazillions, I bet.

Big data creates itself, in and of itself. The more big-data you have, the more big-data gets generated. It’s like a self-fulfilling prophecy in 360 degrees, high-definition, poly-faceted and all-encompassing knowing. The sort of thing that governments would pay an arm and a leg to get their mitts on.

To paraphrase the great English stand-up philosopher and basketball coach Virginia Woolf, “It was the stupidity of big-data punditry that impressed me and how, having made those convenient bullshit lines of machine learning, the dopes speed along them unquestioning.”

I rest my case.

Velocity: Velocity is of the essence. Speed kills the competition, so adopt the mantra: more velocity, less haste.

We demand that the service is ‘velocious’, that is, quickly delicious. ‘Everything’ must be ‘now’, or it’s too late.

It means that we need to be able to handle big-data at velocity – at the speed of need.

Charles Babbage once stated (or maybe it was more than once) that “whenever the work is itself light, it becomes necessary, to economize time, to increase the velocity.”

But remember, we are dealing with mega-velocity here, so don’t drink and drive the big-data steamship, Star-ship or Mustang.

Vendibility: If you can sell it, and sell it as big-data, then it ‘is’ big-data. If you can’t, then it’s not. The saleability of big-data proves its existence.

So, what are the vendible aspects of big-data?

Let’s leave that easy question for another day. But for now, I can confidently state that it is used to mobilise armies of commentators, industry analysts, publicists, punters, writers, bloggers, gurus, futurologists, conference organisers, conference speakers, educators, customer relationship managers, salespeople, marketers and admen.

Vaticination: Edmund Burke is down on record as stating that “you can never plan the future by the past.” Now Burke may have been an intelligent person when it came to many things, but he wasn’t exactly a whiz when it came to big-data or his unstructured Python code.

There are people in the world who are in no doubt that big-data provides the visionary and predictive powers only previously thought obtainable through ritual sacrifice, magic potions and the casting of spells and runes. Others are highly critical of the idle understatement implicit in this belief.

For many, big-data will make the Oracle of Delphi look like a mere local call-centre for perturbed Athenians.

This implicit ambiguity and obscurity are why the power of vaticination plays a vital role in the world of big data.

Voracity: As George Bernard Shaw might have stated: “Man is the only animal which esteems itself rich in proportion to the quantity and voracity of its big-data.”

This is based on the quasi-rationalist argument that big-data is significant and that it has an omnipresent and insatiable self-fulfilling desire.

Big data comes with an attendant requirement for hardware, even if it is a whole load of consumer hardware tacked together in a magnificent and miraculous mesh of magic. This is good for business.

Big data can be characterised by voracity, but this comes hand in hand with the ventripotent IT industry.

Veracity: The eminence of the data being captured for big-data handling can vary significantly. The quality or lack of quality of the data naturally has the potential to impact the accuracy of analysis using that data.

Before big-data arrived on the scene, we knew nothing about data quality or data verification, and we were like simple troglodytes destined to encounter real data intelligence, whether we wanted it or not. This transitional narrative is why ETL and data cleansing tools lacked the power to effectively quality check and verify data, to ensure that any erroneous or anomalous data was rejected or flagged.

Fortunately these days, with the sophistication of tools such as grep and awk at our disposal, we have the power in our hands to ensure nothing ‘dodgy’ gets into the analytical mix.
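For those who would rather not trust their veracity to grep and awk alone, the idea of rejecting or flagging erroneous records can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `customer_id` and `amount`, and the helpers `find_problem` and `split_records`, are hypothetical and stand in for whatever your own data happens to look like.

```python
from typing import Optional

# Hypothetical schema: every record is expected to carry these fields.
REQUIRED_FIELDS = ("customer_id", "amount")


def find_problem(record: dict) -> Optional[str]:
    """Return a reason the record looks dodgy, or None if it passes."""
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            return f"missing {field}"
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        return "amount is not a number"
    if amount < 0:
        return "amount is negative"
    return None


def split_records(records):
    """Split records into (clean, flagged); flagged items carry a reason."""
    clean, flagged = [], []
    for record in records:
        problem = find_problem(record)
        if problem is None:
            clean.append(record)
        else:
            flagged.append((record, problem))
    return clean, flagged


rows = [
    {"customer_id": 1, "amount": "9.99"},
    {"customer_id": 2, "amount": "lots"},
    {"customer_id": "", "amount": 5},
    {"customer_id": 3, "amount": -1},
]
clean, flagged = split_records(rows)
```

Flagging with a reason, rather than silently dropping a record, means someone can later check whether the data was dodgy or the rule was.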

To paraphrase Robert Louis Stevenson, “Truth in big-data and reason, not truth to the conjecture, is the true veracity.”

Vanity: In my opinion, to fully grasp the underlying and profound meaning of big-data, we need to understand the difference between pride and conceit, satisfaction and narcissism. Max Counsell claimed that “Vanity is the flatterer of the soul.” Goethe characterised vanity as being “a desire for personal glory.” After an incident with an anarchist (presumably a big-data anarchist), Blackadder remarked to Baldrick that “The criminal’s vanity always makes them make one tiny but fatal mistake. Theirs was to have their entire conspiracy printed and published in plain manuscript.”

Vintage: When it comes to data, vintage (data not wine) is the big V, the thing most people tend to find complicated and confusing. But the bottom line is that it’s all quite unpretentious. A data’s vintage lets you know the year the data items were selected. The best vintage data are the data that we have available.

To paraphrase Carl Jung: “Big data are born in a given event, in a given place and, like years of vintage records, they have the qualities of the year and of the season of which they are created. Big data analytics do not lay claim to anything more.”

Vulgarity: Vulgarity is the data’s state of being vulgar.

Vulgarity is big data’s profane to data warehousing’s sacred. It is big data’s explicit to data warehousing’s nuance. It is big data’s “try something dirty” to data warehousing’s “this is what you asked for.”

Have you heard about good taste, simplicity, beauty, manners, politeness and refinement? You have? Well, forget about all of that bullshit.

Big data lacks sophistication and taste. It’s unrefined, coarse and rude. Big data can say what it wants to say and with absolute impunity. Big data can embarrass, shame and wallow in its base crudities. This is what marks big-data out from the rest of the data pack. Big data is gigantic, thuggish and uncouth. It’s discourteous, rude and uncivil. It is insolent, cheeky and brazen. And it’s all yours.

This is why some priggish and puritanical data scientists have such a hard time with big-data. But they are wrong. It’s not dirty data. It is vulgar data.

And to paraphrase that great stand-up philosopher and man-about-town E. M. Forster, “It is the vice of a vulgar mind to be thrilled by big-data.”

Virtuousness: Moral and ethical principles are absolutely essential in the entire life-cycle of data, especially data about people, to ensure we are doing the right thing right. Virtuousness defines big-data in several different ways. By its absence. By its slight yet irritating presence. Or by the embarrassing reminder of its modern-day irrelevance to the usage of data in many parts of the world. Think of virtuousness as being one of big data’s anti-patterns.

The rap

So that ends the brief rundown of the twelve defining characteristics of big-data.

To summarise: that which has passed before necessarily divulges both the upside and downside characteristics of big-data. That is why I have reached out to you, not by opening up the kimono or pushing the plain brown envelope, but by relating the twelve V-words, and in no uncertain terms. In doing so I chose to disclose the undisclosed and exhibit the absence of essential essence, and I have opened up the entire field, discipline, profession, science and art to examination, questioning and ridicule – especially ridicule. Welsh ridicule. The worst possible kind of ridicule.

I hope, above all, that this risk-fraught disclosure of the twelve holier-than-thou characteristics of big-data will pique your interest in the amazing, fabulous and marvellous world of data and beyond.