Martyn Richard Jones
Talacharn, 28th May 2017
Folks! Here I bring you ten amazing tips from some of Big Data’s great and good. Tips that will help you become the Big Data Rock Star that you have always dreamed of being.
But first, a question.
What’s the sexiest job in the 21st century?
- Top goal scorer for Chelsea, FC Barcelona, Real Madrid or Juventus?
- Formula 1 rally driver?
- Pro circuit tennis player?
- Physical therapist?
- Airline pilot?
- Amazing Big Data Scientist?
Right, if you got it in one, read on.
1 – The Fundamental Big Data Liturgy
On your journey to the top you will need to learn how to repeat the following list, by heart, and without hesitation, perspiring or blushing:
- Big Data is for everyone.
- Big Data will eradicate world poverty.
- Big Data will bring about global peace.
- Big Data will end terrorism.
- Big Data will cure AIDS and the causes of AIDS.
- Big Data will help everyone to do a better job.
- Big Data will permanently fix non-existent climate change issues.
- Big Data will defeat all liberals, in all elections, everywhere.
- With Big Data no-one will have any need for democracy, unions or welfare.
- Big Data is my life.
- I was born to be a Data Scientist.
- I have been working with Hadoop for twenty years.
- I kept pythons when I was a kid.
- I’ve seen Monty Python and the Holy Grail… of Big Data.
2 – Correlation is King. Causation is for wimps
Strong and stable is where it’s at.
Whenever you identify amazing, interesting and fascinating correlations in data, and other people (such as Big Data deniers) accuse you of not even remotely proving causality, then you should be prepared with a great comeback. Here are some examples of withering replies for such occasions:
- I’ve analysed all the data in the world. It took 52 weeks for the analysis to run and the hardware alone cost an arm and a leg. And you are now asking me for causation?
- Causation has been used to “prove” Climate Warming, so it isn’t reliable.
- As everyone who is anyone knows, correlation is from the Italian anagram ricolorante, which means to be able to recolour (or, in scientific terms, to paint the correct picture). So, it effectively renders causation unnecessary.
- You don’t still believe in that Hume nonsense, do you? It’s ancient.
- Correlation does not imply causation, so we can safely ignore the causation part.
- Jeez! I gave you the evidence, what more do you need?
- If Tom Watson had thought like that we wouldn’t have Cloud computing.
- Causation is for wimps, the paranoid and for people who don’t trust the facts.
- Are you some kind of anti-correlationist?
- Causation? What? Causation? What’s the matter with you, Bro, I thought you was my brother?
3 – Of course it’s deep learning. It’s written in Python
The great flamenco artist Camarón de la Isla was once asked if he always sang flamenco. He replied that he always sang flamenco and nothing else, and the reasons he gave were that he was from a gypsy family in Andalucía and that he was Spanish, and therefore everything he sang would naturally be flamenco.
The same goes for programming languages and databases.
If you are using Hadoop, or any gadget from the Hadoop ecosphere, then you are doing Big Data.
If you write code in Python, then obviously what you are doing is Data Science and deep learning and AI (Artificial Intelligence).
If you can add Spark, Kafka and R to your repertoire, then you’ll have it made.
4 – Hadoop will replace all databases, for everything
Hadoop is the most robust, resilient, multi-faceted and multi-purpose database to have ever stored data on the face of the earth.
- Just as assembler replaced machine code.
- Just as FORTRAN IV replaced assembler.
- Just as teletypes replaced punch cards.
- Just as magnetic tape replaced paper tape.
- Just as big disks replaced magnetic tape.
- Just as Java has replaced COBOL and PL/1 and Visual Studio.
- Just as Oracle, Redshift and SQL/Server have replaced IMS and DMS.
- Just as commodity hardware has replaced mainframes.
- Just as the web builder philosophy of agile has replaced everything formal, workable and utilitarian.
- Just as ‘make it up as you go along’ has replaced methodology.
So too will Hadoop replace Oracle, SQL/Server, DB/2 and a long list of etceteras.
So, move on up to Hadoop, seize the day or let your business die… wandering around in a daze and tripping over tins of baked beans, on the central aisle of a Tesco superstore, caught in the lights of a massive shopping trolley, driven by a revenging Big Data evangelist.
You wouldn’t want that to happen now, would you?
5 – The more data you accumulate, the more valuable it becomes
Actually, this is just good old common sense, writ large.
- You can never have enough data.
- Size, volume and velocity are king.
- Small data is not where it’s at.
- The more data you have, the better the answer.
- The more data you have, the more reliable the answer.
- The more data you have, the more defendable your answer becomes.
- The more data you have, the more convincing you will be.
- Data is valued by the 3Vs, not by its time and place utility.
- Big Data, not speed, kills the competition.
- Data-rich Google is the Saudi Arabia of the Big Data age.
6 – Deconstruct your structured data
Hadoop is brilliant for unstructured data. But for best Hadoop results for your operational data, make sure you deconstruct as you load it onto the cluster.
What do we mean?
Well. We can talk about the advantages of Big Data until we are blue in the face, but it’s not a very productive pursuit.
Put it this way. You wouldn’t try and put all your unstructured data into Microsoft Access, would you? Right. In the same way, you wouldn’t put structured data into Hadoop. You first have to make your structured data unstructured.
What we are saying is that all businesses would benefit enormously from putting all of their data into flat files, on commodity Hadoop clusters, to be accessed and managed through the Hadoop ecosphere. In this way, all your data becomes amorphous and all of your amorphous data can be processed by the Hadoop ecosphere.
So, no need for other database management systems. Which leads to less complexity, less cost and greater fun.
Take our advice on this. It also works great for knocking the useless metadata stuffing out of Jurassic Park Enterprise Data Warehouses. Don’t listen to so-called Data Warehouse professionals. Highly Unstructured-Data Warehousing is the future.
Simply stated, make everything Big Data. Make everything unstructured. Make everything the same. It’s the only way to go.
As Eric Kaplan put it: “It reminds me of a friend of mine who was very interested in a French philosophy called deconstruction. He advertised to me as one of deconstruction’s selling points that deconstruction deconstructs itself. I couldn’t help responding, if deconstruction deconstructs itself, why bother reading its long, boring books? Why not go for a jog instead, or reread one of Patrick O’Brian’s tremendous tales of the sea?”
7 – A picture is always better than a number
In project terms, pictures are like Agile, whilst numbers are like waterfall, PRINCE2 and the drudge of milestones, manageability and measurability.
The Big Data gurus are universally categorical on this one. Never use numbers where you can use a picture.
Numbers are tacit liabilities, whereas pictures are subjective assets.
As my gran might have advised the Big Data community, had she been alive today: “A picture is better than a number, and a basket of kittens is crucially superior to an interactive pie chart – no matter on whose website it is running.”
When it comes to Big Data, avoid numbers, discourage graphs and use pictures – especially interactive and animated sequences from Unbreakable Kimmy Schmidt.
8 – The best artificial intelligence doesn’t need to explain itself
In the eighties people went on and on about the need for expert systems to be able to explain the lines of reasoning they took, and the steps, rules, evaluations and data that led to those inferences. I know, I was there… as AI and Data Base Tech R&D director for one of the USA’s greatest computer corporations. Oh, what fun we had.
But this was in the dark ages. With massive networks of interconnected mainframes, high-volume transaction processing and end user computing… real Jurassic IT Park stuff. It was the era in which people working in AI, such as me, didn’t really know what we were doing. We just sat around all day, bothering the memory of ferrite cores and chads, and pointing at the Sun with our fibre-optic-enabled digital mice, saying things like… “Round”, “hot” and “bright”. Fair play, we were dummies in short trousers. Kids playing. And, we knew it.
These days, the whizzes working in AI don’t need such namby-pamby nonsense. AI has now become a question of ‘take it or leave it, but don’t expect an explanation’. Big Data is very much carried out along the same principles, which just goes to reinforce the common sense inflection point and maturity which we have arrived at.
So, as a data scientist, if you are ever asked to provide lines of reasoning to your users, just tell them to grow up.
9 – The contradictions of Big Data make it authentic
The contradictions of Big Data, far from making it the laughing stock of the business world, show us that the fallibility of Big Data does in fact make it infallible, and therefore strong and dependable.
Yes, some people may claim that Big Data is about massive volumes of data, a large array of data types and data whose speeds of generation can reach dizzying heights. But this is not the only interpretation.
Some gurus say that Big Data is about 10, 15 or even 20 Vs.
Whilst other gurus say that Big Data has nothing to do with size, speed or variety at all.
It’s like the gurus’ guru saying that Big Data touches everyone’s lives. But, we don’t notice it. Because it’s top-secret or in stealth mode.
Or, even consider the millions of amazing Big Data success stories that the Big Data gurus hint at. Success stories that are supremely light on tangible evidence, references or facts. Obviously because these Big Data success stories are so commercially valuable that they can never be revealed – and the same goes for Big Data in the security services.
These apparent but false contradictions don’t make Big Data a mistake. There’s no need to abandon Big Data as yet another IT-fuelled bubble. The sincerity of these fake contradictions actually makes it real, and something worthy of being embraced by all. Big Data is the Scarlet Pimpernel of data precisely because Big Data denies being the Scarlet Pimpernel of data.
10 – The dearth of success stories is Big Data’s secret sauce
Our Big Data gurus wanted this point to be reemphasised. In part due to the deluge of criticism, negativity and incredulity we get from vociferous groups such as The Big Data Contrarians.
So, let’s put this baby to bed once and for all.
There isn’t a dearth of Big Data success stories. In fact, there are millions of them out there.
Company X did Y with Big Data and got Z results, which improved their A process by a magnitude of B.
Company C did D with Big Data and got E results, which increased their F ratios by a factor of G% and allowed them to H and I together with J.
The security services of K together with the government of L undertook a joint project, codenamed M, to tackle N, O and P, and to bring about Q, R and S. It worked so well that the citizens of both parties can now enjoy T, U and V.
So you see, there are plenty of Big Data success stories out there. You just have to use your imagination.
The thing is, because they have been such amazing successes… WE CAN’T TALK ABOUT ANY OF THE DETAILS!
Common sense! Duh!
11 – Big Data trumps Data Governance
We’ll keep this short and sweet.
If there is ever a conflict in your business between Big Data and Data Governance, then you must let Big Data prevail.
Some of you may be alarmed at this terse piece of advice. You may even be worrying about the consequences of accidentally putting personal identifying data into the public domain.
Our Big Data gurus are divided on this one, but the general consensus (going with the spirit of the Trumpton Times) is that the only people who will worry about having their personal data disclosed to the public are those who have something to hide. Everyone else would be fine and dandy with the prospect. Which is, after all, only common sense.
12 – The more Big Data hokum you produce and share, the bigger the Big Data Rock Star you will become
This is really self-explanatory. In classical circles it was often stated that “The sign of a good painting” such as a Rubens “with their backs towards you, is if the bottoms follow you around the room.” And so it is with Big Data. If you are in a business, and you create a work of such magnificence that its data follows your IoT around the enterprise, then you know you have produced your magnum opus (Latin for ‘top job’).
That aside, always remember this.
Becoming a Big Data Rock Star doesn’t happen overnight; it can take up to a fortnight or more. So, when you have laboriously and painstakingly built up a reputation as a Big Data Guru it is important to preserve it, nurture it and propagate it. These days, this is made a little easier by the presence of the internet and social media, as well as computing on-the-go.
So, write your Big Data blog, speak at Big Data conferences, workshops and other events, and permanently evangelise for Big Data wherever you may be – work, sports club, gym, pub, cricket match or football stadium, etc.
And don’t worry if people laugh at you, tell you to shut up or to simply ‘eff off’…
In the end, you will reap your just deserts.
That’s it folks
It’s just left to me to thank our distinguished panel of Big Data gurus, namely:
- Charlie Chuckles
- Pikey McPieface
- Elsie ‘Tuana’ Tanner
- Rab C. Nesbitt
- Tracy O’Troy
- Lizzie Bennett
- Robin Locksley
- Betty Béarnaise Fen
- Paul Lano
- Kitty Pride
- Afilonius Rex
- Ricky Raveons
Many thanks for reading
© 2017 Martyn Richard Jones