Blue sky data

Hold this thought: ‘There are big lies, damn big lies and big data science’.

Statistics is a science. Some argue that it is the oldest of sciences. It can be traced back in history to the days of Augustus Caesar, and before.

In 1998, Lynne Billard, in a paper that laid out the role of the Statistician and Statistics, wrote that “no science began until man mastered the concepts and arts of counting, measuring, and weighting”.[1]

I first became aware of the role of the statistician when I was studying a combination of philosophy, politics and economics. Later, my first two managers were also enthusiastic and pedagogic members of the Royal Statistical Society (RSS), a society whose aim is “advancing the science and application of statistics, and promoting use and awareness for public benefit”.

The RSS do a good job of raising awareness about statistics and statisticians. But maybe they aren’t getting enough people’s attention.

After all, many people seem to think that statistical methods and quantitative analysis were born somewhere around 2001. Which, and sorry for raining on anyone’s parade, is not in fact the case.

To me a statistician is like a true artist.

Let me explain what I mean by that.

Picasso was perhaps the greatest painter of the 20th century.

He is on record as saying that “It took me four years to paint like Raphael, but a lifetime to paint like a child”. But that’s not the same as a child painting, with little or no technique, skill or experience.

Picasso projected the visions of a child, through the hands of a genius. Picasso could paint like Raphael, but also as “a child”. He could paint like anyone. Many would argue that he was a true artist.

Which isn’t the same as splodging some abstract and random colourful shapes on canvas. That doesn’t automatically make someone an artist. Not in any modern formal sense. Although, that said, in the age of Postmodern Nonsense[2], anything can be anything. Which however still does not make it a fact.

Those people who watched the American television medical drama House might also make this connection.

In this series, Hugh Laurie played the part of Dr Gregory House.

In entertainment terms, Laurie convinced viewers that he was a credible physician. Only thing is, he wasn’t a physician. He was an actor pretending to be a physician, and he did a great job. He learned his lines well, and he knew how to interpret them to perfection. But as an actor, not as a doctor.

So why do we think Big Data is more than just a new name for a collection of old ideas, and why do we think that data science is forward looking and statistics is just dealing with the past? Why do we lend more credibility to rebranding than to historical fact?

More to the point, why do people clamour to define themselves as data scientists rather than as the more recognizable, measurable and manageable role of statistician? A modern statistician who can both interpret the past and try to correctly forecast the future?

I am well aware that there has been a proclivity to hire enthusiastic amateurs or certificate-harvesters in place of trained, experienced and qualified professionals, especially if ‘the price is right’. But it is a proclivity firmly planted in the absurd, incoherent and irrational. As absurd as the dialectic notion that two-a-halfpenny qualifications are more important than knowledge and experience.

So, call me old fashioned. But when I need a haircut I will go to a hairdresser or a barber, and not to a hair artiste or a mop-follicle scientist.

When I need a person who really knows how to do a wide range of statistics, I will hire a professional and experienced Statistician.

It’s not exactly rocket surgery.

A good statistician will understand that “not everything that counts can be counted, and not everything that can be counted counts”, a quote variously attributed to Albert Einstein or William Bruce Cameron.

So, getting down to fundamentals. Why would Statisticians prefer to call themselves Data Scientists, and why are some Data Scientists oblivious to or misinformed about the nature of contemporary Statistics and Statisticians?

I think the biggest problem is in the way that the IT industry relentlessly flogs new fads. It’s new lamps for old, but no matter how much obfuscation and marketing are churned into the mixture, it’s still a massive dose of flimflam and hyperbole.

The other ‘big’ problem is in how so many people are willing to jump on the flimflam trend wagon in order to wing their way into a ‘data scientist’ niche. Or rebrand themselves as data scientists as a reaction to the IT industry’s crude ‘downgrading’ of the role of statistician – quite often backed by a long concatenation of meaningless clichés, logical fallacies, inaccuracies and blatant misrepresentation.

Using the past to predict or shape the future is nothing new. So why do people pretend that it is new?

Finally, I think it’s clear where this is leading. My prediction for 2016 is that Big Data will not kill the Statistician.

My prediction for 2026 is that the ‘data scientists’ of the day will be criticising the next Big Data-like fad and especially its evangelists. Hopefully they will be able to make it clear that this is about something with a very long and rich history.

That said, I think the predicament and the ‘challenge’ we face with much of the industry hype and the unquestioning zeal of many big data and data science ‘evangelists’ can be summed up by two absolutely fabulous quotes from Ben Goldacre in Bad Science: “These corporations run our culture, and they riddle it with bullshit”, and “You cannot reason people out of a position that they did not reason themselves into”.

Thanks for reading.

[1] Billard, Lynne. The Role of Statistics and the Statistician. The American Statistician, November 1998

[2] Sokal, Alan and Bricmont, Jean. Fashionable Nonsense: Postmodern Intellectuals’ Abuse of Science

File under: Good Strat, Good Strategy, Martyn Richard Jones, Martyn Jones, Cambriano Energy, Iniciativa Consulting, Iniciativa para Data Warehouse, Tiki Taka Pro