Hold this thought: ‘There are big lies, damn big lies and big data science’.
Statistics is a science. Some argue that it is the oldest of sciences. It can be traced back in history to the days of Augustus Caesar, and before.
In 1998, Lynn Billard, in a paper that laid out the role of the Statistician and Statistics, wrote that “no science began until man mastered the concepts and arts of counting, measuring, and weighting”.
I first became aware of the role of the statistician when I was studying a combination of philosophy, politics and economics. Later, my first two managers were also enthusiastic and pedagogic members of the Royal Statistical Society (RSS), a society whose aim is “advancing the science and application of statistics, and promoting use and awareness for public benefit”.
The RSS do a good job of raising awareness about statistics and statisticians. But maybe they aren’t getting enough people’s attention.
After all, many people seem to think that statistical methods and quantitative analysis were born somewhere around 2001. Which, and sorry for raining on anyone’s parade, is not in fact the case.
To me a statistician is like a true artist.
Let me explain what I mean by that.
Picasso was perhaps the greatest painter of the 20th century.
He is down on record as saying that “It took me four years to paint like Raphael, but a lifetime to paint like a child”. But that’s not the same as a child painting, with little or no technique, skill or experience.
Picasso projected the visions of a child, through the hands of a genius. Picasso could paint like Raphael, but also as “a child”. He could paint like anyone. Many would argue that he was a true artist.
Which isn’t the same as splodging some abstract and random colourful shapes on canvas. That doesn’t automatically make someone an artist. Not in any modern formal sense. Although, that said, in the age of Postmodern Nonsense, anything can be anything. Which however still does not make it a fact.
Those people who watched the American television medical drama House might also make this connection.
In this series, Hugh Laurie played the part of Dr Gregory House.
In entertainment terms, Laurie convinced viewers that he was a credible physician. Only thing is, he wasn’t a physician. He was an actor pretending to be a physician, and he did a great job. He learned his lines well, and he knew how to interpret them to perfection. But as an actor, not as a doctor.
So why do we think Big Data is more than just a new name for a collection of old ideas, and why do we think that data science is forward looking and statistics is just dealing with the past? Why do we lend more credibility to rebranding than to historical fact?
More to the point, why do people clamour to self-define themselves as data scientists rather than as the more recognizable, measurable and manageable role of statistician? A modern statistician who can both interpret the past and try to correctly forecast the future?
I am well aware that there has been a proclivity to hire enthusiastic amateurs or certificate-harvesters in place of trained, experienced and qualified professionals, especially if ‘the price is right’. But it is a proclivity firmly planted in the absurd, incoherent and irrational. As absurd as the dialectic notion that two-a-halfpenny qualifications are more important than knowledge and experience.
So, call me old fashioned. But when I need a haircut I will go to a hairdresser or a barber, and not to a hair artiste or a mop-follicle scientist.
When I need a person who really knows how to do a wide range of statistics, I will hire a professional and experienced Statistician.
It’s not exactly rocket surgery.
A good statistician will understand that “not everything that counts can be counted, and not everything that can be counted counts”. A quote which is variously attributed to either Albert Einstein or William Bruce Cameron.
So, getting down to fundamentals. Why would Statisticians prefer to call themselves Data Scientists, and why are some Data Scientists oblivious to or misinformed about the nature of contemporary Statistics and Statisticians?
I think the biggest problem is in the way that the IT industry relentlessly flogs new fads. It’s new lamps for old, but no matter how much obfuscation and marketing is churned into the mixture, it’s still a massive dose of flimflam and hyperbole.
The other ‘big’ problem is in how so many people are willing to jump on the flimflam trend wagon in order to wing their way into a ‘data scientist’ niche. Or rebrand themselves as data scientists as a reaction to the IT industry’s crude ‘downgrading’ of the role of statistician – quite often backed by a long concatenation of meaningless clichés, logical fallacies, inaccuracies and blatant misrepresentation.
Using the past to predict or shape the future is nothing new. So why do people pretend that it is new?
Finally, I think it’s clear where this is leading. My prediction for 2016 is that Big Data will not kill the Statistician.
My prediction for 2026 is that the ‘data scientists’ of the day will be criticising the next Big Data-like fad and especially its evangelists. Hopefully they will be able to make it clear that this is about something with a very long and rich history.
That said, I think the predicament and the ‘challenge’ we face with much of the industry hype and the unquestioning zeal of many big data and data science ‘evangelists’ can be summed up by two absolutely fabulous quotes from Ben Goldacre in Bad Science: “These corporations run our culture, and they riddle it with bullshit”, and “You cannot reason people out of a position that they did not reason themselves into”.
Thanks for reading.
 Billard, Lynn. The Role of Statistics and the Statistician. The American Statistician, November 1998
 Sokal, Alan. Bricmont, Jean. Fashionable Nonsense: Postmodern Intellectuals’ Abuse of Science
File under: Good Strat, Good Strategy, Martyn Richard Jones, Martyn Jones, Cambriano Energy, Iniciativa Consulting, Iniciativa para Data Warehouse, Tiki Taka Pro
Dr. Istvan Hajnal said:
I agree with Lynn Billard that “no science began until man mastered the concepts and arts of counting, measuring, and weighting”.
Do you have empirical evidence for the statement “many people seem to think that statistical methods and quantitative analysis were born somewhere around 2001”? I don’t know anybody who thinks that way, but then again the people I know might not be a good sample.
Either way, the Data Scientists I know are either Statisticians or have a lot of experience in applied statistical research. The latter respect, if not admire, statisticians.
My paycheck used to have “statistician” on it, now it says “data scientist”, but in essence I still do the same thing as I used to, except that now I work with larger data sets.
Nonetheless, I enjoyed reading your piece.
Martyn Jones said:
Hi Dr. Istvan,
Many thanks for your kind and informative comments.
I will try and respond more substantially over the next few days.
Hi Martyn, I myself am trained as a Statistician. Is the data science buzz all hype? We’re starting to see some things that distinguish it. In the November 2014 issue of the Statistical Society of Canada’s Liaison, they address some of these issues: http://www.ssc.ca/en/publication/liaison/ssc-liaison.
I still think the ASA are doing a horrible job at marketing themselves.
Here’s my blog post on this subject: https://atkoh.pythonanywhere.com/blog/what-people-i-know-think-statistics-is/
Martyn Jones said:
Many thanks for the comment and the links.
I will get back with a more substantive reply when I have had a chance to read the content.
Thomas Speidel said:
Very well written article. Terry Speed, an accomplished statistician, put it bluntly when he said “Did any species ever avoid extinction by adopting a new name? No, they adapted, they evolved, and so must we”. Curiously, Terry had to call himself a bioinformatician because when the wave of statistical genetics arrived, suddenly it wasn’t “cool” to be a statistician.
Statistics and Statisticians have had an image problem which we never care to address. Yet, the profession has enjoyed steady growth in areas such as medical research, survey research, telecom, insurance etc. In medical research, where I used to be, statisticians are often part of ethical review boards. The evaluation of drugs and treatments would not be possible if it wasn’t in part for statisticians like Sir David Cox or Abraham Wald in the ’70s and ’40s, respectively. Those in quality control may cite Shewhart and Deming. And one can only wonder if Toyota would have the reputation and success it has enjoyed if it wasn’t for Deming.
The attitude we see today, that same attitude that is feeding the hype cycle, is one that aims at diminishing the role of statistics and statisticians in order to elevate the “press a button” approach and “you don’t need statistics”. They do that by leveraging the bad image, the difficulty and complexities of the field, by stripping uncertainty out of statistics in order to give a false sense of certainty, by asserting the stupidity that bigger must always be better, by providing a false sense of novelty.
While it can be frustrating, I take comfort in the fact that there is no replacement for statistics, so long as the real world is uncertain.
I have come to think that the main difference between a data scientist and a statistician is his/her degree of humility. But maybe people talking about data science on the internet are self selected and do not represent the population from which they are sampled.
Some voices are raised on the internet that criticize the disproportionate importance of computer science in data science relative to statistics and knowledge expertise.
See for example here:
Shashikant Brahmankar said:
As long as uncertainty remains, statistics stays. However, technology advancements have been pushing literally everything towards real time. This is primarily leading to a merging of the faculties of data processing, computing, predicting and representing, for real-time solutions to new issues. New terms, nomenclatures, lingos, etc. are bound to evolve for the merged faculties, despite the fact that the problems will not end but are only likely to take a newer shape. (For example, today’s BIG Data will lead to BIG-BIG Data 10 years down the line.)
For a few decades now, theoretical statisticians have been thinking about and working on super-populations. So conceptually, it is the same. The only difference is the technology working around real-time solutions for computing, processing, predicting and representing, which would support real-time decision making for the stakeholder (be it business, government or consumer).
Adapt! Darwin’s law of survival of the fittest is a constant to any change.
Statisticians will remain in newer “avatars”! Cheers.
By the way I am an applied statistician and working on the change. However, would want to be called a statistician only.