Did data and AI kill the statistician?

Without a grounding in statistics, a Data Scientist is a Data Lab Assistant.
Martyn Jones
Hold this thought: There are big lies, damn big lies and data science with an AI chaser.
Statistics is a science, and some would argue that it is one of the oldest sciences.
Statistics can be traced back to the days of Augustus Caesar. He was a statesman, military leader, and the first emperor of the Roman Empire. Some set its provenance even earlier.
Indeed, if we accept that censuses are a part of statistics, we can trace its history back to the Chinese Han Dynasty (2 AD). We can also consider the Egyptians (2,500 BC) and the Babylonians (4,000 BC).
Nonetheless, the first statistician in recorded history is Al-Kindi, a ninth-century Muslim polymath and intellectual from Kufa, a city and centre of learning on the banks of the Euphrates in the land now known as Iraq. Al-Kindi, educated in Baghdad, used frequency analysis in cryptography and codebreaking, which he wrote about in his book, “Manuscript on Deciphering Cryptographic Messages.” This book was lost to civilisation until 1987 when the treatise was rediscovered in the Süleymaniye Ottoman Archive in Istanbul.
Of course, being educated in the UK, the first statistician I became aware of as a young person was the Lady with the Lamp, the eminent Victorian best known as Florence Nightingale. Amongst other things, she pioneered the graphical representation of statistical data, something very much back in vogue these days.
In 1998, the US scientific journal The American Statistician published an article by Lynne Billard, an eminent Australian statistician and professor. It laid out the role of the statistician and statistics. She wrote that “No science began until man mastered the concepts and arts of counting, measuring, and weighing”.
I first became aware of the statistician’s role while studying philosophy, politics and economics in the late seventies.
Later, in the world of work, my first two bosses were also enthusiastic and pedagogic members of the Royal Statistical Society (RSS). Its founders included the polymath Charles Babbage; Adolphe Quetelet, the Belgian founder of the Brussels Observatory; the economist Richard Jones; and the English cleric, scholar and economist, the great Thomas Malthus. Other notable RSS members include the politician Harold Wilson and the social reformer and statistician Florence Nightingale. The highly laudable aim of the RSS, founded in London in 1834, is “Advancing the science and application of statistics, and promoting use and awareness for public benefit”.
While the RSS does an excellent job of raising awareness about statistics and statisticians, I also feel that perhaps it isn’t getting enough of people’s attention. After all, many folks seem to think that statistical methods and quantitative analysis were discovered around 2001, which, and sorry for raining on anyone’s parade, is not the case.
What motivated me to write this chapter was the notion that the rise of data science would see the demise of statistics and the need for statisticians. In particular, it’s a response to disturbing claims such as “Data science is more than statistics: it also encompasses computer science and business concepts” and “A data scientist is better at statistics than any software engineer and better at software engineering than any statistician”. It is as if statisticians never engaged with business, understood computer science, or programmed computers. These comments are puerile at best. I believe a well-trained statistician would have no problem quickly developing excellent programming skills. After all, programming is hardly rocket surgery.
It may not be immediately intuitive, but for me, a great statistician and a great scientist are like a true artist: creating, practising and demonstrating their art. At face value, that may be a controversial position, so let me explain what I mean by that.
Picasso was perhaps the most celebrated painter of the 20th century. He is on record for saying, “It took me four years to paint like Raphael, but a lifetime to paint like a child.” But that’s not the same as a child painting, with little or no technique, skill or experience. Picasso took the time and trouble to learn how to paint like a child intentionally.
Picasso could recreate the visions we ascribe to a child through the hands of a genius. He could paint like Raphael, a child or anybody else. Many would argue that he was a true artist, painting as he wanted to, with purpose and intention.
The way Picasso painted isn’t the same as someone with no artistic or creative ability splodging abstract and random colours and shapes onto a canvas. That doesn’t automatically make someone an artist. Not in any modern formal sense. Although, in the age of postmodern drivel, some folks believe that everything can be anything or that everyone can be anybody. This makes sense, considering the number of Carlos Alcarazes, Novak Djokovics and Kylian Mbappés in the world.
But what about a statistician as a great composer of great symphonies?
My sentiments about the statistician-as-artist concept can be summed up by Lynne Billard when she said: “May the future roles of statistics and of statisticians be that beautiful (Beethoven) symphony that brings music to our ears!” That is why I believe that statistics will continue to lead through the force of its personality and creativity. And its practitioners will expand the influence of contemporary statistics into new areas. And they will do so by taking simple and proven methods and applying them on a grand scale to address significant challenges, sometimes involving the orchestration and analysis of large data sets.
Those thoughts about art and culture lead me to the modern domestic shrine to the entertainment industry, the gogglebox.
Those who watched the American medical TV drama House might also connect with the following sentiment. In the series, Hugh Laurie played the part of Dr Gregory House. In entertainment terms, Laurie convinces viewers that he is a credible physician. The only thing is, Laurie isn’t a physician. He is an actor pretending to be a physician, and he does a great job of pretending to be a physician. He learned his lines well, and he knew how to interpret them to perfection. But as an actor, not as a doctor.
So why do we think data is more than just a new name for a collection of old ideas? Why do we believe that data science is forward-looking, modern and sexy while simultaneously thinking that statistics is only about dealing with the past? And why indeed do we lend more credibility to rebranding, smoke and mirrors, and sexing-up than to historical fact, current evidence and critical appraisal?
More to the point, why do people clamour to define themselves as data scientists rather than as the more recognisable, measurable and manageable role of a statistician? A modern statistician who can interpret the past, monitor the present and try to forecast the future?
I know there has been a proclivity to hire enthusiastic amateurs and certificate harvesters instead of trained, experienced, and qualified professionals, especially if the price is right. But it is an inclination firmly planted in the absurd, incoherent and irrational. As silly as the dialectic notion that two-and-a-halfpenny qualifications are more important than knowledge and experience.
So, call me old-fashioned, but when I need a haircut, I go to a barber and not to a hair artiste or a mop-follicle scientist. When I need someone who knows how to do a wide range of statistics, I will hire a professional and experienced statistician.
A statistician understands that “not everything that counts can be counted, and not everything that can be counted counts” (a quote attributed variously to Albert Einstein and William Bruce Cameron) and would probably not be swayed by fact-free baloney. So, getting down to fundamentals, why would a statistician prefer to call themselves a data scientist, and why are some data scientists oblivious to or misinformed about the nature of contemporary statistics?
The biggest problem is how the IT industry relentlessly flogs new fads. It’s ‘new lamps for old’, but no matter how much obfuscation and marketing get churned into the mixture, it’s still recognisably a massive overload of flim-flam and hyperbole.
The other big problem is how many people will jump on the flim-flam trend wagon to wing their way into a data scientist niche. Or are they intent on rebranding themselves as data scientists as a knee-jerk reaction to the IT industry’s crude downgrading of the role of statistician, a downgrading quite often backed by a long concatenation of meaningless clichés, logical fallacies, inaccuracies and blatant misrepresentation?
But “Ah,” the blaggers will say, “With data, we can now see what we couldn’t see before, and we can even predict the future, so there!” But that’s also a flawed argument. Using the past to predict and shape the future is nothing new, and neither is the identification of hidden patterns. So why do people go out of their way to pretend that this is so recent?
I think it’s fairly clear where this is leading. And if not, I hope it soon will be.
My forecast, or better said, my guess, is that data and AI will not kill the statistician, not due to any benevolence on the part of the data science and AI communities but because the profession won’t be allowed to disappear so easily. There is a broad appreciation where it matters (in government, academia, the public sector and industry) that statistical insight is exceedingly valuable and that statistics is a vital part of modern thinking about data.
Besides, I am sure that in 2025, or thereabouts, the data scientists of the day will be criticising the next giant data-like fad, especially its hodgepodge of evangelising carpetbaggers.
Hopefully, by 2027, the data scientists, having acquired all the necessary skills, knowledge and experience in statistics, will be able to make it clear that this is about something with a very long and rich history. Statistics is a discipline with antiquity going back to the Babylonians of central-southern Mesopotamia – modern-day Iraq. So, not exactly the new kid on the block.
The probable and unavoidable downside is that there will also be a surfeit of bullshit babblers, as there is today, only more so: those with the penchant to start every new piece of populist tripe with adjectives such as amazing, fantastic and biggest. For example: ‘amazing data feeding fantastic algorithms to create the biggest claptrap.’
That said, the industry hype, arbitrary zeal and dreary nonsense expounded by data analytics, artificial intelligence, data and data science pundits can be summed up by two pertinent quotes from Ben Goldacre’s book Bad Science:
“These corporations run our culture, and they riddle it with bullshit”, and
“You cannot reason people out of a position that they did not reason themselves into”.
And, despite best efforts, data will not kill the statistician, and new age data will become mere data – just like it always was.