GOOD STRATEGY REBELLION

Tag Archives: statistics

The Role of Statisticians in the Age of Data Science

17 Thursday Apr 2025

Posted by Martyn Jones in Inform, educate and entertain.

Tags

Artificial Intelligence, Big Data, data, machine learning, science, statistics, writing

Did data and AI kill the statistician?

Without a grounding in statistics, a Data Scientist is a Data Lab Assistant.

Martyn Jones

Hold this thought: There are big lies, damn big lies and data science with an AI chaser.

Statistics is a science, and some would argue that it is one of the oldest sciences.

Statistics can be traced back to the days of Augustus Caesar. He was a statesman, military leader, and the first emperor of the Roman Empire. Some set its provenance even earlier.

Indeed, if we accept that censuses are a part of statistics, we can trace its history back to the Chinese Han Dynasty (2 AD), the Egyptians (2,500 BC) and the Babylonians (4,000 BC).

Frequentist Inference Ate My Homework

18 Saturday Jan 2025

Posted by Martyn Jones in Inform, educate and entertain.

Tags

AI, Artificial Intelligence, data science, machine learning, statistics

Martyn Richard Jones, A Coruña 18th January 2025

Frequentist inference is a type of statistical inference based on frequentist probability, which treats “probability” as equivalent to “frequency” and draws conclusions from sample data by emphasizing the frequency or proportion of findings in the data. Frequentist inference underlies frequentist statistics, on which the well-established methodologies of statistical hypothesis testing and confidence intervals are founded.
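As a rough illustration of the two methodologies named above, here is a minimal Python sketch of a confidence interval and a hypothesis test for a proportion. The function names and the sample figures (540 successes in 1,000 trials) are made up for illustration, and the normal approximation is just one simple choice among several.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Critical value of the standard normal for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def two_sided_p_value(successes, n, p_null):
    """Two-sided z-test p-value for H0: the true proportion equals p_null."""
    p_hat = successes / n
    standard_error = math.sqrt(p_null * (1 - p_null) / n)
    z = (p_hat - p_null) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical sample: 540 "successes" observed in 1,000 trials.
low, high = proportion_confidence_interval(540, 1000)
p_value = two_sided_p_value(540, 1000, p_null=0.5)
print(f"95% CI: ({low:.3f}, {high:.3f}); p-value against 0.5: {p_value:.4f}")
```

Both results are statements about how the procedure behaves over repeated sampling, which is the frequency reading of probability described above.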

Bayesian Statistics

13 Monday Jan 2025

Posted by Martyn Jones in Inform, educate and entertain.

Tags

Artificial Intelligence, bayes-theorem, bayesian, machine learning, statistics

Martyn Rhisiart Jones

Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule, after Thomas Bayes) provides a mathematical rule for inverting conditional probabilities, which enables us to find the probability of a cause given its effect. For example, we know the risk of developing health problems increases with age. Bayes’ theorem allows us to assess the risk to an individual of a known age more accurately by conditioning the risk on their age, rather than assuming the individual is typical of the population as a whole. Similarly, to evaluate the meaning of a positive test result for an infectious disease correctly, and to avoid the base-rate fallacy, Bayes’ law requires us to consider both the prevalence of the disease in the population and the error rate of the test.

One of the many applications of Bayes’ theorem is Bayesian inference, a particular approach to statistical inference, where it is used to invert the probability of observations given a model configuration (i.e., the likelihood function) to obtain the probability of the model configuration given the observations (i.e., the posterior probability).

http://en.wikipedia.org/wiki/Bayes%27_theorem
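The base-rate point about disease testing can be sketched in a few lines of Python. The rule being applied is P(disease | positive) = P(positive | disease) × P(disease) / P(positive). The prevalence, sensitivity and false positive rate below are invented numbers for illustration, not figures for any real test.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Probability of having the disease given a positive test (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    # P(positive) is the sum of both ways of testing positive.
    return true_positives / (true_positives + false_positives)


# Hypothetical figures: 1% prevalence, 95% sensitivity, 5% false positive rate.
posterior = posterior_given_positive(
    prevalence=0.01, sensitivity=0.95, false_positive_rate=0.05
)
print(f"P(disease | positive test) = {posterior:.3f}")
```

With these assumed numbers, most positive results come from the large healthy majority, so the posterior stays far below the test's 95% sensitivity. Ignoring that is the base-rate fallacy.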

Bayesian statistics (/ˈbeɪziən/ BAY-zee-ən or /ˈbeɪʒən/ BAY-zhən) is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.

http://en.wikipedia.org/wiki/Bayesian_statistics
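To make "codifying prior knowledge in the form of a prior distribution" concrete, here is a minimal sketch using the standard Beta-Binomial conjugate update. The Beta(2, 2) prior and the observed counts are assumptions chosen purely for illustration.

```python
def update_beta_prior(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior belief: the success rate is probably near 0.5 (Beta(2, 2)).
prior_params = (2.0, 2.0)
# Hypothetical new evidence: 7 successes and 3 failures.
posterior_params = update_beta_prior(*prior_params, successes=7, failures=3)

print("prior mean:", beta_mean(*prior_params))
print("posterior mean:", round(beta_mean(*posterior_params), 3))
```

The posterior mean lands between the prior mean and the observed success rate, which is the degree-of-belief reading in action: the data shifts the belief without replacing it.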

Analysis of variance

08 Wednesday Jan 2025

Posted by Martyn Jones in Inform, educate and entertain.

Tags

data science, education, machine learning, python, statistics

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the “variation” among and between groups) used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, the ANOVA is used to test the difference between two or more means.

http://en.wikipedia.org/wiki/Analysis_of_variance

Consider this: Analysis of variance (ANOVA) is a statistical method used to compare the means of three or more groups and determine whether there are statistically significant differences between them. It assesses whether the variability among group means exceeds what would be expected due to random variation alone.
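The partitioning of variance described above can be sketched directly. This minimal Python example computes the one-way ANOVA F statistic by hand. The three groups are toy data, and the sketch stops short of a p-value, which would need the F distribution from a statistics library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, between-groups df, within-groups df) for one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Toy data: three hypothetical groups of three observations each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_statistic, df_between, df_within = one_way_anova_f(toy_groups)
print(f"F({df_between}, {df_within}) = {f_statistic:.2f}")
```

A large F means the group means vary more than the spread within the groups would suggest, which is the comparison against "random variation alone" made above.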

Data Warehousing means monolithic databases?

27 Friday Sep 2024

Posted by Martyn Jones in Inform, educate and entertain.

≈ 1 Comment

Tags

Analytics, cloud, comedy, data, data science, databases, enterprise data warehousing, fiction, humor, information, knowledge, Organisational Autism, short-story, statistics

Martyn Jones, Brooklyn, 27th September 2024

Narrator: Another false meme doing the rounds is that Data Warehousing necessarily means monolithic databases. This is not what data warehouses have been for many businesses.

Big Data, a promised land where the Big Bucks grow

12 Thursday Feb 2015

Posted by Martyn Jones in Big Data, Consider this, Good Strat, Information Management, Martyn Jones

≈ 2 Comments

Tags

Analytics, Big Data, Good Strat, Martyn Jones, statistics

Consider this. Many people come up to me in the street, and, apropos of nothing, they ask me how they can make money from Big Data.

Normally I would send such people to see a specialist – no, not a guru, but a sort of health specialist, but because this has happened to me so many times now, I eventually decided to put pen to paper, push the envelope, open up the kimono, and to record my advice for posterity and the great grandchildren.

So, here are my top seven tips for cashing in quick on the new big thing on the block.

1 – A business opportunity for faith

Like every new religion, trend or fad, Big Data has its own founding myths, theology and liturgy, and there is money to be made in it; loadsa lovely jubbly money. By predicating and evangelising Big Data you will be welcomed with open arms into the Big Data faith, and will receive all the attendant benefits that will miraculously and mysteriously fall upon you and your devout friends. Go on, I dare you. Be a Big Data guru, a shepherd to a flock of sheep, and enjoy the wealth, health and happiness that most surely will come your way. You too can look cool in red Prada slippers, a flattering and flowing gown and matching accessories.

2 – Acquire it, multiply it, weigh it, mark it up and sell it on

Simply stated, this is about acquiring other people’s data, by sacred means or profane, marking it up and then selling it on. The value you add is that you act as a trusted conduit, a conduit for good. You may care to enrich the data, swop the order of data, replicate and embellish data, make stuff up, etc. which all serves to ‘add value’ to the data. You may even consider adding nuggets of value to the data, just for kicks and giggles. My best friend’s favourite is injecting the good old ‘diaper and beer’ and ‘friends and family’ clichés into every Big Data collection, as it never fails to thrill, please and delight.

3 – Anything can be anything

The good thing about making money from Big Data is that it doesn’t need to be anything to do with Big Data. Make a 20GB Enterprise Data Warehouse? Call it a Big Data success. Sell 20 boxes of dodgy doughnuts down the alternative market? Proclaim a Big Data triumph. Sell your digital porn stash to your best mate? Point to the incredible invisible hand of the Big Data market at work. See what I’m doing there. Anything can be anything, and you too can cash in on that opportunity, big time.

4 – Big Data Patronage

Tense, nervous headaches? Do you like making up stories about Big Data, or for that matter anything else? Are you a natural born fibber but strapped for cash? Then worry no longer. If you get a Big Data patron you will be sorted for ‘life’; get two and you’ll be sorted for the afterlife as well. With a Big Data patron you can get the most tenuous, crappiest and most superficial of pieces published, promoted and vaunted – globally. Can’t make it up yourself? Then outsource and offshore it; after all, just get the keywords right for SEO ranking and the gullible will flock to you in droves. The downside of this profession is that you will be targeted for writing half-truths, quarter-truths and downright lies, and you will be pilloried as a purveyor of rank hyperbole. But don’t worry, take heart and never lose the faith, you will be in good company. As one Big Data guru was wont to say, “If you repeat a lie often enough, people will believe it, and you will even come to believe it yourself.” Amen, brother!

5 – Big Data Certification

By 2016 there will be global demand for 30 billion Big Data professionals. Are you prepared to cash in on that inevitability? No? Then consider this.

One of my best friends makes his living as a completely phony Big Data Scientist. For two hundred bucks he can make you a Data Scientist or a Big Data guru. Some guys give you an education but this guy gives you immediate access to high paying jobs, sex and a life in the city. Moreover, for an extra 250 bucks you can also become a certified Big Data Trainer, which will allow you to do unto others what has been done unto you.

6 – Creative Technology Reuse

Big Data has heralded the biggest innovations known in the history of computing, and arguably in the entire history of humankind. One of those new inventions has been the now widely acclaimed and revolutionary ‘flat file data base’ (FFDB), and this has been accompanied by developments in low level operating system primitives that allow for the processing of these collections and hierarchies of FFDBs. So, if one has a mind to do so, one can get some real business leverage off of these new tendencies by borrowing 21st century technology found in old operating system hacks from the sixties and seventies and eighties and nineties and… Well, the point is that in order to get serious funding it is no longer good enough to have a half page business plan, it is also necessary to eke out ‘stuff’ that works within the new paradigms of Big Data and Big Data Analytics. For my next venture I will be looking for serious funding for my ‘Arbitrary Dawdle Down Data Street’ (AD3S) Big Data Analytics platform, a platform designed to support virtual 1k bit processing and the massively parallel provision of global regular expression search and match (S&M), concatenation and listing, and cooperative data-driven and streamed data extraction and reporting. I’m hoping to attract the attention of governments, the EU, the Manic Street Preachers, the UN, China, Vladimir Putin, the DOD, HP, Oracle, Gartner, Lana Del Rey, Deloitte and IBM. So, this is going to be absolutely massive. Word!

7 – Big Data Brokerage

According to leading management consultants and industry watchers Gartner, McKinsey and Deloitte, data needs to be managed and accounted for like any other asset, such as money. Coming round to that viewpoint requires a massive leap of faith, but it is a conversion that might pay dividends. One avenue to be explored in eking out value from the apparently massively valuable Big Data lakes, silos and pools is the operation of a Big Data Brokerage. A Big Data Brokerage is a business whose main responsibility is to be an intermediary that puts Big Data buyers and Big Data sellers together in order to facilitate a transaction. Big Data Brokerage companies are compensated via commission after the Big Data transaction has been successfully completed. They may also charge introductory fees. Just imagine the wealth of business opportunities in that. You could become the Goldman Sachs of data.

That’s it folks!

I hope you enjoyed this piece and would be pleased to hear your views on this and other subjects.

Whilst I understand the attraction and even the need of creating a new and significant growth industry, I would also advise a degree of restraint, and whilst I see that “Big Data” (the consideration of the potential value of All Data) has its allure, I also think that some good sense and informed caution should also prevail.

Thank you so much for reading.

Martyn Richard Jones

All Data: It’s about statistics

30 Friday Jan 2015

Posted by Martyn Jones in All Data, Consider this, DW 3.0, Good Strat, Good Strategy, Information Supply Framework, Martyn Jones, Martyn Richard Jones, statistics

Tags

All Data, Big Data, business intelligence, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones, statistics

A big computer, a complex algorithm and a long time does not equal science.

Robert Gentleman

To begin at the beginning

Fueled by the new fashions on the block, principally Big Data, the Internet of Things, and to a lesser extent Cloud computing, there’s a debate quietly taking place over what statistics is and is not, and where it fits in the brave new world of data architecture and management. For this piece I would like to put aspects of this discussion into context, by asking what ‘Core Statistics’ means in the context of the DW 3.0 Information Supply Framework.

Core Statistics on the DW 3.0 Landscape

The following diagram illustrates the overall DW 3.0 framework:

There are three main concepts in this diagram: Data Sources; Core Data Warehousing; and, Core Statistics.

Data Sources: All current sources, varieties, velocities and volumes of data available.

Core Data Warehousing: All required content, including data, information and outcomes derived from statistical analysis.

Core Statistics: This is the body of statistical competence, and the data used by that competence. A key data component of Core Statistics is the Analytics Data Store, which is designed to support the requirements of statisticians.

The focus of this piece is on Core Statistics. It briefly looks at the aspect of demand driven data provisioning for statistical analysis and what ‘statistics’ means in the context of the DW 3.0 framework.

Demand Driven Data Provisioning

The DW 3.0 Information Supply Framework isn’t primarily about statistics; it’s about data supply. However, the provision of adequate, appropriate and timely demand-driven data to statisticians for statistical analysis is very much an integral part of the DW 3.0 philosophy, framework and architecture.

Within DW 3.0 there are a number of key activities and artifacts that support the effective functioning of all associated processes. Here are some examples:

All Data Investigation: An activity centre that carries out research into potential new sources of data and analyses the effectiveness of existing sources of data and its usage. It is also responsible for identifying markets for data owned by the organization.

All Data Brokerage: An activity that focuses on all aspects of matching data demand to data supply, including negotiating supply, service levels and quality agreements with data suppliers and data users. It also deals with contractual and technical arrangements to supply data to corporate subsidiaries and external data customers.

All Data Quality: Many of the requirements for clean and useable data, regardless of data volumes, variety and velocity, have been addressed by methods, tools and techniques developed over the last four decades. Data migration, data conversion, data integration, and data warehousing have all brought about advances in the field of data quality. The All Data Quality function focuses on providing quality in all aspects of information supply, including data quality, data suitability, quality and appropriateness of data structures, and data use.

All Data Catalogue: The creation and maintenance of a catalogue of internal and external sources of data, its provenance, quality, format, etc. It is compiled based on explicit demand and implicit anticipation of demand, and is the result of an active scanning of the ‘data markets’, ‘potential new sources’ of data and existing and emerging data suppliers.

All Data Inventory: This is a subset of the All Data Catalogue. It identifies, describes and quantifies the data in terms of a full range of metadata elements, including provenance, quality, and transformation rules. It encompasses business, management and technical metadata; usage data; and, qualitative and quantitative contribution data.
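As a loose illustration of how the All Data Catalogue and All Data Inventory might relate, here is a small Python sketch. Everything in it is hypothetical: the class and field names are invented, and the `held_by_organisation` flag is only one assumed way of deciding which catalogue entries belong in the inventory subset; DW 3.0 itself does not prescribe this structure.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One hypothetical entry in an All Data Catalogue."""
    name: str
    provenance: str            # where the data comes from
    quality: str               # e.g. an agreed quality rating
    data_format: str
    internal: bool             # internal or external source
    held_by_organisation: bool = False  # assumed criterion for the inventory
    transformation_rules: list[str] = field(default_factory=list)


def build_inventory(catalogue):
    """Return the subset of catalogue entries the organisation actually holds."""
    return [entry for entry in catalogue if entry.held_by_organisation]


# Illustrative entries only; the names and values are made up.
catalogue = [
    CatalogueEntry("sales_orders", "ERP system", "high", "relational",
                   internal=True, held_by_organisation=True,
                   transformation_rules=["deduplicate", "convert currency"]),
    CatalogueEntry("weather_feed", "external supplier", "medium", "JSON",
                   internal=False),
]

inventory = build_inventory(catalogue)
print([entry.name for entry in inventory])
```

The point of the sketch is only the shape: the catalogue lists every known source, while the inventory is a described and quantified subset of it.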

Of course there are many more activities and artifacts involved in the overall DW 3.0 framework.

Yes, but is it all statistics?

Statistics, it is said, is the study of the collection, organization, analysis, interpretation and presentation of data. It deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments; learning from data, and measuring, controlling, and communicating uncertainty; and it provides the navigation essential for controlling the course of scientific and societal advances[i]. It is also about applying statistical thinking and methods to a wide variety of scientific, social, and business endeavors in such areas as astronomy, biology, education, economics, engineering, genetics, marketing, medicine, psychology, public health and sports, among many others.

Core Statistics supports micro and macro oriented statistical data, and metadata for syntactical projection (representation-orientation); semantic projection (content-orientation); and, pragmatic projection (purpose-orientation).

The Core Statistics approach provides a full range of data artifacts, logistics and controls to meet an ever growing and varied demand for data to support the statistician, including the areas of data mining and predictive analytics. Moreover, and this is going to be tough for some people to accept, the focus of Core Statistics is on professional statistical analysis of all relevant data of all varieties, volumes and velocities, and not, for example, on the fanciful and unsubstantiated data requirements of amateur ‘analysts’ and ‘scientists’ dedicated to finding causation free correlations and interesting shapes in clouds.

That’s all folks

This has been a brief look at the role of DW 3.0 in supplying data to statisticians.

One key aspect of the Core Statistics element of the DW 3.0 framework is that it renders irrelevant the hyperbolic claims that statisticians are not equipped to deal with data variety, volumes and velocity.

Even with the advent of Big Data, alchemy is still alchemy, and data analysis is still about statistics.

If you have any questions about this aspect of the framework then please feel free to contact me, or to leave a comment below.

Many thanks for reading.

Catalogue under: #bigdata #technology

[i] Davidian, M. and Louis, T. A., doi:10.1126/science.1218685


File under: Good Strat, Good Strategy, Martyn Richard Jones, Martyn Jones, Cambriano Energy, Iniciativa Consulting, Iniciativa para Data Warehouse, Tiki Taka Pro

Consider this: Big Data in Context

21 Wednesday Jan 2015

Posted by Martyn Jones in Big Data, Consider this, Data Warehouse, Data Warehousing

Tags

Big Data, business intelligence, Core Statistics, DW 3.0, enterprise data warehousing, information management, information supply framework, statistics

Big Data, together with Cloud computing and the Internet of Things, are topics that are very much to the fore in contemporary trends in Information Management.

Consider this: Big Data and the Analytics Data Store

19 Monday Jan 2015

Posted by Martyn Jones in Analytics, Big Data, Consider this, statistics

Tags

Analytics, Big Data, Data Marts, enterprise data warehousing, statistics

To begin at the beginning

Hold this thought: If Data Warehousing was Tesco then Big Data would be the “try something different”.

Since the publication of the article Aligning Big Data, which laid out a draft view of the DW 3.0 Information Supply Framework and placed Big Data within a larger framework, I have been asked on a number of occasions to go into a little more detail with regard to the Analytics Data Store (ADS) component. This is an initial response to those requests.

CONSIDER THIS: Did Big Data Kill The Statistician? 2026

03 Wednesday Dec 2014

Posted by Martyn Jones in consider, Consider this, data science, Polemic, statistics

≈ 20 Comments

Tags

Big Data, BS, Consider this, data analysts, data science, Data Warehouse, enterprise data warehousing, statisticians, statistics

Martyn Rhisiart Jones

Hold this thought: ‘There are big lies, damn big lies and big data science’.

Statistics is a science. Some argue that it is the oldest of sciences. It can be traced back in history to the days of Augustus Caesar, and before.

In 1998, Lynn Billard wrote a paper that laid out the role of the Statistician and Statistics. She stated that “no science began until man mastered the concepts and arts of counting, measuring, and weighting”.[1]
