• Home
  • About
  • The Good Strategy Blog
  • Strategy
    • Data Warehousing
    • Ask Martyn
  • Must-Read Books from Martyn
  • MARTYN’S MUSIC
  • PODCASTS

GOOD STRATEGY

~ for every significant challenge

GOOD STRATEGY

Tag Archives: dark data

What’s all the fuss about Dark Data? Big Data’s New Best Friend

10 Tuesday Mar 2015

Posted by Martyn Jones in All Data, Big Data, Consider this, dark data, Good Strat

≈ Leave a comment

Tags

All Data, Big Data, dark data, data architecture, data management, Good Strat, Martyn Jones, Martyn Richard Jones


What is Dark Data?

Dark data, what is it and why all the fuss?

First, I’ll give you the short answer. The right dark data, just like its brother right Big Data, can be monetised – honest, guv! There’s loadsa money to be made from dark data by ‘them that want to’, and as value propositions go, seriously, what could be more attractive?

Let’s take a look at the market.

Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes” (IT Glossary – Gartner)

Techopedia describes dark data as being data that is “found in log files and data archives stored within large enterprise class data storage locations. It includes all data objects and types that have yet to be analyzed for any business or competitive intelligence or aid in business decision making.” (Techopedia – Cory Jannsen)

Cory also wrote that “IDC, a research firm, stated that up to 90 percent of big data is dark data.”

In an interesting whitepaper from C2C Systems it was noted that “PST files and ZIP files account for nearly 90% of dark data by IDC Estimates.” and that dark data is “Very simply, all those bits and pieces of data floating around in your environment that aren’t fully accounted for:” (Dark Data, Dark Email – C2C Systems)

Elsewhere, Charles Fiori defined dark data as “data whose existence is either unknown to a firm, known but inaccessible, too costly to access or inaccessible because of compliance concerns.” (Shedding Light on Dark Data – Michael Shashoua)

Not quite the last insight, but in a piece published by Datameer, John Nicholson wrote that “Research firm IDC estimates that 90 percent of digital data is dark.” And went on to state that “This dark data may come in the form of machine or sensor logs” (Shine Light on Dark Data – Joe Nicholson via Datameer)

Finally, Lug Bergman of NGDATA wrote this in a sponsored piece in Wired: “It” – dark data – “is different for each organization, but it is essentially data that is not being used to get a 360 degree view of a customer.

Say what?

Okay, let’s see if we can be a bit more specific about the content of dark data?

Items on the dark data ticket include: Email; Instant messages; documents; Sharepoint content; content of collaboration databases; ZIP files; log files; archived sensor and signal data; archived web content; aged audit trails; operational database backups – full and incremental; roll-back, redo and spooled data files; sunsetted applications (code and documentation); partially developed and then abandoned applications; and, code snippets.

Most importantly, dark data is data that is not actively in use, is underutilised, or is something else. Seriously.

What can you do with it?

So, the conclusion that some have come to is this: there is a vast collection of data in various formats waiting to be monetised.

Personally, the idea that really grabs my attention is the potential ability to do novel forensic research on email. If only to find out what happened in the past.

For example, maybe it would be fascinating to see how significant challenges were identified, flagged and discussed; how strategic responses to those challenges were formulated, chosen and executed; and, how the outcomes of all of that process were reflected in email communications.

I think that this line of work can be very interesting for some people, and that interesting insights may be uncovered, but I would hate to have to put a tangible value on it, if only to avoid adding to the already galactic magnitudes of nonsense and hype surrounding certain data topics.

There are other more mundane uses of dark data.

Imagine that you are just about to embark on a Data Warehouse project (you really are a late adopter aren’t you), and you want establish a base collection of historical data. Where do you get that historical data from?

Right! Operational databases are not characteristically used to store significant amounts of historical reference data and historical transactions beyond a certain time window; there are performance and other reasons for keeping OLTP systems as lean as possible, so, initial loads of historical data is typically recreated in the Data Warehouse from backups, audit trails or logs.

Dark data and data governance

You don’t need a Chief Data Officer in order to be able to catalogue all your data assets. However, it is still good idea to have a reliable inventory of all your business data, including the euphemistically termed Big Data and dark data.

If you have such an inventory, you will know:

What you have, where it is, where it came from, what it is used in, what qualitative or quantitative value it may have, and how it relates to other data (including metadata) and the business.

What needs to be kept, and for how long, and what can be safely discarded, and when.

The risks associated with the retention or loss of that data.

If you don’t have such a catalogue and have never done a data inventory then a full data inventory and audit seems to be your new best friend.

What does it mean?

Simply stated, you may have dark data that has value, or it may be a simple collection of worthless digital nostalgia. But if you don’t know what you have, it may pay to find out what’s there, and if necessary, to let it go.

There is no point in hoarding unneeded and unwanted rubbish data. That is simply not good data management.

Finally a word on all the fuss surrounding dark data.

Failure to monetize when there is value to be obtained from dark data is one thing, claiming that value can be invariably obtained whilst actually not knowing what the data is, or how it could be monetised, is just adding to the mountain of data related ‘nonsense and hype’ doing the rounds these days. Please consider not adding to that mountain.

That’s all folks

British Rail, the national UK rail Company, used to be notorious for the number of delays and cancellations to services, and their reasons for failing to meet their obligations became stranger and stranger.

In winter, it would snow and there would be problems. And people would ask ‘how come you couldn’t deal with the snow this year, we’ve had snow for centuries?’ And back came the answers ‘Yes, Sir, but this year it was the wrong type of snow’. In autumn (the fall), it was ‘the wrong types of leaves, and ‘the wrong type of rain’, and in Summer, the ‘wrong type of sunshine’ and so on and so forth.

I hope this will not be the excuse from the Big Data and dark data pundits and punters when the much-vaunted and ‘almost’ guaranteed monetisation isn’t frequently realised.

‘Of course Big Data gives you big dollar benefits, it was just littered with the wrong type of data’ or ‘you just weren’t trying hard enough’.

Many thanks for reading.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 140 other subscribers

Top posts

  • Bulls*** at the Data Lakehouse - Revisited
  • An Open Letter to Mansoor Hussain Laghari
  • Why I called bullshit on the data lakehouse nonsense
  • In This Vile New World: A Cry for Moral Awakening
  • X Is Dying In Europe: Here's Why
  • Bullshit Your Way in AI: The Joys of Grifting the Algorithmic Age
  • Le Tricot comme Langage des Femmes en Temps de Guerre
  • UK PM May resigns amidst "departure means departure" fracas
  • 12 Data Wishes for a Smarter 2026
  • Fixing racism in Labour and in England

Latest additions

  • The Rise of Agentic AI: Silicon Valley’s Latest Obsession
  • 7 Claves del Liderazgo para el Éxito
  • A Costura da Resistencia: Mensaxes Ocultas na Guerra
  • Stricken als geheime Sprache der Frauen
  • Le Tricot comme Langage des Femmes en Temps de Guerre
  • Understanding Hasbara: Israel’s Narrative Control Tactics

Recent Comments

Martyn Jones's avatarMartyn Jones on The BBC in Crisis: Navigating…
Martyn Jones's avatarMartyn Jones on The BBC in Crisis: Navigating…
Martyn de Tours's avatarMartyn de Tours on The Perpetual Victim: How Prof…
Tiffany's avatarTiffany on Consider this: Data Made …
Unknown's avatarThe Case for a Globa… on REVEALING WEALTH: USING BIG DA…
Follow GOOD STRATEGY on WordPress.com

Meta

  • Create account
  • Log in
  • Entries feed
  • Comments feed
  • WordPress.com

Names in the cloud

All Data Ask Martyn awareness Big Data Big Data 7s Big Data Analytics Business Intelligence business strategy Consider this dark data data architecture Data governance Data Lake data management data science Data Supply Framework Data Warehouse Data Warehousing Good Strat goodstrat Good Strategy Inform, educate and entertain. IT strategy Martyn Jones Martyn Richard Jones pig data Politics Strategy The Amazing Big Data Challenge The Big Data Contrarians

Hours & Info

UK
+33 767 120 160
Lunch: 11am - 2pm
Dinner: M-Th 5pm - 11pm, Fri-Sat:5pm - 1am

The Good Strat Archives

  • December 2025
  • November 2025
  • October 2025
  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • March 2023
  • January 2022
  • December 2021
  • November 2021
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • July 2019
  • June 2019
  • May 2019
  • December 2018
  • January 2018
  • December 2017
  • October 2017
  • August 2017
  • July 2017
  • June 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • September 2016
  • August 2016
  • May 2016
  • March 2016
  • February 2016
  • January 2016
  • December 2015
  • November 2015
  • August 2015
  • July 2015
  • June 2015
  • May 2015
  • April 2015
  • March 2015
  • February 2015
  • January 2015
  • December 2014
  • November 2014
  • October 2014
  • September 2014

The Stats

  • 111,461 hits

Recent posts

  • The Rise of Agentic AI: Silicon Valley’s Latest Obsession December 19, 2025
  • 7 Claves del Liderazgo para el Éxito December 18, 2025
  • A Costura da Resistencia: Mensaxes Ocultas na Guerra December 18, 2025
  • Stricken als geheime Sprache der Frauen December 18, 2025
  • Le Tricot comme Langage des Femmes en Temps de Guerre December 18, 2025
  • Understanding Hasbara: Israel’s Narrative Control Tactics December 18, 2025
  • التفاوض من أجل السلام في عالم عنيف December 18, 2025
  • 12 Influencers Tecnológicos Críticos en 2025/2026 December 18, 2025
  • बुनाई का जादू: युद्ध के दौरान संचार का माध्यम December 18, 2025
  • Сила переговоров: путь к миру December 18, 2025

Latest additions

  • The Rise of Agentic AI: Silicon Valley’s Latest Obsession
  • 7 Claves del Liderazgo para el Éxito
  • A Costura da Resistencia: Mensaxes Ocultas na Guerra
  • Stricken als geheime Sprache der Frauen
  • Le Tricot comme Langage des Femmes en Temps de Guerre
  • Understanding Hasbara: Israel’s Narrative Control Tactics

Recent Comments

Martyn Jones's avatarMartyn Jones on The BBC in Crisis: Navigating…
Martyn Jones's avatarMartyn Jones on The BBC in Crisis: Navigating…
Martyn de Tours's avatarMartyn de Tours on The Perpetual Victim: How Prof…
Tiffany's avatarTiffany on Consider this: Data Made …
Unknown's avatarThe Case for a Globa… on REVEALING WEALTH: USING BIG DA…

Archives

  • December 2025
  • November 2025
  • October 2025
  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • March 2023
  • January 2022
  • December 2021
  • November 2021
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • July 2019
  • June 2019
  • May 2019
  • December 2018
  • January 2018
  • December 2017
  • October 2017
  • August 2017
  • July 2017
  • June 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • September 2016
  • August 2016
  • May 2016
  • March 2016
  • February 2016
  • January 2016
  • December 2015
  • November 2015
  • August 2015
  • July 2015
  • June 2015
  • May 2015
  • April 2015
  • March 2015
  • February 2015
  • January 2015
  • December 2014
  • November 2014
  • October 2014
  • September 2014

Categories

  • accountability
  • advertising
  • agile
  • agile way of working
  • agile@scale
  • AI
  • All Data
  • Analytics
  • anthropology
  • Architecture
  • Artificial Intelligence
  • Ask Martyn
  • Assets
  • awareness
  • bad strategy
  • Banking
  • behaviour
  • Best principles
  • Big Data
  • Big Data 7s
  • Big Data Analytics
  • blockchain
  • Books with influence
  • Brexit
  • BS
  • business
  • Business Intelligence
  • business strategy
  • Cambriano
  • Cambridge Analytica
  • China
  • Climate Change
  • Cloud
  • code of conduct
  • Commercial Analytics
  • community
  • Condiser this
  • Conservative Party
  • consider
  • Consider this
  • Consultation
  • Creativity
  • Culture
  • dark data
  • data
  • data architecture
  • Data governance
  • data hub
  • Data Lake
  • data management
  • Data Mart
  • data mesh
  • data science
  • Data Supply Framework
  • Data Warehouse
  • Data Warehousing
  • deceit
  • deep learning
  • Democracy
  • digital transformation
  • Diplomacy
  • disinformation
  • Dogma
  • Duties
  • DW 3.0
  • ECM
  • Economics
  • EDW
  • England
  • enterprise content management
  • ethics
  • EU
  • Europe
  • European Union
  • Excellence
  • Excerpt
  • Executive
  • Extract
  • Federalism
  • films
  • Financial Industry
  • fraud
  • Freedoms
  • Globalisation
  • good start
  • Good Strat
  • Good Strategy
  • Good Strategy Radio
  • goodstart
  • goodstartegy
  • goodstrat
  • goostart
  • governance
  • hadoop
  • hdfs
  • HR
  • humour
  • India
  • influencers
  • Inform, educate and entertain.
  • informatio Supply Framework
  • information
  • Information Management
  • Information Supply Frameowrk
  • Information Supply Framework
  • Infotrends
  • Inmon
  • instruments
  • IoT
  • IT Circus
  • IT fraud
  • IT strategy
  • IT World
  • iterations
  • java
  • Knowledge
  • knowledge management
  • Labour Party
  • leadership
  • Leadership 7s
  • life
  • listening
  • literature
  • LSE
  • machine learning
  • Management
  • market forces
  • Marketing
  • Marty does
  • Martyn does
  • Martyn Jones
  • Martyn Richard Jones
  • media
  • Memory lane
  • Methodology
  • nationalism
  • nine competitive forces
  • no limits
  • Northern Ireland
  • obituary
  • Obligations
  • offshore
  • Offshoring
  • operational
  • Outsourcing
  • Oxford
  • pain
  • Parliament
  • Peeves
  • Personal Integrity Key
  • Philosophy
  • pig data
  • PIK
  • PIR
  • Plaid Cymru
  • Planning
  • poem
  • poems
  • Poetry
  • Polemic
  • political science
  • Politics
  • pomo
  • postmodern
  • POTUS
  • Process
  • Professional Networking
  • professionalism
  • project management
  • Project to Excel
  • prose
  • public
  • Public Integrity Record
  • Quiz
  • Rant
  • Referendum
  • Remain
  • RIghts
  • Risk
  • Rivalry
  • Russia
  • Ruth Davidson
  • Sales
  • satire
  • Scotland
  • Scottish National Party
  • scrum
  • sentiment analysis
  • SMILES
  • Snippet
  • SNP
  • Social
  • Social Media
  • Sociology
  • Spain
  • spoof
  • statistics
  • Stories
  • Strategy
  • structured intellectual capital
  • supply chain management
  • tactics
  • Tax avoidance
  • Tax evasion
  • TEAM
  • technology
  • The Amazing Big Data Challenge
  • The Big Data Contrarians
  • The Greens
  • The Guardian
  • The hidden wealth of nations
  • Trade
  • UK
  • Uncategorized
  • United Kingdom
  • USA
  • Value
  • Wales
  • wisdom

Meta

  • Create account
  • Log in
  • Entries feed
  • Comments feed
  • WordPress.com

Hours & Info

Martyn Richard Jones
Madrid, Spain
+34 692 376 698
martyn.jones@martyn.es
10:00 - 17:00
Follow GOOD STRATEGY on WordPress.com

Top Good Strat Posts & Pages

  • Bulls*** at the Data Lakehouse - Revisited
  • An Open Letter to Mansoor Hussain Laghari
  • Why I called bullshit on the data lakehouse nonsense
  • In This Vile New World: A Cry for Moral Awakening
  • X Is Dying In Europe: Here's Why
  • Bullshit Your Way in AI: The Joys of Grifting the Algorithmic Age
  • About
  • Ask Martyn
  • Le Tricot comme Langage des Femmes en Temps de Guerre
  • UK PM May resigns amidst "departure means departure" fracas

Good strat tag cloud

1 2 3 4 5 6 7 AI All Data Analytics Artificial Intelligence Behavioural Economics BI Big Data bigdata blog books Business business analysis Business Enablement business intelligence Business Management business strategy chatgpt Consider this data data integration data management data science Data Warehouse Demagogism digital-marketing Dogma Donald Trump enterprise data warehousing espanol EU fe gaza goodstart good start Good Strat goodstrat Good Strategy hamas history information Information and Technology information management Information Technology israel IT Strategy jesus knowledge leadership life llm machine learning Marketing Martyn Jones Martyn Richard Jones News Offshoring Organisational Autism palestine Philosophy poesia Politics Russia Spain statistics Strategy technology trump writing

Categories

  • accountability
  • advertising
  • agile
  • agile way of working
  • agile@scale
  • AI
  • All Data
  • Analytics
  • anthropology
  • Architecture
  • Artificial Intelligence
  • Ask Martyn
  • Assets
  • awareness
  • bad strategy
  • Banking
  • behaviour
  • Best principles
  • Big Data
  • Big Data 7s
  • Big Data Analytics
  • blockchain
  • Books with influence
  • Brexit
  • BS
  • business
  • Business Intelligence
  • business strategy
  • Cambriano
  • Cambridge Analytica
  • China
  • Climate Change
  • Cloud
  • code of conduct
  • Commercial Analytics
  • community
  • Condiser this
  • Conservative Party
  • consider
  • Consider this
  • Consultation
  • Creativity
  • Culture
  • dark data
  • data
  • data architecture
  • Data governance
  • data hub
  • Data Lake
  • data management
  • Data Mart
  • data mesh
  • data science
  • Data Supply Framework
  • Data Warehouse
  • Data Warehousing
  • deceit
  • deep learning
  • Democracy
  • digital transformation
  • Diplomacy
  • disinformation
  • Dogma
  • Duties
  • DW 3.0
  • ECM
  • Economics
  • EDW
  • England
  • enterprise content management
  • ethics
  • EU
  • Europe
  • European Union
  • Excellence
  • Excerpt
  • Executive
  • Extract
  • Federalism
  • films
  • Financial Industry
  • fraud
  • Freedoms
  • Globalisation
  • good start
  • Good Strat
  • Good Strategy
  • Good Strategy Radio
  • goodstart
  • goodstartegy
  • goodstrat
  • goostart
  • governance
  • hadoop
  • hdfs
  • HR
  • humour
  • India
  • influencers
  • Inform, educate and entertain.
  • informatio Supply Framework
  • information
  • Information Management
  • Information Supply Frameowrk
  • Information Supply Framework
  • Infotrends
  • Inmon
  • instruments
  • IoT
  • IT Circus
  • IT fraud
  • IT strategy
  • IT World
  • iterations
  • java
  • Knowledge
  • knowledge management
  • Labour Party
  • leadership
  • Leadership 7s
  • life
  • listening
  • literature
  • LSE
  • machine learning
  • Management
  • market forces
  • Marketing
  • Marty does
  • Martyn does
  • Martyn Jones
  • Martyn Richard Jones
  • media
  • Memory lane
  • Methodology
  • nationalism
  • nine competitive forces
  • no limits
  • Northern Ireland
  • obituary
  • Obligations
  • offshore
  • Offshoring
  • operational
  • Outsourcing
  • Oxford
  • pain
  • Parliament
  • Peeves
  • Personal Integrity Key
  • Philosophy
  • pig data
  • PIK
  • PIR
  • Plaid Cymru
  • Planning
  • poem
  • poems
  • Poetry
  • Polemic
  • political science
  • Politics
  • pomo
  • postmodern
  • POTUS
  • Process
  • Professional Networking
  • professionalism
  • project management
  • Project to Excel
  • prose
  • public
  • Public Integrity Record
  • Quiz
  • Rant
  • Referendum
  • Remain
  • RIghts
  • Risk
  • Rivalry
  • Russia
  • Ruth Davidson
  • Sales
  • satire
  • Scotland
  • Scottish National Party
  • scrum
  • sentiment analysis
  • SMILES
  • Snippet
  • SNP
  • Social
  • Social Media
  • Sociology
  • Spain
  • spoof
  • statistics
  • Stories
  • Strategy
  • structured intellectual capital
  • supply chain management
  • tactics
  • Tax avoidance
  • Tax evasion
  • TEAM
  • technology
  • The Amazing Big Data Challenge
  • The Big Data Contrarians
  • The Greens
  • The Guardian
  • The hidden wealth of nations
  • Trade
  • UK
  • Uncategorized
  • United Kingdom
  • USA
  • Value
  • Wales
  • wisdom

Blog at WordPress.com.

  • Subscribe Subscribed
    • GOOD STRATEGY
    • Join 135 other subscribers
    • Already have a WordPress.com account? Log in now.
    • GOOD STRATEGY
    • Subscribe Subscribed
    • Sign up
    • Log in
    • Report this content
    • View site in Reader
    • Manage subscriptions
    • Collapse this bar
Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use.
To find out more, including how to control cookies, see here: Cookie Policy