
GOOD STRATEGY

~ DATA, INFORMATION & KNOWLEDGE


Tag Archives: Martyn Jones

Absolutely Fabulous Big Data Roles

03 Mon Aug 2015

Posted by Martyn Jones in Big Data, Consider this, good start, goodstart, Martyn Jones, Strategy

≈ 1 Comment

Tags

Big Data, Consider this, goodstart, Martyn Jones, Strategy


Plus ça change, plus c’est la même chose.

Jean-Baptiste Alphonse Karr

Prologue

I wrote a piece called ‘7 New Big Data Roles for 2015’. I published it on LinkedIn. Many people read it. Some people made suggestions. Others politely ignored it.

I listened to the suggestions, comments and criticisms, and revised the piece as a result.

So, here it is… I hope you like it. And if not, I might try again in six months’ time.

Continue reading →

The Hadoop Honeymoon is Over

16 Sat May 2015

Posted by Martyn Jones in Big Data, Consider this, Good Strat, Good Strategy, Martyn Richard Jones, Strategy

≈ 5 Comments

Tags

Big Data, hadoop, Martyn Jones, Strategy


Listen up Big Data playmates! The ubiquitous Big Data gurus, tied up in their regular chores of astroturfing mega-volumes, velocities and varieties of superficial flim flam, may not have noticed this, but, Hadoop is getting set up for one mighty fall – or a fast-tracked and vertiginous black run descent. Why do I say that? Well, let’s check the market. Continue reading →

On Not Knowing Sentiment Analysis

12 Tue May 2015

Posted by Martyn Jones in Big Data, Big Data Analytics, Consider this, good start, goodstart, sentiment analysis

≈ Leave a comment

Tags

All Data, Analytics, aspiring tendencies in IM, awareness, good start, Good Strat, goodstart, Martyn Jones, Strategy


If you know all about Sentiment Analysis, you’ve come to the right place. Because I don’t have a clue if what I know about it is accurate or not.

I started to do a bit of research into this Sentiment Analysis lark, in particular with the theoretical idea of using it to analyse and draw conclusions from comments on Pulse – assuming that this is what it can be used for.

To begin at the beginning, which is a good place to start, I read the piece on Wikipedia, and this was how it began:

“Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.

Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).” Source: Wikipedia. Link: http://en.wikipedia.org/wiki/Sentiment_analysis

Well, that’s a fairly intuitive description. I could almost have guessed as much.

But, back to the aim of analysing sentiment in Pulse comments: where to start and what to do?

What would sentiment analysis make of these:

On the death of an IT-business celebrity. What would sentiment analysis make of the very emotive comments of desolation, sadness and poignancy of people who didn’t personally know the departed, even remotely, or maybe didn’t even know of them until after they had ‘shuffled off life’s mortal coil’? How would that work? What would sentiment analysis make of the maudlin aphorisms, surrogate grief and bizarre sorrow of people separated by more degrees than Kofi Annan and Mork from Ork? What additional insight does sentiment analysis give us when these comments are analysed along with the body of the text and the other comments that triggered them?

In a similar vein, how does sentiment analysis catch instances of sycophancy? Especially considering the fact that some of it is so ‘in your face’ and blatant that it often times seems to be a bad parody of a bad parody. “Oh, Ricky, why are you such a sexy brainbox?” How does it work in those situations?

Worse than that is the preening, gushing and obtuse texts of massive, errm… fabulators[i]. If it wasn’t about Big Data or Strategy or IT, it would be about something else, usually about the writer themselves. “I give Rafa and Rodge tips on tennis! I went to the University of the Universe and got a first! I challenged Superman to a race, and won! I have read the entire works of Dan Brown, 25 times…Neeeh!” What would sentiment analysis do with that sort of gold?

Also, what does sentiment analysis do with texts so ambiguously daft that they could mean anything? Okay, it might be able to pick up a few trigger words here or there, “rubbish”, “of”, “load”, “a”, “what”, etc. However, how does it know when “excellent” is being used in a way that means anything but excellent? For example, “Excellent Big Data job there”, with the silent “if you want a job doing properly then do it yourself”.
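To make the trigger-word problem concrete, here is a minimal sketch of the crudest possible approach: count positive words, subtract negative ones. The word lists and the `naive_sentiment` function are invented for illustration and are not taken from any real sentiment lexicon or library.

```python
# Toy lexicon-based sentiment scorer. The word lists are assumptions made
# up for this sketch; real lexicons are far larger and weighted.
import re

POSITIVE = {"excellent", "good", "great", "fabulous"}
NEGATIVE = {"rubbish", "bad", "dodgy", "awful"}


def naive_sentiment(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# A sincere complaint is caught by its single trigger word.
print(naive_sentiment("What a load of rubbish"))        # -1
# The sarcastic compliment scores as praise, because the silent second
# half of the sentence is simply not in the text.
print(naive_sentiment("Excellent Big Data job there"))  # 1
```

A scorer like this has no way of knowing that the second comment is a put-down, which is exactly the question being asked above.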

Finally, for the purpose of this little piece, what would sentiment analysis do with term abuse, if it could actually identify it? Going back to the use of terms such as Big Data or Strategy, how can sentiment analysis discern between the dopey and wrong-headed use of the term, and when it is actually being used in a coherent, cohesive and consistent way, in line more or less with its formal definition? I suppose we can always write a mountain of rules to help us out:

If topic in focus of piece is strategy

And context of topic is business

And author of piece is Richard Rumelt

Then the credibility of text is good (with a certainty of 100%)

But you try and maintain a rule base with instances like that. It soon becomes a management nightmare.

Alternatively, maybe it could be used to analyse this text. It’ll have its work cut out, that’s for sure. Does sentiment analysis do sarcasm and cynicism?
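For the curious, the single rule above could be sketched in Python roughly as follows. The `Piece` and `Rule` classes and the `credibility` function are hypothetical names invented for this sketch; they do not come from any real rules engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """The three facts about a text that the rule inspects."""
    topic: str
    context: str
    author: str


@dataclass(frozen=True)
class Rule:
    """One hand-written rule: if all three conditions match, conclude."""
    topic: str
    context: str
    author: str
    credibility: str
    certainty: float  # 1.0 stands for "a certainty of 100%"

    def matches(self, piece: Piece) -> bool:
        return (piece.topic, piece.context, piece.author) == (
            self.topic, self.context, self.author)


# The rule base: one entry per topic, context and author combination.
RULES = [
    Rule("strategy", "business", "Richard Rumelt", "good", 1.0),
]


def credibility(piece: Piece, rules=RULES):
    """Return (credibility, certainty) from the first matching rule, else None."""
    for rule in rules:
        if rule.matches(piece):
            return rule.credibility, rule.certainty
    return None


print(credibility(Piece("strategy", "business", "Richard Rumelt")))
print(credibility(Piece("strategy", "business", "A. N. Other")))
```

Every new author, topic or context needs its own entry in `RULES`, so the list grows with the product of all three, which is where the maintenance trouble starts.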

Anyway! I bet you might know how this sentiment analysis works, don’t you? On the other hand, if not, then it will be someone else who ‘knows’. But of course, all will not be revealed, because it’s a secret so powerful, that in the wrong hands it could be used to dominate the entire galaxy.

Only joking; and many thanks for reading.

[i]To engage in the composition of fables or stories, especially those featuring a strong element of fantasy: “a land which … had given itself up to dreaming, to fabulating, to tale-telling” (Lawrence Durrell).


Sexing up Big Data’s Dodgy Dossier

20 Fri Mar 2015

Posted by Martyn Jones in Big Data, Consider this, good start, Good Strat, goodstart, Martyn Jones

≈ Leave a comment

Tags

Big Data, good start, goodstart, Martyn Jones, Martyn Richard Jones


Most of us would probably like to work in a profession recognised for its legality, decency and honesty. At least I hope so. In my line of work, what we have right now is palpable evidence that the IT industry lacks a moral compass.

Imagine this. A major sensationalist tabloid pulls together a team of diverse journalists who are set to work on a national campaign to promote very high usage of sunbeds as a cure for cancer. Why? The newspaper owner’s son owns the sunbed franchise.

The health experts criticise the publisher for being irresponsible, unprofessional and lacking in scruples.

The public is mainly undecided, but many take the story at face value and adopt the fad. The intensive use of sunbeds sharply increases. Elsewhere, in unrelated news, the cases of skin cancer show a marked increase. Some blame it on EU legislation for bangers and bananas.

In spite of protests, the press campaign continues over many months.

Eventually, and based on the evidence of recognised health experts and bodies, the press regulatory association tries to get the offending publisher to temper their claims, but without any success. It is only when the government’s lawyers step in and threaten the newspaper owners with legal proceedings that they freeze their campaign. Much later, the editor resigns and the board of directors issue a short apology on the back pages of their much vaunted organ.

We have that in IT. Our current sunbed cure for cancer, if you believe those who are ‘bigging it up’, is undoubtedly Big Data.

I occasionally post content to LinkedIn, and some of it (maybe even this piece) gets promoted through the Pulse Big Data channel. There are some reasonable pieces pinned to that channel, but unfortunately, for much of the time what we get is total and moronic Big Data astroturfing, tantamount to Big Data’s very own Big Lie campaign.

The LinkedIn Big Data channel reflects life, and it is full of self-aggrandising and shameless marketing guff, shot-through with scandalously flimsy promotions of tendentious success stories, specious claims of value, half-truths about realisable benefits and embarrassing conjecture about the importance of social media and internet logs.

What I am referring to mainly are superficially neutral (yet virally toxic) pieces placed in the public domain in order to promote Big Data at any cost.

Now let’s step back a bit.

For over 125 years, the Financial Times (FT) has built up a solid professional reputation for accurate reporting, reliable journalism and informative editorials. The FT is a newspaper trusted by its discerning readership and admired everywhere. In fact, I could not imagine their journalists writing about markets, securities and financial houses the same way that pundits elsewhere write about Big Data, Dark Data and the Internet of Things. Because the FT knows that maintaining the trust of their readership is far more important than winning the short-term favours of a few market players.

So consider this; if we in IT cannot bring our standards of communicating with the public up to the levels of the financial industry, at minimum, you know what that means, don’t you?

Exactly. The IT industry will have a far worse public image problem than the bankers and the solicitors currently have, and we all understand the general public appreciation of those professions.

Now, call me old fashioned, but for me that possibility is worthy of serious consideration, and especially by those in IT who confuse no holds barred pimping of fads, trends and technology, in which truth, decency and honesty are optional, for ethical, candid and informative analysis and reporting of the industry.

How will the industry take these criticisms?

To go back to the sunbed analogy, what we will most certainly get is comments in this vein:

Whilst those who rail against ‘the cancer curing advantages of sunbed use’ may be right – or at least partially right – the sunbed revolution will continue, just as the IT revolution has done, and in spite of people saying that the age of computing would be a passing mania.

So, when someone tells you “intensive sunbed use is just a dangerous fad”, what they actually mean to say is that we don’t need the term any more, as intensive sunbed use is here to stay, as are those who are shrewd, unprincipled and cynical enough to cash-in on the public’s gullibility and wilful stupidity when it comes to fads.

Yes, it does get that bad.

We have people who seemingly spend all their waking lives working out not-so-original ways and means of riddling the IT industry with vacuous bullshit, and what Big Data promotion has shown us clearly is that we have palpable and comprehensive evidence that the IT industry in general lacks a moral compass.

Is that a reflection of IT, of those who create and manipulate IT fads, or of society in general?

Many thanks for reading.

As always, please share your questions, views and criticisms on this piece using the comment box below. I frequently write about strategy, organisational leadership and information technology topics, trends and tendencies. You are more than welcome to keep up with my posts by clicking the ‘Follow’ link and perhaps even send me a LinkedIn invite. Also feel free to connect via Twitter, Facebook and the Cambriano Energy website.

The Biggest Contradiction of Big Data

20 Fri Mar 2015

Posted by Martyn Jones in Big Data, Consider this, good start, goodstart

≈ Leave a comment

Tags

Big Data, contradictions, data management, good start, Good Strategy, goodstart, Martyn Jones, Martyn Richard Jones


I have written at length about the fundamental contradictions of Big Data, but what I have omitted in the past is quite possibly the biggest contradiction of all. Probably because it has more to do with how Big Data is continually hyped, rather than having anything to do with Big Data as a bag of technologies – which has a whole assortment of problems in its own right.

Last time I spoke with you about the contradictions of Big Data it was about the three Vs of volume, variety and velocity. In general, it was a view that was well received, even if not widely understood. Which of course is close enough for government work. But, get ready for “something completely different”.

If on the one hand some folk can claim that Big Data provides fact based insights and reliable forecasts of future habits, trends and preferences, then why is it so difficult to produce and socialise – yes, I like to use that term – Big Data success stories?

In short, I think we have arrived at the stage in Big Data’s cycle where it is reasonable to ask pundits to either put up or shut up.

So, why aren’t the current Big Data success fables accompanied by facts, such as names of those involved (at least businesses), the sponsors, the suppliers, the purpose of the exercise, the desired outcomes, the data used, how it is processed, what the results were, and what tangible benefits, if any, were accrued or are accruable? If that is not enough, then let people mention the technology used, the products purchased or licensed and the methodology followed.

In short, what I would like to know is why the evangelists of Big Data are telling us that bigger data is better, that more variety leads to greater insight, and that velocity is king. Why are we told that Big Data almost assuredly results in better decisions by people who are coy, shy or secretive about almost all the facts and data coming out of Big Data projects?

I have been reminded, time and time again, that there are Big Data success stories out there, and I have even been told that this information would be fully shared with me once it was agreed with the ‘clients’ that it was okay to do so. Okay, that’s fine, I know Big Data is a roaring success story (at least in people’s minds), and I also know that it takes some time to make things up – some people are just not creative. Sure, I was told about these ‘successes’ some time ago, and you know, I’m not expecting anything that’s worth shaking a stick at, either now or later, but I’m still waiting, boys. Notwithstanding, you will still be called out as vacuous bullshitters when the time comes.

“But” I hear you cry “there is a wealth of success stories in the presses”.

Well, no, and you would be wrong, gullible and foolish to think so. That is your problem, but unfortunately it is also mine, because this is my profession that you are playing fast and loose with.

The fact is that there is a “wealth” of content that people try and pass off as legitimate Big Data success stories, but they aren’t in fact success stories, in any way, shape or form.

The thing is, people may read the blog title and even the standfirst, but will be less inclined to actually read the article, so what remains is the impression that there are ‘loads of Big Data success stories’. But if people actually read the articles and were intelligent enough to understand them, then they would realise that inevitably there is a massive mismatch between the title of these pieces and the content. Indeed, if these pieces were actually pieces of advertising, rather than blog comments, they would be denounced in some jurisdictions for not fulfilling the advertising criteria of legal, decent and honest.

There is one more thing that Big Data evangelists (or any self-styled pundit, guru or expert for that matter) should understand, internalise and remember. If you say that you have a Big Data success story, with all the details, and that isn’t in fact the case, and it isn’t even remotely a success story or even true, then you are simply lying, and that’s deceit, it’s unprofessional and it’s unethical, and you are a scoundrel. So live with it or fix it; the choice is yours.

Many thanks for reading.

Big Data’s Virtuous Circus

20 Fri Mar 2015

Posted by Martyn Jones in Big Data, Consider this, data management, good start, goodstart

≈ Leave a comment

Tags

Big Data, data architecture, data management, good start, Good Strat, Good Strategy, goodstart, Martyn Jones, Martyn Richard Jones


Many people come up to me in the street and ask me what Big Data is all about. It has happened to me so many times in the past that I am convinced that it might just happen to you as well. I know this sort of thing, I read the Big Data tealeaves. Nothing gets past me.

The first time a complete stranger came up to me in public and said “Hello, will you tell me what this Big Data lark is all about then?” I was lost for words, you just ask my Aunt Dolly, he can vouch for that, no problem. Later that day I read a book – it was my dad’s book – and I then decided to adopt a strategy.

Therefore, in the spirit of springtime goodwill to all men and women, I have put together this blog piece in the hope that it will enlighten, help and entertain.

What is big data?

Big Data can be characterised by the 10 Vs – yes, 10, not 4. Which, in my book, is more than enough to bring up-to-speed the average Big Data John or Jane that one meets on the street, and who naturally wish to be informed of such matters.

In layperson’s terms this is a series of landmarks and pointers in the analytics space used to frame and guide the didactic aspects of Big Data.

The fundamental Vs of the Big Data canon are these:

  • Vagueness
  • Volume
  • Variety
  • Virility
  • Velocity
  • Vendible
  • Vaticination
  • Voracity
  • Veracity
  • Vanity

So, let me now explain what each of these characteristics means to those who might know and for those who might want to know.

Vagueness – This is perhaps the trickiest of questions to address, given the vast panorama that is cast before this incredibly complex yet easily graspable concept. But let me state this, and let there be no mistake about it. At this point in time, what makes Big Data vague is also what makes Big Data specific, explicit and certain. That is to say, in order to ‘come to an understanding’ of Big Data, it is necessary to completely embrace the dialectic of knowing the unknowable. So belief is an absolute essential element – belief and data, that is.

Volume – If there ever was a time to “pump up the volume”, we have it here with Big Data.

Big, voluminous, gorgeously rotund and infinite. Big Data is called Big Data because there is a lovely, roly-poly, likeable never-ending load of it. Its volumes can be measured in zettabytes, which, you can be assured, is a helluva lot of data.

Variety – As they might say down my way, “variety is the spice of life, innit”. This is what makes Big Data so special. So appealing.

Because before Big Data there was absolutely no variety in anything, at all. We lived in a bland world, bereft of detail, nuance and diversity. Nothing could be measured, analysed or explained, because we lacked Big Data. We were ignorant. So ignorant and stupid that we couldn’t see the sense of putting the diapers next to the beer, or of offering three for the price of two.

Fortunately, today this is no longer the case if we don’t want it to be, and thanks to Big Data we have a veritable sensorial explosion. No longer is IT just a couple of symbols scribbled in crayon on someone’s school notebook.

Virility – Move over Smart Data, the new kid on the block is Big Data.

If Big Data were described in the manner of a religious text, it would be accompanied by a never ending narrative of begets.

So, what does that mean?

Simply stated, Big Data creates itself, in and of itself. The more Big Data you have, the more Big Data gets created. It’s like a self-fulfilling prophecy in 360 degree, high-definition, poly-faceted and all-encompassing knowing. The sort of thing that governments would pay an arm and a leg to get their mitts on.

Velocity – Velocity is of the essence. Velocity kills the competition. More velocity, less haste.

We demand that service is ‘velocious’. ‘Everything’ must be ‘now’, or it’s too late.

This means we need to be able to handle Big Data at velocity – at the speed of need.

Charles Babbage once stated (or maybe it was more than once) that “whenever the work is itself light, it becomes necessary, in order to economize time, to increase the velocity.”

But remember, we are dealing with mega-velocity here, so don’t drink and drive the Big Data Steamship, Star-ship or Mustang.

Vendible – If you can sell it, and sell it as Big Data, then it ‘is’ Big Data. If you can’t, then it’s not. The saleability of Big Data proves its existence.

So, what are the vendible aspects of Big Data?

Let’s leave that easy question for another day. But for now I can confidently state that it is used to mobilise armies of commentators, industry analysts, publicists, punters, writers, bloggers, gurus, futurologists, conference organisers, conference speakers, educators, customer relationship managers, salespeople, marketers and admen.

Vaticination – Edmund Burke is down on record as stating that “you can never plan the future by the past”. Now Burke may have been a clever person when it came to many things, but he wasn’t exactly a whiz when it came to Big Data.

There are people in the world who are in no doubt that Big Data provides the sort of visionary and predictive powers only previously obtainable through ritual sacrifice, magic potions and the casting of spells. Others are highly critical of the understatement implicit in this belief.

For many, Big Data will make the Oracle of Delphi look like a mere call centre.

This is why the power of vaticination plays a characteristically important role in the world of Big Data.

Voracity – This is based on the quasi-rationalist argument that Big Data is big and it has an omnipresent and insatiable self-fulfilling desire.

Big Data comes with an attendant requirement for hardware, even if it is a whole load of consumer hardware tacked together in a magnificent and miraculous mesh of magic.

Big Data can be characterised by voracity, but this comes hand in hand with the ‘ventripotent’ IT industry.

Veracity – The eminence of the data being captured for Big Data handling can vary significantly. The quality or lack of quality of the data naturally has the potential to impact the accuracy of analysis using that data.

Before Big Data arrived on the scene we knew nothing about Data Quality or data verification. This is why ETL and Data Cleansing tools lacked the power to effectively quality check and verify data, to ensure that any erroneous or anomalous data was rejected or flagged.

But now, with the sophistication of tools such as ‘grep’ and ‘awk’ at our disposal, we have the power in our hands to ensure nothing ‘dodgy’ gets into the analytical mix.
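For what it is worth, the reject-or-flag check described above needs nothing exotic. Here is a minimal sketch in plain Python; the field names `customer_id` and `amount` and both function names are assumptions invented for the example.

```python
def check_record(record: dict) -> list:
    """Return the list of problems found in one record (empty means clean)."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)):
        problems.append("amount is not a number")
    elif amount < 0:
        problems.append("negative amount")
    return problems


def split_clean_and_flagged(records):
    """Separate records that pass every check from those that need a look."""
    clean, flagged = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            flagged.append((record, problems))
        else:
            clean.append(record)
    return clean, flagged


# Hypothetical sample rows: one clean, two dodgy.
rows = [
    {"customer_id": "C1", "amount": 10.0},
    {"customer_id": "", "amount": -5},
    {"customer_id": "C3", "amount": "n/a"},
]
clean, flagged = split_clean_and_flagged(rows)
print(len(clean), "clean;", len(flagged), "flagged")
```

Checks of this kind are what ETL and data cleansing tools were already doing, which is rather the point.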

Vanity – In my opinion, to fully grasp the underlying and profound meaning of Big Data, it is essential for us to understand the difference between vanity and conceit. Max Counsell claimed that “Vanity is the flatterer of the soul”. Goethe characterised vanity as being “a desire for personal glory”. After an incident with an Anarchist (presumably a Big Data Anarchist), Blackadder remarked to Baldrick that “The criminal’s vanity always makes them make one tiny but fatal mistake. Theirs was to have their entire conspiracy printed and published in plain manuscript”.

That’s all folks!

So that ends the brief rundown of the defining characteristics of Big Data.

So, to summarise. That, which has passed before, necessarily divulges both the upside and downside of Big Data. By reaching out, opening up the kimono and relating the 10 Vs we are disclosing that which cannot be disclosed, exhibiting the absence of essential essence, and thereby opening up the entire field, discipline, profession, science and art to examination, questioning and ridicule.

Many thanks for reading.

41 Shots of Great Leadership

20 Fri Mar 2015

Posted by Martyn Jones in Consider this, good start, goodstart, humour

≈ Leave a comment

Tags

Consider this, good start, goodstart, humour, leadership, Martyn Jones



Consider this. Why be shameful when all around you have apparently no idea of right from wrong?

Continue reading →

7 Signals that someone has quit

14 Sat Mar 2015

Posted by Martyn Jones in Consider this, good start, Good Strat, goodstart, Martyn Richard Jones

≈ Leave a comment

Tags

careers, Consider this, good start, Good Strat, Good Strategy, goodstart, Martyn Jones, Martyn Richard Jones, quit


You are the boss. You are the leader, coach and manager, and there are some things that you just got to learn, like it or not. One of these skills is to be able to identify when someone has quit. “How dare they?” I hear you ask.

The first time I quit a job and didn’t tell anybody was when I was in the RAF working as a fighter pilot in World War 2, and I accidentally bombed Newport in South Wales, and was given a stern talking to for my troubles. Well, I didn’t actually quit and I was never in the armed forces and I was born into the era of the Beat Generation, but that’s by the by, it’s just there for effect, to create some artificial empathy between me and those who have actually quit a job and not told anyone about it. Myself, I would never do such a thing. Although to be fair, Newport has looked like it has been freshly bombed with dark green, brown and grey shades of poster paints and self-raising flour, since forever. Continue reading →

Consider this: Big Data Forever!

14 Sat Mar 2015

Posted by Martyn Jones in Big Data, Consider this, dark data, Martyn Jones

≈ Leave a comment

Tags

Big Data, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


Dans ce pays-ci, il est bon de tuer de temps en temps un amiral pour encourager les autres – Voltaire

My gran used to tell me that honesty pays. Of course, she never really understood banking or IT, probably because she didn’t want to know anything about them, and she never lived to witness the amazing hype circuses, the spin doctors’ spiel or the focus-group dog-and-pony show of the 21st century. Indeed, if honesty were a guaranteed payer my gran would have amassed more wealth than even Warren Buffett himself.

If my gran lived today, she might reflect on what Big Data might be about – maybe she would even consider it benignly, as a sort of shelter for fallen men of once uncertain virtue. We will never know. So onwards and upwards.

The Harvard Business Review contemplated honesty in somewhat different terms:

“Honesty is, in fact, primarily a moral choice. Businesspeople do tell themselves that, in the long run, they will do well by doing good. But there is little factual or logical basis for this conviction. Without values, without a basic preference for right over wrong, trust based on such self-delusion would crumble in the face of temptation.”

In a marvellous book, A Few Good Men from Univac, David E. Lundstrom narrates the story of Sperry Univac in the 1960s, one of the truly great innovators in the first forty years of IT, and includes an allegory taken from the engineering front-line. I will recount it here, edited to highlight the zeitgeist, for your entertainment and, as Voltaire put it, “to encourage the others”:

In the beginning was the Big Data Plan.

And then came the Big Data Assumptions.

And the Assumptions were without form.

And the Plan was without substance.

And darkness was upon the face of the Workers.

And they spoke amongst themselves, saying: “It is a crock of shit, and it stinketh.”

And the workers went unto their Supervisors and said: “It is a pail of dung, and none may abide the odor thereof.”

And the Supervisors went unto their Managers, saying: “It is a container of excrement, and it is very strong, such that none may abide by it.”

And the Managers went unto their Directors, saying: “It is a vessel of fertilizer, and none may abide its strength.”

And the Directors spoke amongst themselves, saying to one another: “It contains that which aids plant growth, and it is very powerful.”

And the Vice Presidents went unto the President, saying unto him: “This new plan will actively promote the growth and vigor of the company, with powerful effects.”

And the President looked upon the Big Data Plan, and saw that it was good.

“But?” I hear you say, “why fight it, why not take advantage of the Big Data zeitgeist?”, “Why not cash in on the grand bonanza Big Data bandwagon?” or “Monetise the three famous Vs of Big Data?”

Well, it had crossed my mind, briefly, and (outside of the USA) we’ve all done stuff we have not entirely believed in, so the temptation to cash in is present, capisci? This paraphrasing of a piece from My Blue Heaven might give you a better idea:

One of my best friends makes his living as a completely phony Big Data Scientist. For two hundred bucks he can make you a Data Scientist or a Big Data guru. Some guys give you an education but this guy gives you immediate access to high paying jobs, sex that would make the 256 trillion Shades of Blah blush and a life in the City, the Big Apple or a small town in Germany.

Moreover, for an extra 250 bucks (limited time offer) you can also become a certified Big Data Neuro Trainer, which will allow you to do unto others what has been done unto you.

I also considered Big Data Brokerage, Big Data Certification and Big Data Independent Trading (New York – Paris – Peckham). The opportunities are immense.

However, what happens when the Big Data well runs dry, and I (and many others) get tarnished with the mark of Big Data and become pariahs by complicity, collusion or simple association?

That question I will leave for another day. But just consider the following.

All right, I admit, I am a big long-time fan of comic genius Mel Brooks, who has a knack of capturing deep insight from the human condition, especially when the human condition is off guard and shallow. In that vein, this is how I like to think the dialogue from the Dole Office scene from History of the World, Part I would have gone, if he were to write that today:

Dole Office Clerk: Occupation?

Data Magnus Comicus: Stand-up Big Data scientist.

Dole Office Clerk: What?

Data Magnus Comicus: Stand-up Big Data scientist. I coalesce the vaporous datas of the human interaction with the social-media networking, Internet of Everything, and always-connected experience into a… viable, analytical and meaningful predictive-comprehension.

Dole Office Clerk: Oh, a Big Data bullshit artist!

Data Magnus Comicus: *Grumble*…

Dole Office Clerk: Did you bullshit Big Data last week?

Data Magnus Comicus: No.

Dole Office Clerk: Did you try to bullshit Big Data last week?

Data Magnus Comicus: Yes!

Finally, I leave you with some wise words from Israeli American professor of psychology and behavioural economics, Dan Ariely:

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

Many thanks for reading.

What’s all the fuss about Dark Data? Big Data’s New Best Friend

10 Tue Mar 2015

Posted by Martyn Jones in All Data, Big Data, Consider this, dark data, Good Strat

≈ Leave a comment

Tags

All Data, Big Data, dark data, data architecture, data management, Good Strat, Martyn Jones, Martyn Richard Jones


What is Dark Data?

Dark data, what is it and why all the fuss?

First, I’ll give you the short answer. The right dark data, just like its brother right Big Data, can be monetised – honest, guv! There’s loadsa money to be made from dark data by ‘them that want to’, and as value propositions go, seriously, what could be more attractive?

Let’s take a look at the market.

Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes” (IT Glossary – Gartner)

Techopedia describes dark data as being data that is “found in log files and data archives stored within large enterprise class data storage locations. It includes all data objects and types that have yet to be analyzed for any business or competitive intelligence or aid in business decision making.” (Techopedia – Cory Janssen)

Cory also wrote that “IDC, a research firm, stated that up to 90 percent of big data is dark data.”

In an interesting whitepaper from C2C Systems it was noted that “PST files and ZIP files account for nearly 90% of dark data by IDC Estimates” and that dark data is “Very simply, all those bits and pieces of data floating around in your environment that aren’t fully accounted for” (Dark Data, Dark Email – C2C Systems)

Elsewhere, Charles Fiori defined dark data as “data whose existence is either unknown to a firm, known but inaccessible, too costly to access or inaccessible because of compliance concerns.” (Shedding Light on Dark Data – Michael Shashoua)

Not quite the last insight, but in a piece published by Datameer, Joe Nicholson wrote that “Research firm IDC estimates that 90 percent of digital data is dark” and went on to state that “This dark data may come in the form of machine or sensor logs” (Shine Light on Dark Data – Joe Nicholson via Datameer)

Finally, Luc Burgelman of NGDATA wrote this in a sponsored piece in Wired: “It” – dark data – “is different for each organization, but it is essentially data that is not being used to get a 360 degree view of a customer.”

Say what?

Okay, let’s see if we can be a bit more specific about the content of dark data.

Items on the dark data ticket include: email; instant messages; documents; SharePoint content; content of collaboration databases; ZIP files; log files; archived sensor and signal data; archived web content; aged audit trails; operational database backups (full and incremental); roll-back, redo and spooled data files; sunsetted applications (code and documentation); partially developed and then abandoned applications; and code snippets.

Most importantly, dark data is data that is not actively in use, is underutilised, or is something else. Seriously.

What can you do with it?

So, the conclusion that some have come to is this: there is a vast collection of data in various formats waiting to be monetised.

Personally, the idea that really grabs my attention is the potential ability to do novel forensic research on email. If only to find out what happened in the past.

For example, maybe it would be fascinating to see how significant challenges were identified, flagged and discussed; how strategic responses to those challenges were formulated, chosen and executed; and, how the outcomes of all of that process were reflected in email communications.

I think that this line of work can be very interesting for some people, and that interesting insights may be uncovered, but I would hate to have to put a tangible value on it, if only to avoid adding to the already galactic magnitudes of nonsense and hype surrounding certain data topics.

There are other more mundane uses of dark data.

Imagine that you are just about to embark on a Data Warehouse project (you really are a late adopter, aren’t you?), and you want to establish a base collection of historical data. Where do you get that historical data from?

Right! Operational databases are not characteristically used to store significant amounts of historical reference data and historical transactions beyond a certain time window; there are performance and other reasons for keeping OLTP systems as lean as possible. So, initial loads of historical data are typically recreated in the Data Warehouse from backups, audit trails or logs.

Dark data and data governance

You don’t need a Chief Data Officer in order to be able to catalogue all your data assets. However, it is still a good idea to have a reliable inventory of all your business data, including the euphemistically termed Big Data and dark data.

If you have such an inventory, you will know:

What you have, where it is, where it came from, what it is used in, what qualitative or quantitative value it may have, and how it relates to other data (including metadata) and the business.

What needs to be kept, and for how long, and what can be safely discarded, and when.

The risks associated with the retention or loss of that data.

If you don’t have such a catalogue and have never done a data inventory then a full data inventory and audit seems to be your new best friend.
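For the practically minded, here is a minimal sketch of what one entry in such an inventory might look like, and how you could ask it which data is unused and which is past its keep-by date. Everything in it is an assumption made for illustration: the class name, the field names, the sample entries and the retention periods are all hypothetical, and a real catalogue would carry far more detail.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class InventoryEntry:
    """One row in a hypothetical data inventory (all field names are illustrative)."""
    name: str                     # what you have
    location: str                 # where it is
    origin: str                   # where it came from
    used_in: List[str]            # what it is used in (empty list = not in use)
    value_note: str               # qualitative or quantitative value, if known
    related_to: List[str]         # how it relates to other data and the business
    created: date                 # when the data was produced
    keep_for_days: Optional[int]  # retention period; None = keep indefinitely
    risks: str = ""               # risks of retaining or losing it

    def discard_after(self) -> Optional[date]:
        """Date from which the entry may be discarded, or None if kept indefinitely."""
        if self.keep_for_days is None:
            return None
        return self.created + timedelta(days=self.keep_for_days)


def review(entries: List[InventoryEntry], today: date) -> Tuple[List[str], List[str]]:
    """Return (names not in use, names past their retention period) as of `today`."""
    unused = [e.name for e in entries if not e.used_in]
    discardable = []
    for e in entries:
        limit = e.discard_after()
        if limit is not None and limit <= today:
            discardable.append(e.name)
    return unused, discardable


# Hypothetical sample entries, invented purely for this sketch.
inventory = [
    InventoryEntry(
        name="mail_archive_2009.pst",
        location="departmental file server",
        origin="corporate email system",
        used_in=[],
        value_note="unknown",
        related_to=["customer correspondence"],
        created=date(2009, 12, 31),
        keep_for_days=7 * 365,
        risks="may contain personal data",
    ),
    InventoryEntry(
        name="orders_backup_2014",
        location="offsite backup store",
        origin="OLTP database backup",
        used_in=["Data Warehouse initial load"],
        value_note="historical transactions",
        related_to=["orders", "customers"],
        created=date(2014, 12, 31),
        keep_for_days=None,
    ),
]

unused, discardable = review(inventory, today=date(2020, 1, 1))
```

The point of the sketch is simply that once the questions above are written down as fields, "what can be safely discarded, and when" becomes something you can look up rather than guess at.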

What does it mean?

Simply stated, you may have dark data that has value, or it may be a simple collection of worthless digital nostalgia. But if you don’t know what you have, it may pay to find out what’s there, and if necessary, to let it go.

There is no point in hoarding unneeded and unwanted rubbish data. That is simply not good data management.

Finally, a word on all the fuss surrounding dark data.

Failure to monetise when there is value to be obtained from dark data is one thing; claiming that value can invariably be obtained, whilst not actually knowing what the data is or how it could be monetised, just adds to the mountain of data-related ‘nonsense and hype’ doing the rounds these days. Please consider not adding to that mountain.

That’s all folks

British Rail, the UK’s national rail company, used to be notorious for the number of delays and cancellations to its services, and its reasons for failing to meet its obligations became stranger and stranger.

In winter, it would snow and there would be problems. And people would ask, ‘How come you couldn’t deal with the snow this year? We’ve had snow for centuries.’ And back came the answer: ‘Yes, Sir, but this year it was the wrong type of snow.’ In autumn (the fall), it was ‘the wrong type of leaves’ and ‘the wrong type of rain’, and in summer, ‘the wrong type of sunshine’, and so on and so forth.

I hope this will not be the excuse from the Big Data and dark data pundits and punters when the much-vaunted and ‘almost’ guaranteed monetisation isn’t frequently realised.

‘Of course Big Data gives you big dollar benefits, it was just littered with the wrong type of data’ or ‘you just weren’t trying hard enough’.

Many thanks for reading.
