GOOD STRATEGY

~ DATA, INFORMATION & KNOWLEDGE

Tag Archives: Good Strategy

Big Data’s Virtuous Circus

20 Fri Mar 2015

Posted by Martyn Jones in Big Data, Consider this, data management, good start, goodstart

Tags

Big Data, data architecture, data management, good start, Good Strat, Good Strategy, goodstart, Martyn Jones, Martyn Richard Jones


Many people come up to me in the street and ask me what Big Data is all about. It has happened to me so many times in the past that I am convinced that it might just happen to you as well. I know that sort of thing; I read the Big Data tea leaves. Nothing gets past me.

The first time a complete stranger came up to me in public and said “Hello, will you tell me what this Big Data lark is all about then?” I was lost for words, you just ask my Aunt Dolly, he can vouch for that, no problem. Later that day I read a book – it was my dad’s book – and I then decided to adopt a strategy.

Therefore, in the spirit of springtime goodwill to all men and women, I have put together this blog piece in the hope that it will enlighten, help and entertain.

What is big data?

Big Data can be characterised by the 10 Vs – yes, 10, not 4. That, in my book, is more than enough to bring up to speed the average Big Data John or Jane whom one meets on the street, and who naturally wishes to be informed of such matters.

In layperson’s terms this is a series of landmarks and pointers in the analytics space used to frame and guide the didactic aspects of Big Data.

The fundamental Vs of the Big Data canon are these:

  • Vagueness
  • Volume
  • Variety
  • Virility
  • Velocity
  • Vendible
  • Vaticination
  • Voracity
  • Veracity
  • Vanity

So, let me now explain what each of these characteristics means, for those who might know and for those who might want to know.

Vagueness – This is perhaps the trickiest of questions to address, given the vast panorama that is cast before this incredibly complex yet easily graspable concept. But let me state this, and let there be no mistake about it. At this point in time, what makes Big Data vague is also what makes Big Data specific, explicit and certain. That is to say, in order to ‘come to an understanding’ of Big Data, it is necessary to completely embrace the dialectic of knowing the unknowable. So belief is an absolutely essential element – belief and data, that is.

Volume – If there ever was a time to “pump up the volume”, we have it here with Big Data.

Big, voluminous, gorgeously rotund and infinite. Big Data is called Big Data because there is a lovely, roly-poly, likeable, never-ending load of it. Its volumes can be measured in zettabytes, which, you can be assured, is a helluva lot of data.

Variety – As they might say down my way, “variety is the spice of life, innit”. This is what makes Big Data so special. So appealing.

Because before Big Data there was absolutely no variety in anything, at all. We lived in a bland world, bereft of detail, nuance and diversity. Nothing could be measured, analysed or explained, because we lacked Big Data. We were ignorant. So ignorant and stupid that we couldn’t see the sense of putting the diapers next to the beer, or of offering three for the price of two.

Fortunately, today this is no longer the case if we don’t want it to be, and thanks to Big Data we have a veritable sensorial explosion. No longer is IT just a couple of symbols scribbled in crayon on someone’s school notebook.

Virility – Move over Smart Data, the new kid on the block is Big Data.

If Big Data were described in the manner of a religious text, it would be accompanied by a never-ending narrative of begets.

So, what does that mean?

Simply stated, Big Data creates itself, in and of itself. The more Big Data you have, the more Big Data gets created. It’s like a self-fulfilling prophecy in 360-degree, high-definition, poly-faceted and all-encompassing knowing. The sort of thing that governments would pay an arm and a leg to get their mitts on.

Velocity – Velocity is of the essence. Velocity kills the competition. More velocity, less haste.

We demand that service is ‘velocious’. ‘Everything’ must be ‘now’, or it’s too late.

This means we need to be able to handle Big Data at velocity – at the speed of need.

Charles Babbage once stated (or maybe it was more than once) that “whenever the work is itself light, it becomes necessary, in order to economize time, to increase the velocity.”

But remember, we are dealing with mega-velocity here, so don’t drink and drive the Big Data Steamship, Star-ship or Mustang.

Vendible – If you can sell it, and sell it as Big Data, then it ‘is’ Big Data. If you can’t, then it’s not. The saleability of Big Data proves its existence.

So, what are the vendible aspects of Big Data?

Let’s leave that easy question for another day. But for now I can confidently state that it is used to mobilise armies of commentators, industry analysts, publicists, punters, writers, bloggers, gurus, futurologists, conference organisers, conference speakers, educators, customer relationship managers, salespeople, marketers and admen.

Vaticination – Edmund Burke is down on record as stating that “you can never plan the future by the past”. Now Burke may have been a clever person when it came to many things, but he wasn’t exactly a whiz when it came to Big Data.

There are people in the world who are in no doubt that Big Data provides the sort of visionary and predictive powers only previously obtainable through ritual sacrifice, magic potions and the casting of spells. Others are highly critical of the understatement implicit in this belief.

For many, Big Data will make the Oracle of Delphi look like a mere call centre.

This is why the power of vaticination plays a characteristically important role in the world of Big Data.

Voracity – This is based on the quasi-rationalist argument that Big Data is big and it has an omnipresent and insatiable self-fulfilling desire.

Big Data comes with an attendant requirement for hardware, even if it is a whole load of consumer hardware tacked together in a magnificent and miraculous mesh of magic.

Big Data can be characterised by voracity, but this comes hand in hand with the ‘ventripotent’ IT industry.

Veracity – The eminence of the data being captured for Big Data handling can vary significantly. The quality or lack of quality of the data naturally has the potential to impact the accuracy of analysis using that data.

Before Big Data arrived on the scene we knew nothing about Data Quality or data verification. This is why ETL and Data Cleansing tools lacked the power to effectively quality check and verify data, to ensure that any erroneous or anomalous data was rejected or flagged.

But now, with the sophistication of tools such as ‘grep’ and ‘awk’ at our disposal, we have the power in our hands to ensure nothing ‘dodgy’ gets into the analytical mix.

Vanity – In my opinion, to fully grasp the underlying and profound meaning of Big Data, it is essential for us to understand the difference between vanity and conceit. Max Counsell claimed that “Vanity is the flatterer of the soul”. Goethe characterised vanity as being “a desire for personal glory”. After an incident with an Anarchist (presumably a Big Data Anarchist), Blackadder remarked to Baldrick that “The criminal’s vanity always makes them make one tiny but fatal mistake. Theirs was to have their entire conspiracy printed and published in plain manuscript”.

That’s all folks!

So that ends the brief rundown of the defining characteristics of Big Data.

So, to summarise. That, which has passed before, necessarily divulges both the upside and downside of Big Data. By reaching out, opening up the kimono and relating the 10 Vs we are disclosing that which cannot be disclosed, exhibiting the absence of essential essence, and thereby opening up the entire field, discipline, profession, science and art to examination, questioning and ridicule.

Many thanks for reading.

7 Signals that someone has quit

14 Sat Mar 2015

Posted by Martyn Jones in Consider this, good start, Good Strat, goodstart, Martyn Richard Jones

Tags

careers, Consider this, good start, Good Strat, Good Strategy, goodstart, Martyn Jones, Martyn Richard Jones, quit


You are the boss. You are the leader, coach and manager, and there are some things that you have just got to learn, like it or not. One of these is being able to identify when someone has quit. “How dare they?” I hear you ask.

The first time I quit a job and didn’t tell anybody was when I was in the RAF working as a fighter pilot in World War 2, and I accidentally bombed Newport in South Wales, and was given a stern talking to for my troubles. Well, I didn’t actually quit and I was never in the armed forces and I was born into the era of the Beat Generation, but that’s by the by, it’s just there for effect, to create some artificial empathy between me and those who have actually quit a job and not told anyone about it. Myself, I would never do such a thing. Although to be fair, Newport has looked like it has been freshly bombed with dark green, brown and grey shades of poster paints and self-raising flour, since forever.

Consider this: Big Data Forever!

14 Sat Mar 2015

Posted by Martyn Jones in Big Data, Consider this, dark data, Martyn Jones

Tags

Big Data, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


In this country, it is good to kill an admiral from time to time to encourage the others – Voltaire

My gran used to tell me that honesty pays. Of course, she never really understood banking or IT, probably because she didn’t want to know anything about them, and she never lived to witness the amazing hype circuses, the spin doctors’ spiel or the focus-group dog-and-pony show of the 21st century. Indeed, if honesty were a guaranteed payer my gran would have amassed more wealth than even Warren Buffett himself.

If my gran lived today, she might reflect on what Big Data might be about – maybe she would even consider it benignly, as a sort of shelter for fallen men of once uncertain virtue. We will never know. So onwards and upwards.

The Harvard Business Review contemplated honesty in somewhat different terms:

“Honesty is, in fact, primarily a moral choice. Businesspeople do tell themselves that, in the long run, they will do well by doing good. But there is little factual or logical basis for this conviction. Without values, without a basic preference for right over wrong, trust based on such self-delusion would crumble in the face of temptation.”

In a marvellous book, A Few Good Men from Univac, David E. Lundstrom narrates the story of Sperry Univac in the 1960s, one of the truly great innovators in the first forty years of IT, and includes an allegory taken from the engineering front-line. I will recount it here, edited to highlight the zeitgeist, for your entertainment and, as Voltaire put it, “to encourage the others”:

In the beginning was the Big Data Plan.

And then came the Big Data Assumptions.

And the Assumptions were without form.

And the Plan was without substance.

And darkness was upon the face of the Workers.

And they spoke amongst themselves, saying: “It is a crock of shit, and it stinketh.”

And the workers went unto their Supervisors and said: “It is a pail of dung, and none may abide the odor thereof.”

And the Supervisors went unto their Managers, saying: “It is a container of excrement, and it is very strong, such that none may abide by it.”

And the Managers went unto their Directors, saying: “It is a vessel of fertilizer, and none may abide its strength.”

And the Directors spoke amongst themselves, saying to one another: “It contains that which aids plant growth, and it is very powerful.”

And the Vice Presidents went unto the President, saying unto him: “This new plan will actively promote the growth and vigor of the company, with powerful effects.”

And the President looked upon the Big Data Plan, and saw that it was good.

“But?” I hear you say, “why fight it, why not take advantage of the Big Data zeitgeist?”, “Why not cash in on the grand bonanza Big Data bandwagon?” or “Monetise the three famous Vs of Big Data?”

Well, it had crossed my mind, briefly, and (outside of the USA) we’ve all done stuff we have not entirely believed in, so the temptation to cash in is present, capisci? This paraphrasing of a piece from My Blue Heaven might give you a better idea:

One of my best friends makes his living as a completely phony Big Data Scientist. For two hundred bucks he can make you a Data Scientist or a Big Data guru. Some guys give you an education but this guy gives you immediate access to high paying jobs, sex that would make the 256 trillion Shades of Blah blush and a life in the City, the Big Apple or a small town in Germany.

Moreover, for an extra 250 bucks (limited time offer) you can also become a certified Big Data Neuro Trainer, which will allow you to do unto others what has been done unto you.

I also considered Big Data Brokerage, Big Data Certification and Big Data Independent Trading (New York – Paris – Peckham). The opportunities are immense.

However, what happens when the Big Data well runs dry, and I (and many others) get tarnished with the mark of Big Data and become pariahs by complicity, collusion or simple association?

That question I will leave for another day. But just consider the following.

All right, I admit, I am a big long-time fan of comic genius Mel Brooks, who has a knack of capturing deep insight from the human condition, especially when the human condition is off guard and shallow. In that vein, this is how I like to think the dialogue from the Dole Office scene from History of the World, Part I would have gone, if he were to write it today:

Dole Office Clerk: Occupation?

Data Magnus Comicus: Stand-up Big Data scientist.

Dole Office Clerk: What?

Data Magnus Comicus: Stand-up Big Data scientist. I coalesce the vaporous datas of the human interaction with the social-media networking, Internet of Everything, and always-connected experience into a… viable, analytical and meaningful predictive-comprehension.

Dole Office Clerk: Oh, a Big Data bullshit artist!

Data Magnus Comicus: *Grumble*…

Dole Office Clerk: Did you bullshit Big Data last week?

Data Magnus Comicus: No.

Dole Office Clerk: Did you try to bullshit Big Data last week?

Data Magnus Comicus: Yes!

Finally, I leave you with some wise words from Israeli American professor of psychology and behavioural economics, Dan Ariely:

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

Many thanks for reading.

Consider this: Big Data is not Data Warehousing

06 Fri Mar 2015

Posted by Martyn Jones in Big Data, Consider this, Data Warehousing, Good Strat, hadoop, hdfs, Martyn Jones

Tags

Big Data, enterprise data warehousing, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


Hold this thought: To paraphrase the great Bob Hoffman, just when you think that if the Big Data babblers were to generate one more ounce of bull**** the entire f****** solar system would explode, what do they do? Exceed expectations.

I am a mild mannered person, but if there is one thing that irks me, it is when I hear variations on the theme of “Data Warehousing is Big Data”, “Big data is in many ways an evolution of data warehousing” and “with Big Data you no longer need a Data Warehouse”.

Big Data is not Data Warehousing, it is not the evolution of Data Warehousing and it is not a sensible and coherent alternative to Data Warehousing. No matter what certain vendors will put in their marketing brochures or stick up their noses.

In spite of all of the high-visibility screw-ups that have carried the name of Data Warehousing, even when they were not Data Warehouse projects at all, the definition, strategy, benefits and success stories of data warehousing are known, they are in the public domain and they are tangible.

Data Warehousing is a practical, rational and coherent way of providing information needed for strategic and tactical option-formulation and decision-making.

Data Warehousing is a strategy-driven, business-oriented and technology-based business process.

We stock Data Warehouses with data that, in one way or another, comes from internal and optional external sources, and from structured and optional unstructured data. The process of getting data from a data source to the target Data Warehouse involves extraction, scrubbing, transformation and loading, or ETL for short.

Data Warehousing’s defining characteristics are:

Subject Oriented: Operational databases, such as order processing and payroll databases and ERP databases, are organized around business processes or functional areas. These databases grew out of the applications they served. Thus, the data was relative to the order processing application or the payroll application. Data on a particular subject, such as products or employees, was maintained separately (and usually inconsistently) in a number of different databases. In contrast, a data warehouse is organized around subjects. This subject orientation presents the data in a much easier-to-understand format for end users and non-IT business analysts.

Integrated: Integration of data within a warehouse is accomplished by making the data consistent in format, naming and other aspects. Operational databases, for historic reasons, often have major inconsistencies in data representation. For example, a set of operational databases may represent “male” and “female” by using codes such as “m” and “f”, by “1” and “2”, or by “b” and “g”. Often, the inconsistencies are more complex and subtle. In a Data Warehouse, on the other hand, data is always maintained in a consistent fashion.

Time Variant: Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values. Furthermore, they generally maintain this information for no more than a year (and often much less). In contrast, data warehouses contain data that is generally loaded from the operational databases daily, weekly, or monthly, which is then typically maintained for a period of 3 to 10 years. This is a major difference between the two types of environments.

Historical information is of high importance to decision makers, who often want to understand trends and relationships between data. For example, the product manager for a Liquefied Natural Gas soda drink may want to see the relationship between coupon promotions and sales. This is information that is almost impossible – and certainly in most cases not cost effective – to determine with an operational database.

Non-Volatile: Non-volatility means that after the data warehouse is loaded there are no changes, inserts, or deletes performed against the informational database. The Data Warehouse is, of course, first loaded with cleaned, integrated and transformed data that originated in the operational databases.
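
For readers who prefer code to prose, here is a minimal Python sketch of how integration, time variance and non-volatility might look in a load routine. Everything in it is an assumption made for illustration: the `GENDER_CODES` mapping, the `WarehouseRow` record and the `load` function are hypothetical names, and a real warehouse would of course sit in a database rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical source-system encodings for one and the same attribute.
GENDER_CODES = {
    "m": "male", "f": "female",
    "1": "male", "2": "female",
    "b": "male", "g": "female",
}


@dataclass(frozen=True)  # frozen: a row cannot be modified once created
class WarehouseRow:
    employee_id: int
    gender: str
    load_date: date  # records when the snapshot was taken (time variance)


def integrate(raw_code: str) -> str:
    """Map a source-specific code onto the single warehouse representation."""
    return GENDER_CODES[raw_code.strip().lower()]


def load(warehouse: list, source_rows: list, load_date: date) -> None:
    """Append a dated snapshot; rows already in the warehouse are left alone."""
    for employee_id, raw_code in source_rows:
        warehouse.append(WarehouseRow(employee_id, integrate(raw_code), load_date))


warehouse = []
load(warehouse, [(1, "m"), (2, "F")], date(2015, 3, 1))  # e.g. a payroll extract
load(warehouse, [(3, "2"), (4, "b")], date(2015, 3, 2))  # e.g. an order-system extract
```

Each load appends a dated snapshot and touches nothing that is already there, which is the point of non-volatility. The `load_date` field is what keeps the history available for later analysis.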

We build Data Warehouses iteratively, a piece or two at a time, and each iteration is primarily a result of business requirements, and not technological considerations.

Each iteration of a Data Warehouse is well bound and understood – small enough to be deliverable in a short iteration, and large enough to be significant.

Conversely, Big Data is characterised as being about:

Massive volumes: so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it, and

High variety: not only structured data, but also the whole range of digital data, and

High velocity: the speed at which data is generated, transmitted and received.

These are known as the three Vs of Big Data, and they are subject to significant and debilitating contradictions, even amongst the gurus of Big Data (as I have commented elsewhere: Contradictions of Big Data).

From time to time, Big Data pundits slam Data Warehousing for not being able to cope with the Big Data type hacking that they are apparently used to carrying out, but this is a mistake of those who fail to recognise a false Data Warehouse when they see one.

So let’s call these false flag Data Warehouse projects something else, such as Data Doghouses.

“Data Doghouse, meet Pig Data.”

Failed or failing Data Doghouses fail for the same reasons that Big Data projects will frequently fail. Both will almost invariably fail to deliver artefacts on time and to expectations; there will be failures to deliver value or even simply to return a break even in costs versus benefits; and of course, there will be failures to deliver any recognisable insight.

Failure happens in Data Doghousing (and quite possibly in Big Data as well) because there is a lack of coherent and cohesive arguments for embarking on such endeavours in the first place; a lack of real business drivers; and, a lack of sense and sensibility.

There is also a willing tendency to ignore the advice of people who warn against joining in the Big Data hubris. Why do so many ignore the ulterior motives of interested parties who are solely engaged in riding on the faddish Big Data bandwagon to maximise the revenue they can milk off punters? Why do we entertain pundits and charlatans who ‘big up’ Big Data whilst simultaneously cultivating an ignorance of data architecture, data management and business realities?

Some people say that the main difference between Big Data and Data Warehousing is that Big Data is technology, and Data Warehousing is architecture.

Now, whilst I totally respect the views of Bill Inmon, the father of Data Warehousing himself, I also think that he was being far too kind to the Big Data technology camp. However, of course, that is Bill’s choice.

Let me put it this way, if Oracle gave me the code for Oracle 3, I could add 256 bit support, parallel processing and give it an interface makeover, and it would be 1000 times better than any Big Data technology currently in the market (and that version of Oracle is from about 1983).

Therefore, Data Warehousing has no serious competing paragon. Data Warehousing is a real architecture, it has real process methodologies, it is tried and proven, it has success stories that are no secrets, and these stories include details of data, applications and the names of the companies and people involved, and we can point at tangible benefits realised. It’s clear, it’s simple and it’s transparent.

Just like Big Data, right?

Well, no.

See what I mean?

Therefore, the next time someone says to you that Big Data will replace Data Warehousing or that Data Warehousing is Big Data, or any variations on that sort of ‘stupidity’ theme, you can now tell them to take a hike, in the confidence that you are on the side of reason.

Many thanks for reading.

More perspectives on Big Data

Aligning Big Data: http://www.linkedin.com/pulse/aligning-big-data-martyn-jones

Big Data and the Analytics Data Store: http://www.linkedin.com/pulse/big-data-analytics-store-martyn-jones

A Modern Manager’s Guide to Big Data: http://www.linkedin.com/pulse/managers-guide-big-data-context-martyn-jones

Core Statistics coexisting with Data Warehousing

Accommodating Big Data

And a big thank you to Bill Inmon (the father of Data Warehousing and of DW 2.0)

Contradictions of Big Data

01 Sun Mar 2015

Posted by Martyn Jones in Ask Martyn, Big Data, Consider this

Tags

Big Data, data management, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


What we’ve been told

We’ve been told that Big Data is the greatest thing since sliced bread, and that its major characteristics are massive volumes (so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it), high variety (not only structured data, but also the whole range of digital data), and high velocity (the speed at which data is generated and transmitted). Also, from time to time, much to the chagrin of some Big Data disciples, a whole slew of new identifying Vs are produced, touted and then dismissed (check out my LinkedIn Pulse article on Big Data and the Vs).

So, beware. Things in Big Data may not be as they seem.

It’s not about big

I have been waging an uphill battle against the nonsensical and unsubstantiated idea that more data is better data, but now this view is getting some additional support, and from some surprising corners.

In a recent blog piece on IBM’s Big Data and Analytics Hub (Big data: Think Smarter, not bigger), Bernard Marr wrote that “the truth is, it isn’t how big your data is, it’s what you do with it that matters!”

Elsewhere, SAS echoed similar sentiments on their web site: “The real issue is not that you are acquiring large amounts of data. It’s what you do with the data that counts.”

Can we call that ‘strike one’ for Big Data Vs?

It’s not about variety

It is claimed that 20% of digital data is structured, a claim based on the problematic suggestion that structured data is uniquely relational. It is also claimed that unstructured data includes CSV files and XML data, and that this makes up far more than the 20% of the data generated. But this definition is simply wrong.

If anything, CSV data is structured, and XML data is highly structured, and it’s typically regular ASCII data. So it does not add variety, even though it is not structured in the ways that some people might expect, especially if that someone lacks the required knowledge and experience. Simply stated, CSV data is structured, it’s just that it lacks rich metadata, but that doesn’t make it unstructured.
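
A few lines of Python make the point. This is only a sketch, and the sample data is made up, but it shows that the standard `csv` module can recover named fields from CSV text without any outside help. What is missing is the metadata that would declare `quantity` to be a number.

```python
import csv
import io

# Made-up sample, standing in for a file exported by an operational system.
raw = "order_id,product,quantity\n1001,soda,12\n1002,beer,6\n"

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# The header row gives every record the same named fields ...
fields = reader.fieldnames

# ... but no types are declared, so each value arrives as a string
# and has to be converted by whoever knows what the column means.
total_quantity = sum(int(row["quantity"]) for row in rows)
```

The structure is there in the file; the conversion with `int` is where the absent metadata has to be supplied by hand.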

“But”, I hear you say “what about all the non-textual data such as multi-media, and what about the masses of unstructured textual data?”

Take it from me, most businesses will not be basing their business strategies on the analysis of a glut of selfies, home videos of cute kittens, or the complete works of William Shakespeare or Dan Brown. Almost all business analysis will continue to be carried out on structured data obtained primarily from internal operational systems and external structured data providers.

Strike two! Third time lucky?

It’s not even about velocity

So, if we accept that Big Data isn’t really about the data volumes or data variety, that leaves us with velocity, right? Well no, because if it isn’t about record-breaking VLDBs or significant data variety, then for most commercial businesses the management of data velocity becomes either less of an issue or no issue at all. The fact that some software vendors and IT service suppliers set up this ‘straw man’ argument and then knock it down with the ‘amazing powers’ of their products and services is quite another matter.

Strike three, and counting.

It’s not about the manageability of Big Data

We have been told time and time again that the major difference between a data scientist and a professional statistician is that the ‘scientists’ know how to cope very well with massive volumes, varieties and velocities of data. Now it turns out that this is also questionable.

According to Bob Violino writing in Information Management (Messy Big Data Overwhelms Data Scientists – 20 February 2015) “Data scientists see messy, disorganized data as a major hurdle preventing them from doing what they find most interesting in their jobs”. So, when it comes to data quality and structure the ‘scientists’ don’t really have an advantage over professional statisticians.

Last year Thomas C. Redman, writing in the Harvard Business Review (Data’s Credibility Problem), noted that when Big Data is unreliable “managers quickly lose faith” and “fall back on their intuition to make decisions, steer their companies, and implement strategy”, and when this happens there is a propensity to reject potentially “important, counterintuitive implications that emerge from big data analyses.”

Strike four?

The new analytics aren’t new

Data science and Big Data analytics are the new kids on the block, aren’t they?

Well, here are some real life scenarios.

A major banking equipment supplier: A lot of banking equipment is hybrid analogue-digital; a simple example of this would be a photocopier or a physical document processing device. One major supplier decided to capture the sensor data produced by their devices in order to predict failures and problems. Predictive preventive maintenance rules are created and corroborated using the data generated by sensors on each customer device, and these rules then get incorporated into the devices’ logic.

A major IT vendor: What happens when you create an intersection and convergence between technologies, techniques and methods from areas of mainstream IT, data architecture and management, statistics (quantitative and qualitative analytics) and data visualisation, artificial intelligence/machine learning and knowledge management? This is precisely what one of the main European IT vendors did, and the idea proved to be quite attractive to customers, prospects and investors.

A major integrated circuit supplier: The testing of ICs at the ‘fabs’ (manufacturing plants) generates a serious amount of data. This data is used to detect errors in the IC manufacturing process. It is captured and analysed in as near real time as possible, which is necessary because of the cost of continuing to produce faulty ICs. To get around this problem the company uses a combination of fast data capture, transformation and loading of data into a data analytics area to ensure early and precise problem detection.

All Big Data Analytics success stories?

The first happened in 1989, the second in 1993 and the third in 2001. Yes, Big Data and Big Data analytics are sort of newish.

Strike five.

The science is frequently not very scientific

What is science?

According to Vasant Dhar of the Stern School of Business (Data Science and Prediction), Jeff Leek (The key word in “Data Science” is not Data, it is Science), and repeated on Wikipedia, “In general terms, data science is the extraction of knowledge from data”. Well, excuse me if I beg to differ. I have seen data scientists at work, and the word science doesn’t actually jump out and grab you. It’s difficult to make the connection, just as it is to accurately connect some popular science magazines with fundamental scientific research.

If a professional and qualified statistician wants to label themselves a data scientist then I have no issue with that, it’s their problem, but I am not willing to lend credibility to the term ‘data scientist’ when it is merely an interesting job title, with at most a tenuous connection to the actual role, and one that is liberally applied, with the almost customary largesse of IT, to creative code hackers and business-averse dabblers in data.

As Hazelcast VP Miko Matsumura suggested in Data Science is Dead, “… put “Data Scientist” on your resume. It may get you additional calls from recruiters, and maybe even a spiffy new job, where you’ll be the King or Queen of a rotting whale-carcass of data” and “Don’t be the data scientist tasked with the crime-scene cleanup of most companies’ “Big Data”—be the developer, programmer, or entrepreneur who can think, code, and create the future.”

Strike six.

And the value is questionable

DATA: “Data is a super-class of a modern representation of an arcane symbology.” – Anon

If I had a dollar for every time I heard someone claim that data has intrinsic positive value then I would be as wealthy as Warren Buffett.

If I have said it once, I have said it a hundred times. In order for data to be more than an operational necessity it requires context.

Providing valid data with valid context turns that data into information.

Data can be relevant and data can be irrelevant. That relevance or irrelevance of data may be permanent or temporary, continuous or episodic, qualitative or quantitative.

Some data is meaningless, and there are cases whereby nobody can remember why it was collected or what purpose it serves.

Taking all this into account we can ask the deadly pragmatic question: what value does this data have? Which is sometimes answered with a pertinent ‘no value whatsoever’.

Strike seven.

So what is it really about?

It is said that Big Data is changing the world, but for all intents and purposes, and shamed by previous Big Data excesses, some people are rapidly changing the definitions and parameters of Big Data to position it as more tangible and down-to-earth, whilst moving it away from its position as an overhyped and dead-ended liability.

Big Data is a dopey term, applied necessarily ambiguously to a surfeit of tenuously connected vagaries, and its time has come and gone. So, let’s drop the Big Data moniker, and embrace the fact that data is data, and long live ‘All Data’, yes, all digital data. Let’s consider all data for what it’s worth to the business, and not for what some chatterers reckon its value is, having as they do little or no insight into the businesses to which they refer, or into the data that these businesses possess.

So, when push comes to shove, is Big Data really about high volumes, high velocity and high variety, or is it in fact about much noise, too much pomposity and abundant similarity leading to unnecessary high anxiety?

Thanks very much for reading.

Big Data in Question – Again

01 Sun Mar 2015

Posted by Martyn Jones in All Data, Big Data, Consider this

Tags

All Data, Big Data, data management, Good Strat, good strat blog, Good Strategy, Martyn Jones, Martyn Richard Jones


Big Data is now an inhospitable and unhealthy land inhabited by those who, through accident or design, deceive naïve and sentimental bystanders and those who are willingly misled.

When all of this Big Data malarkey started it was sort of funny, humorous and occasionally witty, especially in the affected, bizarre and frequently uninhibited ways that freshly-minted self-appointed gurus and experts would “big it up”.

Doctor Freud would have had a field day with all of that, being as it was, and for that matter still is, a postmodern mishmash of Riefenstahl, Freddie Mercury and Monty Python on steroids. However, after that extended, operatic and high-camp hiatus it all went downhill.

The Big Data scene is fast becoming an outrageous and brash festival of deception, disinformation and obliviousness. Which is a pity, because it does the industry no good whatsoever.

It is telling that Big Data evangelists, gurus and assorted sycophants cannot even define Big Data adequately, never mind discuss (or for that matter, point at) tangible success stories, without falling into contradictions on all of the key defining characteristics of volume, variety and velocity, and resorting to crude debating devices to avoid or finesse the concerns and the questions.

Almost every morning I check out the industry news, and almost invariably, it comes with new mind-boggling examples of Big Data nonsense.

However, it isn’t always nonsense for nonsense’s sake, there are agendas, there are rational explanations why Big Data has become at the same time, one of the most hyped up fads in the history of IT, and one that its supporters find so difficult to actually explain and justify, in any reasonable sort of way.

Therefore, when it comes to Big Data, beyond the surfeit of platitudes, clichés, bluff and bluster, the only things in play are the interests of industry, the patrons, the courtesans and their entourage of the innocent and the beguiled.

One of the biggest deceptions in Big Data is in the misleadingly named ‘success stories’. The thing is that most of these success stories that I have ever read have been:

  • So vague that it’s difficult to know how success is being defined never mind reached.
  • So secretive and obtuse is the avoidance of naming names, locations and other relevant Big Data references that it’s impossible to corroborate if these claims are actually true or not. Disclaimer: I have worked for some of the biggest IT vendors, and in senior roles, and I know what is behind comments such as “the Big Data project is a success, although the client name and project are confidential” and “it’s delivering such major competitive advantages that we are obliged to keep it under wraps”.
  • Stories stolen from elsewhere, such as from Data Warehousing, Business Intelligence, VLDB or Business Application projects.
  • Borderline fantasies and badly contrived technology fan fiction.

However, it doesn’t stop there.

One of the clearest examples of the questionable nature of Big Data evangelism is when it is used to piggyback Big Data hype on simple, tangible and immediately recognisable artefacts or applications that have little in common with Big Data.

This is an extreme illustration, but it works like this: “iPhones are commercially successful, iPhones are part of Big Data, and therefore Big Data is commercially successful.”

As if the mere conjuring up of association, affinity and proximity will convince people of the great and growing value of Big Data.

What I am also referring to are publicity pieces that may as well have been titled:

  • Smith, Galbraith, Mies, Keynes, Homer Simpson and the economic justification of Big Data
  • Lovelace, Babbage, von Neumann, Eckert, Davies, Codd, Knuth, Naur and the technological underpinnings of Big Data
  • Einstein, Freud, Edison, Faraday, Recorde and the intellectual structure of Big Data
  • Socrates, Kant, Hegel, Marx, Adorno and the philosophical correctness of Big Data
  • Great quotes about Big Data, from the Cambrian era to the postmodern époque
  • Great jokes about Big Data, from Mel Brooks to Steve Martin
  • Sportspeople and Big Data, from Lottie Dod and Babe Ruth to Rafa Nadal and CR7
  • Industry support of Big Data, from Henry Ford to Neutron Jack

Do you recognise similarities?

It’s no big deal, just the use of unreliable, misleading and inappropriate fallacies, dressed up as cute, plausible and accessible collateral. People may think that such things are clever and witty, but they aren’t; they are just misleading.

Let’s continue with something simple.

Evasion is, in ethics, an act that deceives by stating a true statement that is immaterial or leads to a false deduction. For example, citing events, persons or anecdotes from the history of IT to justify the supposed or imaginary value of Big Data. This is close to the notion of a non sequitur, which of course is an argument, the conclusions from which do not follow from its premise. It falls short of being full-on sophistry, purely because the simplistic, puerile and superficial arguments put forward in favour of Big Data do not match those of the true sophist who seeks to reason with clever but fallacious and deceptive arguments. Too many of the Big Data arguments are fallacious and deceptive, but no one, equipped with a reasonable capacity for critical thinking, should take such ‘arguments’ as valid.

Hold this thought: Big Data hype is a viper’s nest of logical fallacies, white lies and disinformation.

Just when I think things could not get any weirder, they do, and the Big Data ceiling of hyperbole rises even higher, up to the rarer atmosphere of extreme tendentiousness.

There is a growing mass of Big Data hoop-la, hyperbole and flim flam that exceeds all previous bounds of overstatement, solecism and confabulation. This is where the real volumes, varieties and velocities are in Big Data; in hokie.

We live, as Oscar Wilde said in his day, in an age of surfaces. Yes, superficiality, puerility and short-termism are the competing orders of the day. However, I am still amazed, and maybe wrongly so, by what ostensibly professional, experienced and knowledgeable people are willing, able and prepared to accept, especially when it comes to Big Data flim flam sauce.

Here are some examples of the nonsense about Big Data that is taken as gospel by ‘adults’:

Data Warehousing is part of Big Data: No comment.

Big Data will replace Enterprise Data Warehousing: People can’t even explain the features and benefits of Big Data. I try to make it as easy as possible, ‘if you can’t say it, point to it’. But, seriously, people can’t even relate tangible and credible Big Data success stories, never mind show how it will replace Enterprise Data Warehousing, whether that’s the Inmon or Kimball flavour, take your pick.

Everyone and every organisation can benefit from Big Data: If people can’t explain this, and they don’t in terms of tangible benefits, then the claim should remain questionable.

Data Scientists will replace Statisticians: Why is that so? It is claimed that Data Scientists are uniquely equipped to handle massive volumes, varieties and velocities of data – well, as it turns out, this isn’t certain either.

Big Data is in its infancy: I think we may be confusing infancy with lack of real traction, and of time and place utility.

You cannot be serious: Just what are people talking about here? I have read vague, naïve and ill-informed pieces about data management, data architecture, data warehousing, reporting, business intelligence and a plethora of etcetera that have been passed off as observations and commentary on Big Data. So, what makes people recycle hackneyed, misleading and badly conceptualised ‘content’?

In the commentary on one of Bernard Marr’s pieces on LinkedIn (a professional networking site) I observed that no one can adequately explain what Big Data is without falling into contradictions and fancies, and no one seems to be capable or willing to provide tangible success stories.

Bernard responded to this comment by pointing out “the reason for that is that Big Data means different things to different people.”

Fair enough. It’s an explanation.

That said, I have always had more than a tenuous dislike of postmodern thinking, in fact most things ‘postmodern’. Call me old fashioned, jaded or cynical, but to me, the idea that everything can mean anything is an aberration that I prefer to leave to others.

I am at a loss to explain why so many reasonable people are willing to embrace the hype surrounding Big Data and Big Data Analytics, including the attendant surfeit of nonsense, incongruences and contradictions, and from my perspective, it defies reason and good sense.

Therefore, I will just end again with a fabulous quote from Ben Goldacre:

“You cannot reason people out of a position that they did not reason themselves into”.

Many thanks for reading.

Contradictions of Big Data – Short

01 Sun Mar 2015

Posted by Martyn Jones in Big Data, Consider this, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones

Tags

Big Data, data management, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


Please note: This is an edited version of a previous piece with a similar name, but focusing solely on the three main Vs of Big Data.

What we’ve been told

We’ve been told that business Big Data is the greatest thing since sliced bread, and that its major characteristics are:

  • massive volumes – so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it, and
  • high variety – not only structured data, but also the whole range of digital data, and
  • high velocity – the speed at which data is generated, transmitted and received

Which is a simple and straightforward means of classification. Big Data is about massive volumes, high variety and high velocity. Right?

It’s not about big

I have never bought into the idea that more data is necessarily better data, or that it provides better focus or leads to increased insight, in fact I have been quite vocal with my contrarian opinion, but now this view is getting some additional support, and from some surprising corners.

In a recent blog piece on IBM’s Big Data and Analytics Hub (Big data: Think Smarter, not bigger), Bernard Marr wrote that “the truth is, it isn’t how big your data is, it’s what you do with it that matters!”

Over at Fierce Big Data it was Pam Baker who stated that “the term big data is unfortunate because it’s really not about the size of the data”. (Big data is not about petabytes, but complex computing).

Elsewhere, SAS echoed similar sentiments on their web site: “The real issue is not that you are acquiring large amounts of data. It’s what you do with the data that counts.”

Well, apparently Big Data isn’t about “massive volumes” of data.

Strike 1!

It’s not about variety

It is claimed that 20% of digital data is structured, a claim based on the problematic suggestion that structured data is uniquely relational.

It is also said that unstructured data includes CSV files and XML data, and this makes up far more than the 20% of the data generated. But this definition is wrong.

If anything, CSV data is structured, and XML data is highly structured, and it’s typically regular ASCII data. So it does not add variety, even though it is not structured in the ways that someone might expect, especially if that someone lacks the required knowledge and experience. Simply stated, CSV data is structured; it’s just that it lacks rich metadata, but that doesn’t make it unstructured.
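
To make the point about CSV concrete, here is a minimal sketch in Python using only the standard library. The sample rows, the column names and the function name are all invented for illustration; they do not come from any real data set. The header row gives every value a named position, which is structure. What the file does not say is that order_total is a number, so the reader has to supply that missing piece of metadata.

```python
import csv
import io

# Hypothetical sample data: columns and values are made up for illustration.
SAMPLE_CSV = """customer_id,region,order_total
1001,North,250.00
1002,South,99.50
1003,North,410.25
"""


def total_by_region(csv_text):
    """Sum order_total per region, using the header row as the only schema."""
    totals = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Every field arrives as a string; the numeric type is our assumption,
        # because plain CSV carries no type metadata.
        amount = float(row["order_total"])
        totals[row["region"]] = totals.get(row["region"], 0.0) + amount
    return totals


if __name__ == "__main__":
    print(total_by_region(SAMPLE_CSV))
```

The rows can be grouped and summed by column name without any guesswork about layout, which is hard to square with calling the data unstructured.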

“But”, I hear you say “what about all the non-textual data such as multi-media, and what about the masses of unstructured textual data?”

Take it from me, most businesses will not be basing their business strategies on the analysis of a glut of selfies, juvenile twittering, home videos of cute kittens, or the complete works of William Shakespeare. Almost all business analysis (whether done by a professional statistician or a data scientist) will continue to be carried out using structured data obtained primarily from internal operational systems and external structured data providers.

Variety, Sir? No problem.

Strike two!

It’s not even about velocity

So, if we accept that Big Data isn’t really about the massive data volumes or high data variety then that leaves us with velocity. Because if it isn’t about record breaking VLDB or significant data variety, then for most commercial businesses the management of data velocity becomes either less of an issue or just is no issue.

Even in some extreme circumstances, one can explore the suggestion that data sampling can remove issues with data volume as well as velocity.
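
As an illustration only, reservoir sampling is one standard technique for keeping a fixed-size random sample of a stream without storing the whole stream. The sketch below is in Python and assumes a hypothetical stream of readings; the function name, the parameters and the numbers are invented for the example and are not taken from any particular product.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1),
            # overwriting a randomly chosen slot.
            j = rng.randint(0, index)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical fast-arriving readings, generated lazily rather than stored.
    readings = (n * 0.5 for n in range(1_000_000))
    sample = reservoir_sample(readings, k=1000, rng=random.Random(42))
    print(len(sample))
```

However fast or voluminous the stream, memory use stays at k items, so the analysis downstream works on a sample of a fixed and manageable size.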

However, the fact that some software vendors and IT service suppliers set up this ‘straw man’ velocity argument and then knock it down with the ‘amazing powers’ of their products and services, is quite another matter.

So, is it really about velocity?

Strike three!

So what is it really about?

Big Data is a dopey term, applied necessarily ambiguously to a surfeit of tenuously connected vagaries, and its time has come and gone. Let’s dump the Big Data moniker, and the 3 Vs along with it, and embrace the fact that data is data, and there will always be more of it.

So, let’s consider ‘all data’ and principally for its time and place utility.

If there is something that you are not sure about or have questions about then please leave a comment below or email me.

Thanks very much for reading.

Consider this: Big Data and the Pot of Tea

17 Tue Feb 2015

Posted by Martyn Jones in Big Data, Consider this, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones, Strategy

Tags

Analytics, Big Data, data management, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


To begin at the beginning

Hold this thought: Big Data is King.

Is there just nothing that Big Data isn’t capable of fixing? From terrorism, world hunger, Ebola, HIV, fraud, money laundering and hiring the ‘right’ people through to winning the lottery, curing hangovers, arranging entrapment and finding the love of your life. Big Data is King. Continue reading →

The amazing world of Fred’s Big Data

15 Sun Feb 2015

Posted by Martyn Jones in Big Data, Consider this

Tags

Big Data, data management, Good Strat, Good Strategy, information management, knowledge management, Martyn Jones, Martyn Richard Jones


Hold this thought: There are real golden nuggets of data that many organisations are oblivious to. But first let’s look at business process management. Continue reading →

Big Data Will Save the World

12 Thu Feb 2015

Posted by Martyn Jones in Big Data, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones

Tags

Big Data, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


Good morning fellow consumers; here’s a pop quiz question: What does Big Data have in common with Robitussin? Think about it, take your time.

Okay, time’s up!

Robitussin is a legal pharmaceutical product commonly associated with coughs, colds and flu combinations. Continue reading →

Hours & Info

Spain
+34 692 376 698
martyn.jones@martyn.es
Lunch: 13:30pm - 14:30pm
Dinner: M-Th 20:00pm - 21:00pm, Fri-Sat:21:00pm - 22:00pm
