
GOOD STRATEGY

~ for every significant challenge


Tag Archives: Good Strat

Big Data’s Virtuous Circus

20 Friday Mar 2015

Posted by Martyn Jones in Big Data, Consider this, data management, good start, goodstart


Tags

Big Data, data architecture, data management, good start, Good Strat, Good Strategy, goodstart, Martyn Jones, Martyn Richard Jones


Many people come up to me in the street and ask me what Big Data is all about. It has happened to me so many times in the past that I am convinced that it might just happen to you as well. I know that sort of thing, I read the Big Data tea leaves. Nothing gets past me.

The first time a complete stranger came up to me in public and said “Hello, will you tell me what this Big Data lark is all about then?” I was lost for words, you just ask my Aunt Dolly, he can vouch for that, no problem. Later that day I read a book – it was my dad’s book – and I then decided to adopt a strategy.

Therefore, in the spirit of springtime goodwill to all men and women, I have put together this blog piece in the hope that it will enlighten, help and entertain.

What is big data?

Big Data can be characterised by the 10 Vs – yes, 10, not 4. Which, in my book, is more than enough to bring up-to-speed the average Big Data John or Jane that one meets on the street, and who naturally wish to be informed of such matters.

In layperson’s terms this is a series of landmarks and pointers in the analytics space used to frame and guide the didactic aspects of Big Data.

The fundamental Vs of the Big Data canon are these:

  • Vagueness
  • Volume
  • Variety
  • Virility
  • Velocity
  • Vendible
  • Vaticination
  • Voracity
  • Veracity
  • Vanity

So, let me now explain what each of these characteristics means, for those who might know and for those who might want to know.

Vagueness – This is perhaps the trickiest of questions to address, given the vast panorama that is cast before this incredibly complex yet easily graspable concept. But let me state this, and let there be no mistake about it. At this point in time, what makes Big Data vague is also what makes Big Data specific, explicit and certain. That is to say, in order to ‘come to an understanding’ of Big Data, it is necessary to completely embrace the dialectic of knowing the unknowable. So belief is an absolute essential element – belief and data, that is.

Volume – If there ever was a time to “pump up the volume”, we have it here with Big Data.

Big, voluminous, gorgeously rotund and infinite. Big Data is called Big Data because there is a lovely, roly-poly, likeable never-ending load of it. Its volumes can be measured in zettabytes, which, you can be assured, is a helluva lot of data.

Variety – As they might say down my way, “variety is the spice of life, innit”. This is what makes Big Data so special. So appealing.

Because before Big Data there was absolutely no variety in anything, at all. We lived in a bland world, bereft of detail, nuance and diversity. Nothing could be measured, analysed or explained, because we lacked Big Data. We were ignorant. So ignorant and stupid that we couldn’t see the sense of putting the diapers next to the beer, or of offering three for the price of two.

Fortunately, today this is no longer the case if we don’t want it to be, and thanks to Big Data we have a veritable sensorial explosion. No longer is IT just a couple of symbols scribbled in crayon on someone’s school notebook.

Virility – Move over Smart Data, the new kid on the block is Big Data.

If Big Data were described in the manner of a religious text, it would be accompanied by a never ending narrative of begets.

So, what does that mean?

Simply stated, Big Data creates itself, in and of itself. The more Big Data you have, the more Big Data gets created. It’s like a self-fulfilling prophecy in 360 degree, high-definition, poly-faceted and all-encompassing knowing. The sort of thing that governments would pay an arm and a leg to get their mitts on.

Velocity – Velocity is of the essence. Velocity kills the competition. More velocity, less haste.

We demand that service is ‘velocious’. ‘Everything’ must be ‘now’, or it’s too late.

This means we need to be able to handle Big Data at velocity – at the speed of need.

Charles Babbage once stated (or maybe it was more than once) that “whenever the work is itself light, it becomes necessary, in order to economize time, to increase the velocity.”

But remember, we are dealing with mega-velocity here, so don’t drink and drive the Big Data Steamship, Star-ship or Mustang.

Vendible – If you can sell it, and sell it as Big Data, then it ‘is’ Big Data. If you can’t, then it’s not. The saleability of Big Data proves its existence.

So, what are the vendible aspects of Big Data?

Let’s leave that easy question for another day. But for now I can confidently state that it is used to mobilise armies of commentators, industry analysts, publicists, punters, writers, bloggers, gurus, futurologists, conference organisers, conference speakers, educators, customer relationship managers, salespeople, marketers and admen.

Vaticination – Edmund Burke is down on record as stating that “you can never plan the future by the past”. Now Burke may have been a clever person when it came to many things, but he wasn’t exactly a whiz when it came to Big Data.

There are people in the world who are in no doubt that Big Data provides the sort of visionary and predictive powers only previously obtainable through ritual sacrifice, magic potions and the casting of spells. Others are highly critical of the understatement implicit in this belief.

For many, Big Data will make the Oracle of Delphi look like a mere call centre.

This is why the power of vaticination plays a characteristically important role in the world of Big Data.

Voracity – This is based on the quasi-rationalist argument that Big Data is big and it has an omnipresent and insatiable self-fulfilling desire.

Big Data comes with an attendant requirement for hardware, even if it is a whole load of consumer hardware tacked together in a magnificent and miraculous mesh of magic.

Big Data can be characterised by voracity, but this comes hand in hand with the ‘ventripotent’ IT industry.

Veracity – The eminence of the data being captured for Big Data handling can vary significantly. The quality or lack of quality of the data naturally has the potential to impact the accuracy of analysis using that data.

Before Big Data arrived on the scene we knew nothing about Data Quality or data verification. This is why ETL and Data Cleansing tools lacked the power to effectively quality check and verify data, to ensure that any erroneous or anomalous data was rejected or flagged.

But now, with the sophistication of tools such as ‘grep’ and ‘awk’ at our disposal, we have the power in our hands to ensure nothing ‘dodgy’ gets into the analytical mix.

Vanity – In my opinion, to fully grasp the underlying and profound meaning of Big Data, it is essential for us to understand the difference between vanity and conceit. Max Counsell claimed that “Vanity is the flatterer of the soul”. Goethe characterised vanity as being “a desire for personal glory”. After an incident with an Anarchist (presumably a Big Data Anarchist), Blackadder remarked to Baldrick that “The criminal’s vanity always makes them make one tiny but fatal mistake. Theirs was to have their entire conspiracy printed and published in plain manuscript”.

That’s all folks!

So that ends the brief rundown of the defining characteristics of Big Data.

So, to summarise. That, which has passed before, necessarily divulges both the upside and downside of Big Data. By reaching out, opening up the kimono and relating the 10 Vs we are disclosing that which cannot be disclosed, exhibiting the absence of essential essence, and thereby opening up the entire field, discipline, profession, science and art to examination, questioning and ridicule.

Many thanks for reading.

7 Signals that someone has quit

14 Saturday Mar 2015

Posted by Martyn Jones in Consider this, good start, Good Strat, goodstart, Martyn Richard Jones


Tags

careers, Consider this, good start, Good Strat, Good Strategy, goodstart, Martyn Jones, Martyn Richard Jones, quit


You are the boss. You are the leader, coach and manager, and there are some things that you just got to learn, like it or not. One of these skills is to be able to identify when someone has quit. “How dare they?” I hear you ask.

The first time I quit a job and didn’t tell anybody was when I was in the RAF working as a fighter pilot in World War 2, and I accidentally bombed Newport in South Wales, and was given a stern talking to for my troubles. Well, I didn’t actually quit and I was never in the armed forces and I was born into the era of the Beat Generation, but that’s by the by, it’s just there for effect, to create some artificial empathy between me and those who have actually quit a job and not told anyone about it. Myself, I would never do such a thing. Although to be fair, Newport has looked like it has been freshly bombed with dark green, brown and grey shades of poster paints and self-raising flour, since forever.

Consider this: Big Data Forever!

14 Saturday Mar 2015

Posted by Martyn Jones in Big Data, Consider this, dark data, Martyn Jones


Tags

Big Data, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


Dans ce pays-ci, il est bon de tuer de temps en temps un amiral pour encourager les autres (In this country, it is good to kill an admiral from time to time, to encourage the others) – Voltaire

My gran used to tell me that honesty pays. Of course, she never really understood banking or IT, probably because she didn’t want to know anything about them, and she never lived to witness the amazing hype circuses, the spin doctors’ spiel or the focus-group dog-and-pony show of the 21st century. Indeed, if honesty were a guaranteed payer my gran would have amassed more wealth than even Warren Buffett himself.

If my gran lived today, she might reflect on what Big Data might be about – maybe she would even consider it benignly, as a sort of shelter for fallen men of once uncertain virtue. We will never know. So onwards and upwards.

The Harvard Business Review contemplated honesty in somewhat different terms:

“Honesty is, in fact, primarily a moral choice. Businesspeople do tell themselves that, in the long run, they will do well by doing good. But there is little factual or logical basis for this conviction. Without values, without a basic preference for right over wrong, trust based on such self-delusion would crumble in the face of temptation.”

In a marvellous book, A Few Good Men from Univac, David E. Lundstrom narrates the story of Sperry Univac in the 1960s, one of the truly great innovators in the first forty years of IT, and includes an allegory taken from the engineering front-line. I will recount it here, edited to highlight the zeitgeist, for your entertainment and, as Voltaire put it, “to encourage the others”:

In the beginning was the Big Data Plan.

And then came the Big Data Assumptions.

And the Assumptions were without form.

And the Plan was without substance.

And darkness was upon the face of the Workers.

And they spoke amongst themselves, saying: “It is a crock of shit, and it stinketh.”

And the workers went unto their Supervisors and said: “It is a pail of dung, and none may abide the odor thereof.”

And the Supervisors went unto their Managers, saying: “It is a container of excrement, and it is very strong, such that none may abide by it.”

And the Managers went unto their Directors, saying: “It is a vessel of fertilizer, and none may abide its strength.”

And the Directors spoke amongst themselves, saying to one another: “It contains that which aids plant growth, and it is very powerful.”

And the Vice Presidents went unto the President, saying unto him: “This new plan will actively promote the growth and vigor of the company, with powerful effects.”

And the President looked upon the Big Data Plan, and saw that it was good.

“But?” I hear you say, “why fight it, why not take advantage of the Big Data zeitgeist?”, “Why not cash in on the grand bonanza Big Data bandwagon?” or “Monetise the three famous Vs of Big Data?”

Well, it had crossed my mind, briefly, and (outside of the USA) we’ve all done stuff we have not entirely believed in, so the temptation to cash in is present, capisci? This paraphrasing of a piece from My Blue Heaven might give you a better idea:

One of my best friends makes his living as a completely phony Big Data Scientist. For two hundred bucks he can make you a Data Scientist or a Big Data guru. Some guys give you an education but this guy gives you immediate access to high paying jobs, sex that would make the 256 trillion Shades of Blah blush and a life in the City, the Big Apple or a small town in Germany.

Moreover, for an extra 250 bucks (limited time offer) you can also become a certified Big Data Neuro Trainer, which will allow you to do unto others what has been done unto you.

I also considered Big Data Brokerage, Big Data Certification and Big Data Independent Trading (New York – Paris – Peckham). The opportunities are immense.

However, what happens when the Big Data well runs dry, and I (and many others) get tarnished with the mark of Big Data and become pariahs by complicity, collusion or simple association?

That question I will leave for another day. But just consider the following.

All right, I admit, I am a big long-time fan of comic genius Mel Brooks, who has a knack of capturing deep insight from the human condition, especially when the human condition is off guard and shallow. In that vein, this is how I like to think the dialogue from the Dole Office scene from History of the World, Part I would have gone, if he were to write it today:

Dole Office Clerk: Occupation?

Data Magnus Comicus: Stand-up Big Data scientist.

Dole Office Clerk: What?

Data Magnus Comicus: Stand-up Big Data scientist. I coalesce the vaporous datas of the human interaction with the social-media networking, Internet of Everything, and always-connected experience into a… viable, analytical and meaningful predictive-comprehension.

Dole Office Clerk: Oh, a Big Data bullshit artist!

Data Magnus Comicus: *Grumble*…

Dole Office Clerk: Did you bullshit Big Data last week?

Data Magnus Comicus: No.

Dole Office Clerk: Did you try to bullshit Big Data last week?

Data Magnus Comicus: Yes!

Finally, I leave you with some wise words from Israeli American professor of psychology and behavioural economics, Dan Ariely:

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

Many thanks for reading.

What’s all the fuss about Dark Data? Big Data’s New Best Friend

10 Tuesday Mar 2015

Posted by Martyn Jones in All Data, Big Data, Consider this, dark data, Good Strat


Tags

All Data, Big Data, dark data, data architecture, data management, Good Strat, Martyn Jones, Martyn Richard Jones


What is Dark Data?

Dark data, what is it and why all the fuss?

First, I’ll give you the short answer. The right dark data, just like its brother right Big Data, can be monetised – honest, guv! There’s loadsa money to be made from dark data by ‘them that want to’, and as value propositions go, seriously, what could be more attractive?

Let’s take a look at the market.

Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes” (IT Glossary – Gartner)

Techopedia describes dark data as being data that is “found in log files and data archives stored within large enterprise class data storage locations. It includes all data objects and types that have yet to be analyzed for any business or competitive intelligence or aid in business decision making.” (Techopedia – Cory Janssen)

Cory also wrote that “IDC, a research firm, stated that up to 90 percent of big data is dark data.”

In an interesting whitepaper from C2C Systems it was noted that “PST files and ZIP files account for nearly 90% of dark data by IDC Estimates.” and that dark data is “Very simply, all those bits and pieces of data floating around in your environment that aren’t fully accounted for:” (Dark Data, Dark Email – C2C Systems)

Elsewhere, Charles Fiori defined dark data as “data whose existence is either unknown to a firm, known but inaccessible, too costly to access or inaccessible because of compliance concerns.” (Shedding Light on Dark Data – Michael Shashoua)

Not quite the last insight, but in a piece published by Datameer, Joe Nicholson wrote that “Research firm IDC estimates that 90 percent of digital data is dark” and went on to state that “This dark data may come in the form of machine or sensor logs” (Shine Light on Dark Data – Joe Nicholson via Datameer)

Finally, Luc Burgelman of NGDATA wrote this in a sponsored piece in Wired: “It” – dark data – “is different for each organization, but it is essentially data that is not being used to get a 360 degree view of a customer.”

Say what?

Okay, let’s see if we can be a bit more specific about the content of dark data.

Items on the dark data ticket include: email; instant messages; documents; SharePoint content; content of collaboration databases; ZIP files; log files; archived sensor and signal data; archived web content; aged audit trails; operational database backups – full and incremental; roll-back, redo and spooled data files; sunsetted applications (code and documentation); partially developed and then abandoned applications; and code snippets.

Most importantly, dark data is data that is not actively in use, is underutilised, or is something else. Seriously.

What can you do with it?

So, the conclusion that some have come to is this: there is a vast collection of data in various formats waiting to be monetised.

Personally, the idea that really grabs my attention is the potential ability to do novel forensic research on email. If only to find out what happened in the past.

For example, maybe it would be fascinating to see how significant challenges were identified, flagged and discussed; how strategic responses to those challenges were formulated, chosen and executed; and, how the outcomes of all of that process were reflected in email communications.

I think that this line of work can be very interesting for some people, and that interesting insights may be uncovered, but I would hate to have to put a tangible value on it, if only to avoid adding to the already galactic magnitudes of nonsense and hype surrounding certain data topics.

There are other more mundane uses of dark data.

Imagine that you are just about to embark on a Data Warehouse project (you really are a late adopter, aren’t you), and you want to establish a base collection of historical data. Where do you get that historical data from?

Right! Operational databases are not characteristically used to store significant amounts of historical reference data and historical transactions beyond a certain time window; there are performance and other reasons for keeping OLTP systems as lean as possible, so initial loads of historical data are typically recreated in the Data Warehouse from backups, audit trails or logs.

Dark data and data governance

You don’t need a Chief Data Officer in order to be able to catalogue all your data assets. However, it is still a good idea to have a reliable inventory of all your business data, including the euphemistically termed Big Data and dark data.

If you have such an inventory, you will know:

What you have, where it is, where it came from, what it is used in, what qualitative or quantitative value it may have, and how it relates to other data (including metadata) and the business.

What needs to be kept, and for how long, and what can be safely discarded, and when.

The risks associated with the retention or loss of that data.
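To make the idea a little more concrete, here is a minimal Python sketch of what a single entry in such an inventory might record. The `InventoryEntry` class, its field names, the `can_discard` rule and the sample assets are all hypothetical, chosen only to mirror the questions listed above; they do not come from any particular cataloguing tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class InventoryEntry:
    """One data asset in a hypothetical business data inventory."""
    name: str                                            # what you have
    location: str                                        # where it is
    source: str                                          # where it came from
    used_in: List[str] = field(default_factory=list)     # what it is used in
    related_to: List[str] = field(default_factory=list)  # related data and metadata
    value_note: str = "unknown"                          # qualitative or quantitative value
    keep_until: Optional[date] = None                    # None: no retention date agreed yet
    risk_note: str = "unassessed"                        # risk of retaining or losing it


def can_discard(entry: InventoryEntry, today: date) -> bool:
    """Treat an asset as a discard candidate only when its retention date
    has passed and nothing is recorded as still using it."""
    if entry.keep_until is None:
        return False  # no agreed retention period, so do not guess
    return today > entry.keep_until and not entry.used_in


# Hypothetical sample assets.
old_logs = InventoryEntry(
    name="web server logs 2009",
    location="archive tape 17",
    source="retired web shop",
    keep_until=date(2014, 12, 31),
)
backups = InventoryEntry(
    name="order database backups",
    location="backup server",
    source="order processing system",
    used_in=["data warehouse initial load"],
    keep_until=date(2014, 12, 31),
)
```

With these sample entries, the expired and unused logs would be flagged as safe to let go, while the backups would be kept because something still depends on them. The rule is deliberately conservative: an asset with no agreed retention date is never offered for deletion.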

If you don’t have such a catalogue and have never done a data inventory then a full data inventory and audit seems to be your new best friend.

What does it mean?

Simply stated, you may have dark data that has value, or it may be a simple collection of worthless digital nostalgia. But if you don’t know what you have, it may pay to find out what’s there, and if necessary, to let it go.

There is no point in hoarding unneeded and unwanted rubbish data. That is simply not good data management.

Finally a word on all the fuss surrounding dark data.

Failure to monetise when there is value to be obtained from dark data is one thing; claiming that value can invariably be obtained, whilst not actually knowing what the data is or how it could be monetised, is just adding to the mountain of data-related ‘nonsense and hype’ doing the rounds these days. Please consider not adding to that mountain.

That’s all folks

British Rail, the national UK rail company, used to be notorious for the number of delays and cancellations to services, and their reasons for failing to meet their obligations became stranger and stranger.

In winter, it would snow and there would be problems. And people would ask ‘how come you couldn’t deal with the snow this year, we’ve had snow for centuries?’ And back came the answer ‘Yes, Sir, but this year it was the wrong type of snow’. In autumn (the fall), it was ‘the wrong type of leaves’ and ‘the wrong type of rain’, and in summer, the ‘wrong type of sunshine’, and so on and so forth.

I hope this will not be the excuse from the Big Data and dark data pundits and punters when the much-vaunted and ‘almost’ guaranteed monetisation isn’t frequently realised.

‘Of course Big Data gives you big dollar benefits, it was just littered with the wrong type of data’ or ‘you just weren’t trying hard enough’.

Many thanks for reading.

Consider this: Big Data is not Data Warehousing

06 Friday Mar 2015

Posted by Martyn Jones in Big Data, Consider this, Data Warehousing, Good Strat, hadoop, hdfs, Martyn Jones


Tags

Big Data, enterprise data warehousing, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


Hold this thought: To paraphrase the great Bob Hoffman, just when you think that if the Big Data babblers were to generate one more ounce of bull**** the entire f****** solar system would explode, what do they do? Exceed expectations.

I am a mild mannered person, but if there is one thing that irks me, it is when I hear variations on the theme of “Data Warehousing is Big Data”, “Big data is in many ways an evolution of data warehousing” and “with Big Data you no longer need a Data Warehouse”.

Big Data is not Data Warehousing, it is not the evolution of Data Warehousing and it is not a sensible and coherent alternative to Data Warehousing. No matter what certain vendors will put in their marketing brochures or stick up their noses.

In spite of all of the high-visibility screw-ups that have carried the name of Data Warehousing, even when they were not Data Warehouse projects at all, the definition, strategy, benefits and success stories of data warehousing are known, they are in the public domain and they are tangible.

Data Warehousing is a practical, rational and coherent way of providing information needed for strategic and tactical option-formulation and decision-making.

Data Warehousing is a strategy driven, business oriented and technology based business process.

We stock Data Warehouses with data that, in one way or another, comes from internal and optional external sources, and from structured and optional unstructured data. The process of getting data from a data source to the target Data Warehouse involves extraction, scrubbing, transformation and loading, ETL for short.
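For readers who prefer to see the four steps laid end to end, here is a minimal Python sketch of extraction, scrubbing, transformation and loading. It is only an illustration under stated assumptions: the `customer_id` and `amount` fields, the in-memory list standing in for the warehouse, and every function name are hypothetical, and a real ETL tool would do far more.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def extract(source_rows: Iterable[Row]) -> List[Row]:
    """Extraction: copy the raw rows out of the (hypothetical) source."""
    return [dict(row) for row in source_rows]


def scrub(rows: List[Row]) -> List[Row]:
    """Scrubbing: trim stray whitespace and reject rows without a customer id."""
    cleaned = []
    for row in rows:
        trimmed = {key: value.strip() for key, value in row.items()}
        if trimmed.get("customer_id"):
            cleaned.append(trimmed)
    return cleaned


def transform(rows: List[Row]) -> List[Dict[str, object]]:
    """Transformation: convert text fields to the types the warehouse expects."""
    return [
        {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
        for row in rows
    ]


def load(rows: List[Dict[str, object]], warehouse: List[Dict[str, object]]) -> int:
    """Loading: append the prepared rows to the target and report the count."""
    warehouse.extend(rows)
    return len(rows)


def run_etl(source_rows: Iterable[Row], warehouse: List[Dict[str, object]]) -> int:
    """Run the four steps in order and return the number of rows loaded."""
    return load(transform(scrub(extract(source_rows))), warehouse)
```

Keeping each step as its own small function is a design choice for readability: the scrubbing rules can change without touching the loading code, which is roughly the separation the ETL acronym implies.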

Data Warehousing’s defining characteristics are:

Subject Oriented: Operational databases, such as order processing and payroll databases and ERP databases, are organized around business processes or functional areas. These databases grew out of the applications they served. Thus, the data was relative to the order processing application or the payroll application. Data on a particular subject, such as products or employees, was maintained separately (and usually inconsistently) in a number of different databases. In contrast, a data warehouse is organized around subjects. This subject orientation presents the data in a much easier-to-understand format for end users and non-IT business analysts.

Integrated: Integration of data within a warehouse is accomplished by making the data consistent in format, naming and other aspects. Operational databases, for historic reasons, often have major inconsistencies in data representation. For example, a set of operational databases may represent “male” and “female” by using codes such as “m” and “f”, by “1” and “2”, or by “b” and “g”. Often, the inconsistencies are more complex and subtle. In a Data Warehouse, on the other hand, data is always maintained in a consistent fashion.
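The gender-code example can be sketched in a few lines of Python. This is a hedged illustration only: the source system names (`payroll`, `orders`, `school`) and the choice of `"unknown"` as a fallback are assumptions made for the sketch, not something the warehouse literature prescribes.

```python
# Hypothetical mapping from each operational system's local codes
# to the single representation used in the warehouse.
GENDER_CODE_MAP = {
    "payroll": {"m": "male", "f": "female"},
    "orders": {"1": "male", "2": "female"},
    "school": {"b": "male", "g": "female"},
}


def integrate_gender(source_system: str, raw_code: str) -> str:
    """Translate a source-specific code into the warehouse's consistent value.

    Unrecognised systems or codes are flagged as "unknown" rather than guessed.
    """
    mapping = GENDER_CODE_MAP.get(source_system, {})
    return mapping.get(raw_code.strip().lower(), "unknown")
```

For instance, `integrate_gender("payroll", "M")` and `integrate_gender("orders", "1")` both resolve to the same warehouse value, which is the whole point of integration.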

Time Variant: Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values. Furthermore, they generally maintain this information for no more than a year (and often much less). In contrast, data warehouses contain data that is generally loaded from the operational databases daily, weekly, or monthly, which is then typically maintained for a period of 3 to 10 years. This is a major difference between the two types of environments.

Historical information is of high importance to decision makers, who often want to understand trends and relationships between data. For example, the product manager for a Liquefied Natural Gas soda drink may want to see the relationship between coupon promotions and sales. This is information that is almost impossible – and certainly in most cases not cost effective – to determine with an operational database.

Non-Volatile: Non-volatility means that after the data warehouse is loaded there are no changes, inserts, or deletes performed against the informational database. The Data Warehouse is, of course, first loaded with cleaned, integrated and transformed data that originated in the operational databases.

We build Data Warehouses iteratively, a piece or two at a time, and each iteration is primarily a result of business requirements, and not technological considerations.

Each iteration of a Data Warehouse is well bound and understood – small enough to be deliverable in a short iteration, and large enough to be significant.

Conversely, Big Data is characterised as being about:

Massive volumes: so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it, and

High variety: not only structured data, but also the whole range of digital data, and

High velocity: the speed at which data is generated, transmitted and received.

These are known as the three Vs of Big Data, and they are subject to significant and debilitating contradictions, even amongst the gurus of Big Data (as I have commented elsewhere: Contradictions of Big Data).

From time to time, Big Data pundits slam Data Warehousing for not being able to cope with the Big Data type hacking that they are apparently used to carrying out, but this is a mistake of those who fail to recognise a false Data Warehouse when they see one.

So let’s call these false flag Data Warehouse projects something else, such as Data Doghouses.

“Data Doghouse, meet Pig Data.”

Failed or failing Data Doghouses fail for the same reasons that Big Data projects will frequently fail. Both will almost invariably fail to deliver artefacts on time and to expectations; there will be failures to deliver value or even simply to return a break even in costs versus benefits; and of course, there will be failures to deliver any recognisable insight.

Failure happens in Data Doghousing (and quite possibly in Big Data as well) because there is a lack of coherent and cohesive arguments for embarking on such endeavours in the first place; a lack of real business drivers; and, a lack of sense and sensibility.

There is also a willing tendency to ignore the advice of people who warn against joining in the Big Data hubris. Why do so many ignore the ulterior motives of interested parties who are solely engaged in riding on the faddish Big Data bandwagon to maximise the revenue they can milk off punters? Why do we entertain pundits and charlatans who ‘big up’ Big Data whilst simultaneously cultivating an ignorance of data architecture, data management and business realities?

Some people say that the main difference between Big Data and Data Warehousing is that Big Data is technology, and Data Warehousing is architecture.

Now, whilst I totally respect the views of the father of Data Warehousing himself, I also think that he was being far too kind to the Big Data technology camp. However, of course, that is Bill’s choice.

Let me put it this way, if Oracle gave me the code for Oracle 3, I could add 256 bit support, parallel processing and give it an interface makeover, and it would be 1000 times better than any Big Data technology currently in the market (and that version of Oracle is from about 1983).

Therefore, Data Warehousing has no serious competing paragon. Data Warehousing is a real architecture, it has real process methodologies, it is tried and proven, it has success stories that are no secrets, and these stories include details of data, applications and the names of the companies and people involved, and we can point at tangible benefits realised. It’s clear, it’s simple and it’s transparent.

Just like Big Data, right?

Well, no.

See what I mean?

Therefore, the next time someone says to you that Big Data will replace Data Warehousing or that Data Warehousing is Big Data, or any variations on that sort of ‘stupidity’ theme, you can now tell them to take a hike, in the confidence that you are on the side of reason.

Many thanks for reading.

More perspectives on Big Data

Aligning Big Data: http://www.linkedin.com/pulse/aligning-big-data-martyn-jones

Big Data and the Analytics Data Store: http://www.linkedin.com/pulse/big-data-analytics-store-martyn-jones

A Modern Manager’s Guide to Big Data: http://www.linkedin.com/pulse/managers-guide-big-data-context-martyn-jones

Core Statistics coexisting with Data Warehousing

Accommodating Big Data

And a big thank you to Bill Inmon (the father of Data Warehousing and of DW 2.0)

Aligning Big Data – Chinese

03 Tuesday Mar 2015

Posted by Martyn Jones in Big Data, Consider this


Tags

Big Data, Good Strat, Martyn Jones


Aligning Big Data – Chinese version is thanks to Optimus Prime – published on http://www.36dsj.com/archives/23692

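Translation: Data Warehouse DW 3.0, a general architectural framework and model for Big Data

A 36大数据 special feature. Original author: Martyn Jones. This article was translated and submitted to 36大数据 by 欧显东 of 1号店, and 36大数据 is authorised to publish it exclusively. Reprints require the consent of this site and the author; reprints that do not credit the author and source are refused.

Introduction:

To bring some semblance of simplicity, coherence and integrity to the Big Data debate, I am sharing an evolving model for pervasive information architecture and management.

It realigns and places Big Data within a more general architectural framework, an architecture that integrates data warehousing (DW 2.0), business intelligence and statistical analysis.

The model is currently called the DW 3.0 Information Supply Framework, or DW 3.0 for short.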

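Recap

In an earlier, related blog piece called “Data Made Simple – Even ‘Big Data’” I looked at three broad classes of data: Enterprise Operational Data; Enterprise Process Data; and Enterprise Information Data, as in the figure below:

Figure 1 – Simplified data model

In brief, the classes of data can be defined as follows:

Enterprise Operational Data: This is data used in applications that support the day-to-day running of an organisation.

Enterprise Process Data: This is measurement and management data collected to show how the enterprise’s systems are running.

Enterprise Information Data: This is primarily data collected from internal and external data sources, the most significant source typically being Enterprise Operational Data.

These three underlying classes of data are the foundation of DW 3.0.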

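Main body

The figure below shows the overall DW 3.0 framework:

Figure 2 – DW 3.0 information framework

There are three main elements in this figure: Data Sources, Core Data Warehousing and Core Statistics.

Data Sources: This element covers all the current sources, varieties and volumes of data available to support the processes of ‘challenge identification’, ‘option definition’ and decision making, including statistical analysis and scenario methods.

Core Data Warehousing: This is an evolution path for the DW 2.0 model. It extends the data warehousing paradigm to include not only unstructured and complex data, but also the information and outcomes derived from statistical analysis performed outside the Core Data Warehousing landscape.

Core Statistics: This element covers the core body of statistical competence, especially, but not only, with regard to evolving data volumes, data velocity, data quality and data variety.

The focus of this piece is Core Statistics. I will also mention how the three elements relate to each other and the effect of combining them.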

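Core Statistics

The following diagram focuses on the Core Statistics element of the model:

Figure 3: DW 3.0 Core Statistics

The diagram above illustrates the flow of data and information through the process of data acquisition, on to statistical analysis and the integration of outcomes.

The model also introduces the concept of the Analytics Data Store. This is arguably the most important architectural element.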

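Data Sources

For the sake of simplicity there are three explicitly named data sources in the diagram (of course, a dependent enterprise data warehouse or data mart can also act as a data source), and in this piece I concentrate on those three: Complex Data; Event Data; and Infrastructure Data.

Complex Data: This is unstructured or highly complex structured data contained in documents and other complex data artefacts, such as multimedia files.

Event Data: This is an aspect of Enterprise Process Data, typically at a fine-grained level of abstraction. It includes business process logs, internet web activity logs and other similar sources of event data. The volumes generated by these sources tend to be higher than those of other data sources, and they are the volumes currently associated with Big Data: the masses of information generated by tracking even the most minor piece of behavioural data, for example someone casually browsing a web site.

Infrastructure Data: This covers data that could be described as signal data: continuous high-velocity streams, or highly volatile data, handled through complex event correlation and analysis components.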

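The revolution starts here

Here I will briefly highlight some of the guiding principles behind this architectural element.

No business reason, no reason to do it: What does this mean? It means that every significant action, even a highly speculative one, must be backed by a tangible and credible business imperative. The difference is as clear as that between the Sage of Omaha and Santa Claus.

Architectural decisions are based on a full and deep understanding of what needs to be achieved and of all the available options: For example, rejecting the use of a high-performance database management product must be done for a sound reason, even if that reason is cost. It should not be based on technical opinions such as “I don’t like the vendor”. If Hadoop makes sense, then use it; if Exasol or Oracle or Teradata makes sense, then use them. You have to be a technology agnostic, not a technology dogmatist.

Statistics and non-traditional data sources are fully integrated into the future data warehousing architectural landscape: Building even more corporate silos, whether through action or omission, will lead to greater inefficiency, greater misunderstanding and greater risk.

The architecture must be coherent, usable and cost-effective: If not, what’s the point, right?

No technology, technique or method is ruled out: We need to be able to incorporate any relevant existing or emerging technology at low cost.

Reduce early and reduce often: Massive volumes of data, especially at high velocity, are problematic. Reducing those volumes is absolutely necessary, even if we cannot in theory reduce the velocity. I elaborate on this point in the next section.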

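Reduce early, reduce often

Here I expand on the theme of early data reduction through filtering and aggregation. We may be generating ever larger volumes of data, but that does not mean we need to hoard all of it in order to get some value from it.

Put simply, this means applying ETL (extract, transform and load) to the raw data as close to the data generator as possible. It is the concept of the database adapter, but in reverse.

Let’s look at a scenario.

A company wants to carry out some speculative analysis of the masses of internet web-site activity log data collected every minute of every day, so they run the massive log files through a distributed map-reduce platform.

They can then analyse the resulting data.

The problem is that many web sites are developed by hackers and designers rather than by engineers, architects and database specialists, and they pile up enormous and unwieldy artefacts, such as massive log files full of verbose and obtuse data that adds little value.

So how do we make sure that this challenge can be removed?

We need to rethink web logging, and then we need to redesign it.

We need to be able to parse log data in order to reduce the massive data footprint produced by badly designed and verbose data.

We need a dual option: being able to send data continuously to an event appliance that can be used to reduce data volumes on a per-event, per-session basis.

If we must use log files, then many small log files are preferable to a few massive ones, and more log cycles are preferable to fewer. We must also maximise the benefits of parallel logging.

So now the log data can reach us through log files, through log files produced by an event appliance (as part of a toolkit of analytics data harvesting adapters), or through messages sent by an appliance to a messaging endpoint.

Once the data has been transmitted (by traditional file transfer or sharing, or by messaging) we can move on to the next step: ET(A)L, that is, Extract, Transform, Analyse and Load.

For log files we would typically employ ET(A)L, although where the data arrives over a direct connection we of course do not need the E, the extract.

Once again, ET(A)L is another form of reduction mechanism, which is why the analysis aspect is included: to ensure that the data that gets through is the data that is needed, and that garbage and noise with no recognised value are cleaned out early and often.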
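The ‘reduce early, reduce often’ idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the framework's actual adapter: the log line layout, the `NOISE_EVENTS` set and the function names `parse_line` and `reduce_log` are all assumptions made for the example. It parses raw web log lines, discards malformed lines and noise events, and passes on one compact summary per session instead of the full log.

```python
# Assumed line layout, for illustration only:
#   <ISO-8601 timestamp> <session id> <event type> <path>
# Event types assumed (hypothetically) to have no recognised analytic value.
NOISE_EVENTS = {"heartbeat", "asset"}


def parse_line(line):
    """Transform: turn one raw log line into a record, or None if malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    timestamp, session_id, event, path = parts
    return {"ts": timestamp, "session": session_id, "event": event, "path": path}


def reduce_log(lines):
    """Analyse: drop noise early and keep one summary row per session."""
    sessions = {}
    for line in lines:
        record = parse_line(line)
        if record is None or record["event"] in NOISE_EVENTS:
            continue  # garbage and noise never reach the data store
        summary = sessions.setdefault(
            record["session"],
            {"session": record["session"], "first_seen": record["ts"],
             "last_seen": record["ts"], "events": 0, "paths": set()},
        )
        # ISO-8601 timestamps in one format compare correctly as strings.
        summary["first_seen"] = min(summary["first_seen"], record["ts"])
        summary["last_seen"] = max(summary["last_seen"], record["ts"])
        summary["events"] += 1
        summary["paths"].add(record["path"])
    # Load: only the compact summaries are passed on, with a distinct-path count.
    return [{**s, "paths": len(s["paths"])} for s in sessions.values()]


# Invented sample data.
RAW_LOG = [
    "2015-03-03T10:00:00 s1 page_view /home",
    "2015-03-03T10:00:01 s1 heartbeat /ping",
    "2015-03-03T10:00:05 s1 page_view /products",
    "corrupted line",
    "2015-03-03T10:00:07 s2 page_view /home",
    "2015-03-03T10:00:09 s1 asset /logo.png",
    "2015-03-03T10:00:12 s1 page_view /home",
]

summaries = reduce_log(RAW_LOG)
for row in summaries:
    print(row)
```

Seven raw lines become two summary rows here. The same filter-then-aggregate step could in principle run in an event appliance or in a job close to the web server, which is where the reduction pays off most.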

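Analytics Data Store

The Analytics Data Store (which may be a distributed data store on a cloud somewhere) supports the data requirements of statistical analysis. Here the data is organised, structured, integrated and enriched to meet the ongoing, fluctuating and occasional needs of the statisticians and data scientists focusing on data mining. Data in the Analytics Data Store may be accumulated or completely refreshed. It may have a short life span or a significantly long one.

The Analytics Data Store is the heart of analytics data. Not only can it be used to feed data to the statistical analysis processes, it can also provide long-term persistent storage for the outcomes and scenarios of analysis, and for future analyses, which is why it has a ‘write back’ capability.

The data and information in the Analytics Data Store may also be augmented by, or derived from, data stored in the data warehouse, and it may also benefit from having its own dedicated data mart designed specifically for this purpose.

The outcomes of statistical analysis on the Analytics Data Store may also result in feedback that is used to tune the data reduction, filtering and enrichment rules, whether in the smart data analytics, complex event and discrimination adapters or in the ET(A)L job streams.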

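Summary

This has necessarily been a very brief look at what is currently labelled DW 3.0.

The model does not seek to define statistics or how statistical analysis is to be applied, which has been done more than adequately elsewhere. It sets out how statistics can be accommodated within an extended DW 2.0 architecture, without the need to come up with reactionary and ill-fitting solutions to problems that can be solved in better and more effective ways through sensible, sound engineering principles and the judicious application of appropriate methods, technologies and techniques.

Original: Aligning Big Data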

Contradictions of Big Data

01 Sunday Mar 2015

Posted by Martyn Jones in Ask Martyn, Big Data, Consider this

≈ 1 Comment

Tags

Big Data, data management, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


What we’ve been told

We’ve been told that Big Data is the greatest thing since sliced bread, and that its major characteristics are massive volumes (so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it), high variety (not only structured data, but also the whole range of digital data), and high velocity (the speed at which data is generated and transmitted). Also, from time to time, much to the chagrin of some Big Data disciples, a whole slew of new identifying Vs are produced, touted and then dismissed (check out my LinkedIn Pulse article on Big Data and the Vs).

So, beware. Things in Big Data may not be as they may seem.

It’s not about big

I have been waging an uphill battle against the nonsensical and unsubstantiated idea that more data is better data, but now this view is getting some additional support, and from some surprising corners.

In a recent blog piece on IBM’s Big Data and Analytics Hub (Big data: Think Smarter, not bigger), Bernard Marr wrote that “the truth is, it isn’t how big your data is, it’s what you do with it that matters!”

Elsewhere, SAS echoed similar sentiments on their web site: “The real issue is not that you are acquiring large amounts of data. It’s what you do with the data that counts.”

Can we call that ‘strike one’ for Big Data Vs?

It’s not about variety

It is claimed that 20% of digital data is structured, but this claim rests on the problematic suggestion that structured data is uniquely relational. It is also claimed that unstructured data includes CSV files and XML data, and that this makes up far more than the 20% of the data generated. But this definition is simply wrong.

If anything, CSV data is structured, and XML data is highly structured, and it’s typically regular ASCII data. So it does not add variety, even though it is not structured in the ways that someone might expect, especially if that someone lacks the required knowledge and experience. Simply stated, CSV data is structured; it’s just that it lacks rich metadata, but that doesn’t make it unstructured.
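To make the point concrete, here is a minimal Python sketch using the standard library `csv` module. The sample data, the `SCHEMA` mapping and the `load_orders` helper are invented for illustration. The header row and the regular rows already give the file its structure; the only thing supplied by hand is the metadata that CSV lacks, namely the column types.

```python
import csv
import io

# Invented sample data, for illustration only.
RAW_CSV = """customer_id,country,order_total
1001,ES,250.00
1002,FR,99.50
1003,ES,410.25
"""

# The "rich metadata" that CSV does not carry: column types, assumed here.
SCHEMA = {"customer_id": int, "country": str, "order_total": float}


def load_orders(text):
    """Read CSV text into typed records using the header row and SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: SCHEMA[name](value) for name, value in row.items()}
        for row in reader
    ]


orders = load_orders(RAW_CSV)

# Once typed, the records can be queried like any other structured data.
total_es = sum(o["order_total"] for o in orders if o["country"] == "ES")
print(orders[0])
print(total_es)
```

Every row has the same named fields in the same order, which is all that ‘structured’ needs to mean for this kind of analysis.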

“But”, I hear you say, “what about all the non-textual data such as multi-media, and what about the masses of unstructured textual data?”

Take it from me, most businesses will not be basing their business strategies on the analysis of a glut of selfies, home videos of cute kittens, or the complete works of William Shakespeare or Dan Brown. Almost all business analysis will continue to be carried out on structured data obtained primarily from internal operational systems and external structured data providers.

Strike two! Third time lucky?

It’s not even about velocity

So, if we accept that Big Data isn’t really about the data volumes or data variety, that leaves us with velocity, right? Well, no, because if it isn’t about record-breaking VLDB or significant data variety, then for most commercial businesses the management of data velocity becomes either less of an issue or no issue at all. The fact that some software vendors and IT service suppliers set up this ‘straw man’ argument and then knock it down with the ‘amazing powers’ of their products and services is quite another matter.

Strike three, and counting.

It’s not about the manageability of Big Data

We have been told time and time again that the major difference between a data scientist and a professional statistician is that the ‘scientists’ know how to cope very well with massive volumes, varieties and velocities of data. Now it turns out that this is also questionable.

According to Bob Violino writing in Information Management (Messy Big Data Overwhelms Data Scientists – 20 February 2015) “Data scientists see messy, disorganized data as a major hurdle preventing them from doing what they find most interesting in their jobs”. So, when it comes to data quality and structure the ‘scientists’ don’t really have an advantage over professional statisticians.

Last year Thomas C. Redman, writing in the Harvard Business Review (Data’s Credibility Problem), noted that when Big Data is unreliable “managers quickly lose faith” and “fall back on their intuition to make decisions, steer their companies, and implement strategy”, and that when this happens there is a propensity to reject potentially “important, counterintuitive implications that emerge from big data analyses.”

Strike four?

The new analytics aren’t new

Data science and Big Data analytics are the new kids on the block, aren’t they?

Well, here are some real life scenarios.

A major banking equipment supplier: A lot of banking equipment is hybrid analogue-digital; a simple example of this would be a photocopier or a physical document processing device. One major supplier decided to capture the sensor data produced by their devices in order to predict failures and problems. Predictive preventive maintenance rules are created and corroborated using the data generated by sensors on each customer device, and these rules then get incorporated into the devices’ logic.

A major IT vendor: What happens when you create an intersection and convergence between technologies, techniques and methods from the areas of mainstream IT, data architecture and management, statistics (quantitative and qualitative analytics) and data visualisation, artificial intelligence/machine learning and knowledge management? This is precisely what one of the main European IT vendors did, and the idea proved to be quite attractive to customers, prospects and investors.

A major integrated circuit supplier: The testing of ICs at the ‘fabs’ (manufacturing plants) generates a serious amount of data. This data is used to detect errors in the IC manufacturing process. It is captured and analysed in as near real-time as possible, which is necessary because over-running the production of faulty ICs is so costly. To get around this problem the company uses a combination of fast data capture, transformation and loading of data into a data analytics area to ensure early and precise problem detection.

All Big Data Analytics success stories?

The first happened in 1989, the second in 1993 and the third in 2001. Yes, Big Data and Big Data analytics are sort of newish.

Strike five.

The science is frequently not very scientific

What is science?

According to Vasant Dhar of the Stern School of Business (Data Science and Prediction), Jeff Leek (The key word in “Data Science” is not Data, it is Science), and repeated on Wikipedia, “In general terms, data science is the extraction of knowledge from data”. Well, excuse me if I beg to differ. I have seen data scientists at work, and the word science doesn’t actually jump out and grab you. It’s difficult to make the connection, just as it is to accurately connect some popular science magazines with fundamental scientific research.

If a professional and qualified statistician wants to label themselves a data scientist then I have no issue with that, it’s their problem, but I am not willing to lend credibility to the term ‘data scientist’ when it is merely an interesting job title, with at most a tenuous connection to the actual role, and one that is liberally applied, with the almost customary largesse of IT, to creative code hackers and business-averse dabblers in data.

As Hazelcast VP Miko Matsumura suggested in Data Science is Dead “… put “Data Scientist” on your resume. It may get you additional calls from recruiters, and maybe even a spiffy new job, where you’ll be the King or Queen of a rotting whale-carcass of data” and ” Don’t be the data scientist tasked with the crime-scene cleanup of most companies’ “Big Data”—be the developer, programmer, or entrepreneur who can think, code, and create the future.”

Strike six.

And the value is questionable

DATA: “Data is a super-class of a modern representation of an arcane symbology.” – Anon

If I had a dollar for every time I heard someone claim that data has intrinsic positive value then I would be as wealthy as Warren Buffett.

If I have said it once, I have said it a hundred times. In order for data to be more than an operational necessity it requires context.

Providing valid data with valid context turns that data into information.

Data can be relevant and data can be irrelevant. That relevance or irrelevance of data may be permanent or temporary, continuous or episodic, qualitative or quantitative.

Some data is meaningless, and there are cases whereby nobody can remember why it was collected or what purpose it serves.

Taking all this into account we can ask the deadly pragmatic question: what value does this data have? Which is sometimes answered with a pertinent ‘no value whatsoever’.

Strike seven.

So what is it really about?

It is said that Big Data is changing the world, but for all intents and purposes, and shamed by previous Big Data excesses, some people are rapidly changing the definitions and parameters of Big Data to position it as being more tangible and down-to-earth, whilst moving it away from its position as an overhyped and dead-ended liability.

Big Data is a dopey term, applied necessarily ambiguously to a surfeit of tenuously connected vagaries, and its time has come and gone. So, let’s drop the Big Data moniker, and embrace the fact that data is data, and long live ‘All Data’, yes, all digital data. Let’s consider all data for what it’s worth to the business, and not for what some chatterers reckon its value is, having as they do little or no insight into the businesses to which they refer, or into the data that these businesses possess.

So, when push comes to shove, is Big Data really about high volumes, high velocity and high variety, or is it in fact about much noise, too much pomposity and abundant similarity leading to unnecessary high anxiety?

Thanks very much for reading.

Big Data in Question – Again

01 Sunday Mar 2015

Posted by Martyn Jones in All Data, Big Data, Consider this

≈ Leave a comment

Tags

All Data, Big Data, data management, Good Strat, good strat blog, Good Strategy, Martyn Jones, Martyn Richard Jones


Big Data is now an inhospitable and unhealthy land inhabited by those who, through accident or design, deceive naïve and sentimental bystanders and those who are willingly misled.

When all of this Big Data malarkey started it was sort of funny, humorous and occasionally witty, especially in the affected, bizarre and frequently uninhibited ways that freshly-minted self-appointed gurus and experts would “big it up”.

Doctor Freud would have had a field day with all of that, being as it was, and for that matter still is, a postmodern mishmash of Riefenstahl, Freddie Mercury and Monty Python on steroids. However, after that extended, operatic and high-camp hiatus it all went downhill.

The Big Data scene is fast becoming an outrageous and brash festival of deception, disinformation and obliviousness. Which is a pity, because it does the industry no good whatsoever.

It is telling that Big Data evangelists, gurus and assorted sycophants cannot even define Big Data adequately, never mind discuss (or for that matter, point at) tangible success stories, without falling into contradictions on all of the key defining characteristics of volume, variety and velocity, and resorting to crude debating devices to avoid or finesse the concerns and the questions.

Almost every morning I check out the industry news, and almost invariably, it comes with new mind-boggling examples of Big Data nonsense.

However, it isn’t always nonsense for nonsense’s sake. There are agendas, and there are rational explanations for why Big Data has become, at the same time, one of the most hyped-up fads in the history of IT and one that its supporters find so difficult to actually explain and justify in any reasonable sort of way.

Therefore, when it comes to Big Data, beyond the surfeit of platitudes, clichés, bluff and bluster, the only things in play are the interests of industry, the patrons, the courtesans and their entourage of the innocent and the beguiled.

One of the biggest deceptions in Big Data is in the misleadingly named ‘success stories’. The thing is that most of the success stories that I have read have been:

  • So vague that it’s difficult to know how success is being defined, never mind reached.
  • So secretive and obtuse is the avoidance of naming names, locations and other relevant Big Data references that it’s impossible to corroborate if these claims are actually true or not. Disclaimer: I have worked for some of the biggest IT vendors, and in senior roles, and I know what is behind comments such as “the Big Data project is a success, although the client name and project are confidential” and “it’s delivering such major competitive advantages that we are obliged to keep it under wraps”.
  • Stories stolen from elsewhere, such as from Data Warehousing, Business Intelligence, VLDB or Business Application projects.
  • Borderline fantasies and badly contrived technology fan fiction.

However, it doesn’t stop there.

One of the clearest examples of the questionable nature of Big Data evangelism is when it is used to piggyback Big Data hype on simple, tangible and immediately recognisable artefacts or applications that have little in common with Big Data.

This is an extreme illustration, but it works like this: “iPhones are commercially successful, iPhones are part of Big Data, and therefore Big Data is commercially successful.”

As if the mere conjuring up of association, affinity and proximity will convince people of the great and growing value of Big Data.

What I am also referring to are publicity pieces that may as well have been titled:

  • Smith, Galbraith, Mies, Keynes, Homer Simpson and the economic justification of Big Data
  • Lovelace, Babbage, von Neumann, Eckert, Davies, Codd, Knuth, Naur and the technological underpinnings of Big Data
  • Einstein, Freud, Edison, Faraday, Recorde and the intellectual structure of Big Data
  • Socrates, Kant, Hegel, Marx, Adorno and the philosophical correctness of Big Data
  • Great quotes about Big Data, from the Cambrian era to the postmodern époque
  • Great jokes about Big Data, from Mel Brooks to Steve Martin
  • Sportspeople and Big Data, from Lottie Dod and Babe Ruth to Rafa Nadal and CR7
  • Industry support of Big Data, from Henry Ford to Neutron Jack

Do you recognise similarities?

It’s no big deal, just the use of unreliable, misleading and inappropriate fallacies, dressed up as cute, plausible and accessible collateral. People may think that such things are clever and witty, but they aren’t; they are just misleading.

Let’s continue with something simple.

Evasion is, in ethics, an act that deceives by making a true statement that is immaterial or that leads to a false deduction. An example is citing events, persons or anecdotes from the history of IT to justify the supposed or imaginary value of Big Data. This is close to the notion of a non sequitur, which of course is an argument whose conclusion does not follow from its premises. It falls short of being full-on sophistry, purely because the simplistic, puerile and superficial arguments put forward in favour of Big Data do not match those of the true sophist, who seeks to reason with clever but fallacious and deceptive arguments. Too many of the Big Data arguments are fallacious and deceptive, but no one equipped with a reasonable capacity for critical thinking should take such ‘arguments’ as valid.

Hold this thought: Big Data hype is a viper’s nest of logical fallacies, white lies and disinformation.

Just when I think things could not get any weirder, they do, and the Big Data ceiling of hyperbole rises even higher, up to the rarer atmosphere of extreme tendentiousness.

There is a growing mass of Big Data hoop-la, hyperbole and flim flam that exceeds all previous bounds of overstatement, solecism and confabulation. This is where the real volumes, varieties and velocities are in Big Data: in hokum.

We live, as Oscar Wilde said in his day, in an age of surfaces. Yes, superficiality, puerility and short-termism are the competing orders of the day. However, I am still amazed – and maybe wrongly so – by what ostensibly professional, experienced and knowledgeable people are willing, able and prepared to accept, especially when it comes to Big Data flim flam sauce.

Here are some examples of the nonsense about Big Data that is taken as gospel by ‘adults’:

Data Warehousing is part of Big Data: No comment.

Big Data will replace Enterprise Data Warehousing: People can’t even explain the features and benefits of Big Data. I try to make it as easy as possible: ‘if you can’t say it, point to it’. But, seriously, people can’t even relate tangible and credible Big Data success stories, never mind show how it will replace Enterprise Data Warehousing, whether that’s the Inmon or Kimball flavour, take your pick.

Everyone and every organisation can benefit from Big Data: If people can’t explain this, and they don’t in terms of tangible benefits, then the claim should remain questionable.

Data Scientists will replace Statisticians: Why is that so? It is claimed that Data Scientists are uniquely equipped to handle massive volumes, varieties and velocities of data – well, as it turns out, this isn’t certain either.

Big Data is in its infancy: I think we may be confusing infancy with lack of real traction, and of time and place utility.

You cannot be serious: Just what are people talking about here? I have read vague, naïve and ill-informed pieces about data management, data architecture, data warehousing, reporting, business intelligence and a plethora of etcetera that have been passed off as observations and commentary on Big Data. So, what makes people recycle hackneyed, misleading and badly conceptualised ‘content’?

In the commentary on one of Bernard Marr’s pieces on LinkedIn (a professional networking site) I observed that no one can adequately explain what Big Data is without falling into contradictions and fancies, and no one seems to be capable or willing to provide tangible success stories.

Bernard responded to this comment by pointing out “the reason for that is that Big Data means different things to different people.”

Fair enough. It’s an explanation.

That said, I have always had more than a tenuous dislike of postmodern thinking, in fact most things ‘postmodern’. Call me old fashioned, jaded or cynical, but to me, the idea that everything can mean anything is an aberration that I prefer to leave to others.

I am at a loss to explain why so many reasonable people are willing to embrace the hype surrounding Big Data and Big Data Analytics, including the attendant surfeit of nonsense, incongruences and contradictions, and from my perspective, it defies reason and good sense.

Therefore, I will just end again with a fabulous quote from Ben Goldacre:

“You cannot reason people out of a position that they did not reason themselves into”.

Many thanks for reading.

Contradictions of Big Data – Short

01 Sunday Mar 2015

Posted by Martyn Jones in Big Data, Consider this, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones

≈ Leave a comment

Tags

Big Data, data management, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


Please note: This is an edited version of a previous piece with a similar name, but focusing solely on the three main Vs of Big Data.

What we’ve been told

We’ve been told that business Big Data is the greatest thing since sliced bread, and that its major characteristics are:

  • massive volumes – so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it, and
  • high variety – not only structured data, but also the whole range of digital data, and
  • high velocity – the speed at which data is generated, transmitted and received

Which is a simple and straightforward means of classification. Big Data is about massive volumes, high variety and high velocity. Right?

It’s not about big

I have never bought into the idea that more data is necessarily better data, or that it provides better focus or leads to increased insight. In fact, I have been quite vocal with my contrarian opinion, but now this view is getting some additional support, and from some surprising corners.

In a recent blog piece on IBM’s Big Data and Analytics Hub (Big data: Think Smarter, not bigger), Bernard Marr wrote that “the truth is, it isn’t how big your data is, it’s what you do with it that matters!”

Over at Fierce Big Data it was Pam Baker who stated that “the term big data is unfortunate because it’s really not about the size of the data”. (Big data is not about petabytes, but complex computing).

Elsewhere, SAS echoed similar sentiments on their web site: “The real issue is not that you are acquiring large amounts of data. It’s what you do with the data that counts.”

Well, apparently Big Data isn’t about “massive volumes” of data.

Strike 1!

It’s not about variety

It is claimed that 20% of digital data is structured, but this claim rests on the problematic suggestion that structured data is uniquely relational.

It is also said that unstructured data includes CSV files and XML data, and this makes up far more than the 20% of the data generated. But this definition is wrong.

If anything, CSV data is structured, and XML data is highly structured, and it’s typically regular ASCII data. So it does not add variety, even though it is not structured in the ways that someone might expect, especially if that someone lacks the required knowledge and experience. Simply stated, CSV data is structured; it’s just that it lacks rich metadata, but that doesn’t make it unstructured.

“But”, I hear you say, “what about all the non-textual data such as multi-media, and what about the masses of unstructured textual data?”

Take it from me, most businesses will not be basing their business strategies on the analysis of a glut of selfies, juvenile twittering, home videos of cute kittens, or the complete works of William Shakespeare. Almost all business analysis (whether done by a professional statistician or a data scientist) will continue to be carried out using structured data obtained primarily from internal operational systems and external structured data providers.

Variety, Sir? No problem.

Strike two!

It’s not even about velocity

So, if we accept that Big Data isn’t really about the massive data volumes or high data variety, then that leaves us with velocity. But if it isn’t about record-breaking VLDB or significant data variety, then for most commercial businesses the management of data velocity becomes either less of an issue or no issue at all.

Even in some extreme circumstances, one can explore the suggestion that data sampling can remove issues with data volume as well as velocity.
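As one illustration of how sampling can tame both volume and velocity, here is a minimal Python sketch of reservoir sampling (the classic ‘Algorithm R’). It is a technique chosen for this example, not one named in the piece, and the function name `reservoir_sample` and its parameters are assumptions. It keeps a fixed-size, uniformly random sample from a stream of unknown length in a single pass, so storage stays constant however fast or however long the data arrives.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of at most k items from any iterable."""
    rng = random.Random(seed)  # a seed makes the sketch repeatable
    sample = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing item with probability k / (index + 1).
            j = rng.randint(0, index)  # both ends inclusive
            if j < k:
                sample[j] = item
    return sample


# Invented example: one million events reduced to a sample of one thousand.
events = range(1_000_000)
sample = reservoir_sample(events, k=1000, seed=42)
print(len(sample))
```

Whether a sample is good enough depends on the question being asked. Rare events, for instance, may need stratified or targeted sampling instead.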

However, the fact that some software vendors and IT service suppliers set up this ‘straw man’ velocity argument and then knock it down with the ‘amazing powers’ of their products and services is quite another matter.

So, is it really about velocity?

Strike three!

So what is it really about?

Big Data is a dopey term, applied necessarily ambiguously to a surfeit of tenuously connected vagaries, and its time has come and gone. Let’s dump the Big Data moniker, and the 3 Vs along with it, and embrace the fact that data is data, and that there will always be more of it.

So, let’s consider ‘all data’ and principally for its time and place utility.

If there is something that you are not sure about or have questions about, then please leave a comment below or email me.

Thanks very much for reading.

Consider this: Big Data and the Pot of Tea

17 Tuesday Feb 2015

Posted by Martyn Jones in Big Data, Consider this, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones, Strategy

≈ Leave a comment

Tags

Analytics, Big Data, data management, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones


To begin at the beginning

Hold this thought: Big Data is King.

Is there just nothing that Big Data isn’t capable of fixing? From terrorism, world hunger, Ebola, HIV, fraud, money laundering and hiring the ‘right’ people through to winning the lottery, curing hangovers, arranging entrapment and finding the love of your life. Big Data is King.





  • The hidden wealth of nations
  • Trade
  • UK
  • Uncategorized
  • United Kingdom
  • USA
  • Value
  • Wales
  • wisdom

Hours & Info

Martyn Richard Jones
Madrid, Spain
+34 692 376 698
martyn.jones@martyn.es
10:00 - 17:00

Top Good Strat Posts & Pages

  • Innovative Strategies for Modern Governance
  • X Is Dying In Europe: Here's Why
  • Meet the Euro Press - 2026/01/04
  • An Open Letter to Doctor Azi Dagan
  • Nine Absurd AI Use Cases We Don’t Need
  • Weaving the Dragon's Data: A Welsh-Inspired Tale for Enterprise Architects in the New Year – 2026/01/01
  • A Question of Taste: The Epic Dilema of Writing Styles – 2026/01/02
  • Top Countries Known for Arrogance and Ignorance
  • The American Basket Case
  • Mobile Device Revolution: Five Trends for 2026

Good strat tag cloud

1 2 3 4 5 AI All Data Analytics Artificial Intelligence Behavioural Economics BI Big Data bigdata blog books Business business analysis Business Enablement business intelligence Business Management business strategy chatgpt cloud Consider this data data integration data management data science Data Warehouse Demagogism digital-marketing Dogma Donald Trump enterprise data warehousing espanol EU fe gaza goodstart good start Good Strat goodstrat Good Strategy hamas history ia information Information and Technology information management Information Technology israel IT Strategy jesus knowledge leadership life llm machine learning Marketing Martyn Jones Martyn Richard Jones News Offshoring Organisational Autism palestine Philosophy poesia Politics Russia Spain statistics Strategy technology trump writing
