GOOD STRATEGY

~ for every significant challenge

Monthly Archives: March 2015

Aligning Big Data – Chinese

03 Tuesday Mar 2015

Posted by Martyn Jones in Big Data, Consider this

Tags

Big Data, Good Strat, Martyn Jones

Aligning Big Data – Chinese version is thanks to Optimus Prime – published on http://www.36dsj.com/archives/23692

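Translation: Data Warehouse DW 3.0, a general architectural framework and model for Big Data

A 36大数据 (36dsj.com) special feature. Original author: Martyn Jones. This article was translated by 欧显东 of 1号店 and submitted to 36大数据, which has been granted exclusive publication rights. Reproduction requires the consent of both this site and the author; any reproduction that does not credit the author and the source is refused.

Introduction:

To bring some semblance of simplicity, coherence and integrity to the Big Data debate, I am sharing an evolving model for pervasive information architecture and management.

It is an alignment and placement of Big Data within a more generalised architectural framework, one that integrates data warehousing (DW 2.0), business intelligence and statistical analysis.

The model is currently called the DW 3.0 Information Supply Framework, or DW 3.0 for short.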

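Recap

A previous, related blog piece titled “Data Made Simple – Even ‘Big Data’” set out three broad types of data: Enterprise Operational Data; Enterprise Process Data; and Enterprise Information Data. See the figure below:

Figure 1 – Simplified data model

In simple terms, the data types can be defined as follows:

Enterprise Operational Data: This is data used in applications that support the day-to-day running of a business.

Enterprise Process Data: This is measurement and management data collected to show how the operational systems are performing.

Enterprise Information Data: This is primarily data collected from internal and external data sources, the most significant source typically being Enterprise Operational Data.

These three underlying data types are the foundation of DW 3.0.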

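Main body

The following figure shows the overall DW 3.0 framework:

Figure 2 – DW 3.0 information framework

There are three main elements in this diagram: Data Sources, Core Data Warehousing and Core Statistics.

Data Sources: This element covers all the current sources, varieties and volumes of data available to support the processes of ‘challenge identification’ and ‘option definition’ and decision making, including statistical analysis and scenario generation.

Core Data Warehousing: This is an evolutionary path from the DW 2.0 model. It extends the data warehousing paradigm to include not only unstructured and complex data, but also the information and outcomes derived from statistical analysis performed outside the core data warehousing landscape.

Core Statistics: This element covers the core body of statistical competence, especially but not only with regard to evolving data volumes, data velocity, data quality and data variety.

The focus of this piece is Core Statistics. It will also touch on the relationships between the three elements and the effect of combining them.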

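Core Statistics

The following figure focuses on the Core Statistics element of the model:

Figure 3 – DW 3.0 Core Statistics

The figure illustrates the flow of data and information through the process of data acquisition, then on to statistical analysis and the integration of outcomes.

The model also introduces the concept of the Analytics Data Store. This is arguably the most important architectural element.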

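Data Sources

For simplicity, the diagram has three explicitly named data sources (of course, the enterprise data warehouse or its dependent data marts can also act as a data source), but in this piece I focus on the following three: Complex Data; Event Data; and Infrastructure Data.

Complex Data: This is structured, or highly complexly structured, data contained in documents and other complex data artefacts, such as multimedia files.

Event Data: This is an aspect of Enterprise Process Data, typically at a fine-grained level of abstraction. Here we have business process logs, internet web activity logs and other similar sources of event data. The volumes generated by these sources tend to be higher than those of other data sources, and they are the ones currently associated with Big Data, covering as they do the mass of information generated by tracking even the most minor piece of behavioural data, for example, someone casually browsing a web site.

Infrastructure Data: This aspect covers data that could be described as signal data: continuous high-velocity streams of, or highly volatile, data processed through complex event correlation and analysis components.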

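The Revolution Starts Here

Here I will briefly highlight some of the guiding principles behind this architectural element.

Without a business imperative there is no reason to do it: What does this mean? It means that every significant action, even a highly speculative one, must be backed by a tangible and credible business case. The difference is as clear as that between the ‘Sage of Omaha’ and ‘Santa Claus’.

All architectural decisions are based on a full and deep understanding of what needs to be achieved and of all the available options: For example, rejecting the use of a high-performance database management product must be done for sound reasons, even if that reason is cost. It should not be based on technical opinions such as ‘I don’t like the vendor’. If Hadoop makes sense, use it; if Exasol, Oracle or Teradata makes sense, use them. You must be a technology agnostic, not a technology dogmatist.

Statistics and non-traditional data sources are fully integrated into the future data warehousing architectural landscape: Building even more corporate silos, whether through action or omission, will lead to greater inefficiency, greater misunderstanding and greater risk.

The architecture must be coherent, usable and cost-effective: If not, what’s the point, right?

No technology, technique or method is ruled out: We need to be able to incorporate any relevant existing or emerging technology at low cost.

Reduce early and reduce often: Massive volumes of data, especially at high velocity, are problematic. Reducing those volumes, even if we cannot in theory reduce the velocity, is absolutely essential. I will elaborate on this point separately.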

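Reduce Early, Reduce Often

Here I expand on the theme of early data reduction through filtering and aggregation. We may be generating ever greater volumes of data, but that does not mean we need to hoard all of it in order to get some value from it.

Put simply, this means performing the initial ETL (extract and transform) as close to the data generator as possible. It is the database adapter concept, but in reverse.

Let’s look at a scenario.

A company wants to carry out some speculative analysis on the many internet web-site activity log data sets collected every minute of every day. They run the large log files through a distributed platform to map and reduce the data.

They can then analyse the resulting data.

The problem they face, as with many web sites developed by hackers and designers rather than by engineers, architects and database experts, is a clutter of massive and unwieldy artefacts, such as huge log files padded with verbose, obtuse data added for novelty’s sake.

So how do we make sure this challenge can be removed?

We need to rethink web logging, and then we need to redesign it.

We need to be able to parse the log data in order to reduce the massive data footprint produced by badly designed and overly verbose logging.

We need the twin option of being able to continuously send data to an event appliance, which can be used to reduce data volumes on an event-session basis.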
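As a rough illustration of reducing log data on a session basis, the sketch below parses raw log lines and collapses them into one summary record per session. The log format (timestamp, session id, path, status), the field names and the choice of summary measures are all assumptions made for this example; the article does not specify any of them.

```python
from collections import defaultdict


def parse_line(line):
    """Parse one line of a hypothetical log format:
    '<ISO timestamp> <session id> <path> <HTTP status>'.
    Returns None for anything that does not fit, so noise is dropped early."""
    parts = line.split()
    if len(parts) != 4 or not parts[3].isdigit():
        return None
    timestamp, session_id, path, status = parts
    return {"ts": timestamp, "session": session_id, "path": path, "status": int(status)}


def reduce_by_session(lines):
    """Collapse raw events into one small summary record per session."""
    acc = defaultdict(
        lambda: {"events": 0, "errors": 0, "first_ts": None, "last_ts": None, "paths": set()}
    )
    for line in lines:
        event = parse_line(line)
        if event is None:
            continue
        s = acc[event["session"]]
        s["events"] += 1
        if event["status"] >= 400:
            s["errors"] += 1
        # ISO timestamps in one fixed format sort correctly as plain strings.
        s["first_ts"] = event["ts"] if s["first_ts"] is None else min(s["first_ts"], event["ts"])
        s["last_ts"] = event["ts"] if s["last_ts"] is None else max(s["last_ts"], event["ts"])
        s["paths"].add(event["path"])
    return {
        sid: {
            "events": s["events"],
            "errors": s["errors"],
            "first_ts": s["first_ts"],
            "last_ts": s["last_ts"],
            "distinct_paths": len(s["paths"]),
        }
        for sid, s in acc.items()
    }
```

However many raw events a session produces, only one compact record per session moves downstream, which is the kind of volume reduction described above.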

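If we must use log files, then many small log files are preferable to a few massive ones, and more frequent log cycles are preferable to fewer. We must also maximise the benefits of parallel logging.

So now we have log data that can arrive by way of log files, log files produced by an event appliance (for example, as part of a toolkit of analytics data harvesting adapters), or messages sent by the appliance to a messaging endpoint.

Once the data has been transmitted (by conventional file transfer/sharing or by messaging) we can move on to the next step: ET(A)L – Extract, Transform, Analyse and Load.

For log files we typically use ET(A)L; where the data arrives over a direct connection, such as messaging, we of course do not need the E (extract).

ET(A)L is, again, another form of reduction mechanism, which is why the analytics aspect is included: to ensure that only the data that is needed gets through, while the garbage and noise with no recognised value is cleaned out early and often.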
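The T, A and L stages can be pictured as a small pipeline in which the analyse step acts as a gate. The sketch below is only an illustration: the record fields, the noise rules (static assets and self-declared bots) and the in-memory list standing in for the target store are all hypothetical, not something the article prescribes.

```python
# Hypothetical noise rule: requests for static assets carry no analytic value here.
NOISE_SUFFIXES = (".css", ".js", ".png", ".ico")


def transform(records):
    """Normalise each incoming record (already received, so no extract step)."""
    for r in records:
        yield {
            "path": r["path"].lower().rstrip("/") or "/",
            "agent": r.get("agent", ""),
            "status": int(r["status"]),
        }


def analyse(records):
    """Let through only the records judged to be worth keeping."""
    for r in records:
        if r["path"].endswith(NOISE_SUFFIXES):
            continue  # static asset: noise
        if "bot" in r["agent"].lower():
            continue  # self-declared crawler: noise
        yield r


def load(records, store):
    """Append surviving records to the target store and report how many were kept."""
    kept = 0
    for r in records:
        store.append(r)
        kept += 1
    return kept


def etal(records, store):
    """Transform, analyse and load in one lazy pass over the incoming records."""
    return load(analyse(transform(records)), store)
```

Because each stage is a generator, records are filtered as they stream through and the noise is never materialised in the store.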

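Analytics Data Store

The Analytics Data Store (which may be a distributed data store on some cloud) supports the data requirements of statistical analysis. Here the data is organised, structured, integrated and enriched to meet the ongoing, and occasionally volatile, needs of statisticians and data scientists focused on data mining. Data in the Analytics Data Store can be accumulated or completely refreshed. It may have a short life span or a significantly long one.

Analytics data is at the core of the Analytics Data Store. Not only can the store be used to feed data to the statistical analysis process, it can also provide long-term persistent storage for analysis outcomes and scenarios, and for future analysis, hence the ability to ‘write back’.

The data and information in the Analytics Data Store may also be augmented by, or derived from, data stored in the data warehouse, and it may also benefit from having its own dedicated data mart designed specifically for this purpose.

The results of statistical analysis on the Analytics Data Store may also lead to feedback used to tune the data reduction, filtering and enrichment rules, whether in the smart data analytics, complex event and discrimination adapters, or in the ET(A)L job streams.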

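Summary

This has necessarily been a very brief look at what is currently labelled DW 3.0.

The model does not seek to define statistics or how statistical analysis is to be applied, which has been done more than adequately elsewhere, but rather how statistics can be accommodated in an extended DW 2.0 architecture, without the need to come up with reactionary and ill-fitting solutions to problems that can be solved in better and more effective ways through sensible, sound engineering principles and the judicious application of appropriate methods, technologies and techniques.

Original: Aligning Big Data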

Contradictions of Big Data

01 Sunday Mar 2015

Posted by Martyn Jones in Ask Martyn, Big Data, Consider this

Tags

Big Data, data management, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones

What we’ve been told

We’ve been told that Big Data is the greatest thing since sliced bread, and that its major characteristics are massive volumes (so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it), high variety (not only structured data, but also the whole range of digital data), and high velocity (the speed at which data is generated and transmitted). Also, from time to time, much to the chagrin of some Big Data disciples, a whole slew of new identifying Vs are produced, touted and then dismissed (check out my LinkedIn Pulse article on Big Data and the Vs).

So, beware. Things in Big Data may not be as they may seem.

It’s not about big

I have been waging an uphill battle against the nonsensical and unsubstantiated idea that more data is better data, but now this view is getting some additional support, and from some surprising corners.

In a recent blog piece on IBM’s Big Data and Analytics Hub (Big data: Think Smarter, not bigger), Bernard Marr wrote that “the truth is, it isn’t how big your data is, it’s what you do with it that matters!”

Elsewhere, SAS echoed similar sentiments on their web site: “The real issue is not that you are acquiring large amounts of data. It’s what you do with the data that counts.”

Can we call that ‘strike one’ for Big Data Vs?

It’s not about variety

It is claimed that 20% of digital data is structured, a claim based on the problematic suggestion that structured data is uniquely relational. It is also claimed that unstructured data includes CSV files and XML data, and that this makes up far more than the 20% of the data generated. But this definition is simply wrong.

If anything, CSV data is structured, and XML data is highly structured, and it’s typically regular ASCII data. So it does not add variety, even though it is not structured in the ways that some people might expect, especially if they lack the required knowledge and experience. Simply stated, CSV data is structured; it’s just that it lacks rich metadata, but that doesn’t make it unstructured.
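A small sketch makes the point concrete. Using only the Python standard library, the same made-up order records are read from a CSV string and from an XML string, and both yield identical field-by-field rows. The sample data and field names are invented purely for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample data: the same two records in CSV and in XML.
CSV_TEXT = "customer_id,country,order_total\n1001,ES,250.00\n1002,GB,99.50\n"
XML_TEXT = (
    "<orders>"
    "<order customer_id='1001' country='ES'><total>250.00</total></order>"
    "<order customer_id='1002' country='GB'><total>99.50</total></order>"
    "</orders>"
)


def csv_rows(text):
    """The header line supplies the field names; every row has the same shape."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def xml_rows(text):
    """Element and attribute names supply the structure explicitly."""
    root = ET.fromstring(text)
    return [
        {
            "customer_id": order.get("customer_id"),
            "country": order.get("country"),
            "order_total": order.findtext("total"),
        }
        for order in root.findall("order")
    ]
```

What the CSV version lacks is metadata such as data types (every value arrives as text), not structure.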

“But”, I hear you say “what about all the non-textual data such as multi-media, and what about the masses of unstructured textual data?”

Take it from me, most businesses will not be basing their business strategies on the analysis of a glut of selfies, home videos of cute kittens, or the complete works of William Shakespeare or Dan Brown. Almost all business analysis will continue to be carried out on structured data obtained primarily from internal operational systems and external structured data providers.

Strike two! Third time lucky?

It’s not even about velocity

So, if we accept that Big Data isn’t really about the data volumes or data variety, that leaves us with velocity, right? Well no, because if it isn’t about record-breaking VLDB or significant data variety, then for most commercial businesses the management of data velocity becomes either less of an issue or no issue at all. The fact that some software vendors and IT service suppliers set up this ‘straw man’ argument and then knock it down with the ‘amazing powers’ of their products and services is quite another matter.

Strike three, and counting.

It’s not about the manageability of Big Data

We have been told time and time again that the major difference between a data scientist and a professional statistician is that the ‘scientists’ know how to cope very well with massive volumes, varieties and velocities of data. Now it turns out that this is also questionable.

According to Bob Violino writing in Information Management (Messy Big Data Overwhelms Data Scientists – 20 February 2015) “Data scientists see messy, disorganized data as a major hurdle preventing them from doing what they find most interesting in their jobs”. So, when it comes to data quality and structure the ‘scientists’ don’t really have an advantage over professional statisticians.

Last year Thomas C. Redman, writing in the Harvard Business Review (Data’s Credibility Problem), noted that when Big Data is unreliable “managers quickly lose faith” and “fall back on their intuition to make decisions, steer their companies, and implement strategy”, and that when this happens there is a propensity to reject potentially “important, counterintuitive implications that emerge from big data analyses.”

Strike four?

The new analytics aren’t new

Data science and Big Data analytics are the new kids on the block, aren’t they?

Well, here are some real life scenarios.

A major banking equipment supplier: A lot of banking equipment is hybrid analogue-digital; a simple example would be a photocopier or a physical document processing device. One major supplier decided to capture the sensor data produced by their devices in order to predict failures and problems. Predictive preventive maintenance rules are created and corroborated using the data generated by sensors on each customer device, and these rules then get incorporated into the devices’ logic.

A major IT vendor: What happens when you create an intersection and convergence between technologies, techniques and methods from areas of mainstream IT, data architecture and management, statistics (quantitative and qualitative analytics) and data visualisation, artificial intelligence/machine learning and knowledge management? This is precisely what one of the main European IT vendors did, and the idea proved to be quite attractive to customers, prospects and investors.

A major integrated circuit supplier: The testing of ICs at the ‘fabs’ (manufacturing plants) generates a serious amount of data. This data is used to detect errors in the IC manufacturing process; it is captured and analysed in as near real time as possible, which is necessary because over-running the production of faulty ICs is so costly. To get around this problem the company uses a combination of fast data capture, transformation and loading of data into a data analytics area to ensure early and precise problem detection.

All Big Data Analytics success stories?

The first happened in 1989, the second in 1993 and the third in 2001. Yes, Big Data and Big Data analytics are sort of newish.

Strike five.

The science is frequently not very scientific

What is science?

According to Vasant Dhar of the Stern School of Business (Data Science and Prediction), Jeff Leek (The key word in “Data Science” is not Data, it is Science), and repeated on Wikipedia, “In general terms, data science is the extraction of knowledge from data”. Well, excuse me if I beg to differ. I have seen data scientists at work, and the word science doesn’t actually jump out and grab you. It’s difficult to make the connection, just as it is to accurately connect some popular science magazines with fundamental scientific research.

If a professional and qualified statistician wants to label themselves a data scientist then I have no issue with that, it’s their problem, but I am not willing to lend credibility to the term ‘data scientist’ when it is merely an interesting job title, with at most a tenuous connection to the actual role, and one that is liberally applied, with the almost customary largesse of IT, to creative code hackers and business-averse dabblers in data.

As Hazelcast VP Miko Matsumura suggested in Data Science is Dead “… put “Data Scientist” on your resume. It may get you additional calls from recruiters, and maybe even a spiffy new job, where you’ll be the King or Queen of a rotting whale-carcass of data” and ” Don’t be the data scientist tasked with the crime-scene cleanup of most companies’ “Big Data”—be the developer, programmer, or entrepreneur who can think, code, and create the future.”

Strike six.

And the value is questionable

DATA: “Data is a super-class of a modern representation of an arcane symbology.” – Anon

If I had a dollar for every time I heard someone claim that data has intrinsic positive value then I would be as wealthy as Warren Buffett.

If I have said it once, I have said it a hundred times. In order for data to be more than an operational necessity it requires context.

Providing valid data with valid context turns that data into information.

Data can be relevant and data can be irrelevant. That relevance or irrelevance of data may be permanent or temporary, continuous or episodic, qualitative or quantitative.

Some data is meaningless, and there are cases whereby nobody can remember why it was collected or what purpose it serves.

Taking all this into account we can ask the deadly pragmatic question: what value does this data have? Which is sometimes answered with a pertinent ‘no value whatsoever’.

Strike seven.

So what is it really about?

It is said that Big Data is changing the world, but for all intents and purposes, and shamed by previous Big Data excesses, some people are rapidly changing the definitions and parameters of Big Data in order to position it as more tangible and down-to-earth, whilst moving it away from its position as an overhyped and dead-ended liability.

Big Data is a dopey term, applied necessarily ambiguously to a surfeit of tenuously connected vagaries, and its time has come and gone. So, let’s drop the Big Data moniker, and embrace the fact that data is data, and long live ‘All Data’, yes, all digital data. Let’s consider all data for what it’s worth to the business, and not for what some chatterers reckon its value is, having as they do little or no insight into the businesses to which they refer, or into the data that these businesses possess.

So, when push comes to shove, is Big Data really about high volumes, high velocity and high variety, or is it in fact about much noise, too much pomposity and abundant similarity leading to unnecessary high anxiety?

Thanks very much for reading.

Being dishonest about honesty

01 Sunday Mar 2015

Posted by Martyn Jones in Consider this

Tags

dishonesty, ethical leadership, honesty, hypocrisy, personal development, psychology

BEING DISHONEST ABOUT HONESTY

I never touched a gun in my life. That and that alone forever doomed me to middle management.

Vincent “Vinnie” Antonelli

From: My Blue Heaven

Okay. For the record, I never lie; just ask my cousins, Rocky 1, 2 and 3.

Ooops… yeah, as I was saying…

Now hold this thought: Have you ever told someone “you are the most beautiful person in the world”?

Put it this way, every so often, there are blog pieces, especially on LinkedIn, exhorting people to be honest, always honest and for their own good, and frankly, to me, this is the sordid and despicable height of dishonesty.

Let me state this up front, in my view, honesty is always the best policy, especially if you have a bad memory. Honesty in the workplace makes a lot of sense. So try to be honest in your chosen or imposed profession or work activity, and enthusiastically so.

Now, should you also temper this view with the realities of working life in market-oriented or capitalist societies? Indeed, you may be someone who has not had such good fortune, but the question still stands. Can you be pragmatic and still maintain a moral compass?

You work in a bank and notice what you believe to be irregularities in the accounting process; you denounce those irregularities, because you are honest, right?

Your best friend runs a satellite dish company and you suspect that they may be doing work off their books to avoid paying taxes; you report them to the URS, right? Because you are honest.

Your 89-year-old pot-smoking neighbour, a onetime best friend of your parents, is growing marihuana in her bathroom; you report her to the cops, right? Because you are honest.

The caretaker at the school claims to have been mortally wounded in the Great Klingon Wars; you call them out on their lies. They may be sacked because of your honesty, but that’s okay. Right?

Here’s one close to my heart; really, truly, honestly… Oops, now how did that happen? This is not close to my heart, this is business. So, you are working for a company that is trying to get into the big time and the fast bucks with Big Data. Your peers claim they are the experts, when actually they are not, they claim that the software is new and without equal, even though you know it is based on really old technology in a fancy new package. Do you denounce them for their lies, deception and guile? You know, for the sake of honesty.

A salesperson gilds the lily in a presentation to a client. Even though the client isn’t fazed by this, because they know the game, and they aren’t entirely candid themselves, you still report them for lying, right? Because you are honest, and they are very, very naughty people.

The corporation you are working for, like many others, continually reinvents its history and its product and service line; it is altering the facts to suit the market. You denounce them as well, right? Because you are honest, and they are so wrong.

Well, no.

When it comes to the truth, there are many sanctimonious, two-faced puritanical hypocrites out there.

I expect both people and businesses, especially businesses, to gild the lily, to stretch the point, to exaggerate, to invent histories, anecdotes, success stories or to spin failures as successes, to sex up, back fill, hype up and to offer flim flam as fact.

Which is why we have contracts, pertinent contract clauses, incentives, penalties and lawyers.

If someone tells me that captains of industry and great leaders never lied to anyone, or misrepresented something, or exaggerated or diminished something, in some way, shape or form; that leaders have never tricked someone, failed to be entirely candid with all of their management team or fooled an entire organisation, and a long list of etceteras, then I’ll show you someone who is being, to put it politely, naïve. Of course, it could also be that they are simply lying. More to the point, if I had a dollar for every time a management consultant told a porkie, I would be surrounded by mountains of Ben Franklins.

The people I will never trust are those who claim to be above the human condition, a superior form of being, all sacred and without profanity. These people cannot be trusted at all because they are permanently mendacious, and delusional, freakily so, and they do not actually know it or are unwilling to recognise it.

So remember this, honesty may be the best policy, but reality and pragmatism dictate that it’s not the only best policy, and that we live in an unfair, volatile and competitive world, where the biggest liars are those who pretend they couldn’t possibly lie.

Moreover, before I leave you, just remember this:

[Vincent “Vinnie” Antonelli is questioned about the stolen goods in the trunk of the car he stole]

Hannah Stubbs: The books…

Vinnie: You have something against books?

Hannah Stubbs: I have nothing against books! I am curious about the books in your trunk.

Vinnie: You see, I was thinking of writing my story, so I bought this one on how to do it.

Hannah Stubbs: Why do you need 25 copies of it?

Vinnie: In case I want to read it more than once…

Thank you for reading.

Addendum – Here’s something else to consider:

This is from the back cover of Dan Ariely’s latest book The Honest Truth About Dishonesty: How We Lie to Everyone–Especially Ourselves

The New York Times bestselling author of Predictably Irrational and The Upside of Irrationality returns with a thought-provoking work that challenges our preconceptions about dishonesty and urges us to take an honest look at ourselves.

Does the chance of getting caught affect how likely we are to cheat?
How do companies pave the way for dishonesty?
Does collaboration make us more or less honest?
Does religion improve our honesty?

Most of us think of ourselves as honest, but, in fact, we all cheat. From Washington to Wall Street, the classroom to the workplace, unethical behavior is everywhere. None of us is immune, whether it’s a white lie to head off trouble or padding our expense reports. In The (Honest) Truth About Dishonesty, award-winning, bestselling author Dan Ariely shows why some things are easier to lie about than others; how getting caught matters less than we think in whether we cheat; and how business practices pave the way for unethical behavior, both intentionally and unintentionally. Ariely explores how unethical behavior works in the personal, professional, and political worlds, and how it affects all of us, even as we think of ourselves as having high moral standards.

But all is not lost. Ariely also identifies what keeps us honest, pointing the way for achieving higher ethics in our everyday lives.

With compelling personal and academic findings, The (Honest) Truth About Dishonesty will change the way we see ourselves, our actions, and others.

Big Data in Question – Again

01 Sunday Mar 2015

Posted by Martyn Jones in All Data, Big Data, Consider this

Tags

All Data, Big Data, data management, Good Strat, good strat blog, Good Strategy, Martyn Jones, Martyn Richard Jones

Big Data is now an inhospitable and unhealthy land inhabited by those who, through accident or design, deceive naïve and sentimental bystanders and those who are willingly misled.

When all of this Big Data malarkey started it was sort of funny, humorous and occasionally witty, especially in the affected, bizarre and frequently uninhibited ways that freshly-minted self-appointed gurus and experts would “big it up”.

Doctor Freud would have had a field day with all of that, being as it was, and for that matter still is, a postmodern mishmash of Riefenstahl, Freddie Mercury and Monty Python on steroids. However, after that extended, operatic and high-camp hiatus it all went downhill.

The Big Data scene is fast becoming an outrageous and brash festival of deception, disinformation and obliviousness. Which is a pity, because it does the industry no good whatsoever.

It is telling that Big Data evangelists, gurus and assorted sycophants cannot even define Big Data adequately, never mind discuss (or for that matter, point at) tangible success stories, without falling into contradictions on all of the key defining characteristics of volume, variety and velocity, and resorting to crude debating devices to avoid or finesse the concerns and the questions.

Almost every morning I check out the industry news, and almost invariably, it comes with new mind-boggling examples of Big Data nonsense.

However, it isn’t always nonsense for nonsense’s sake; there are agendas, and there are rational explanations why Big Data has become, at the same time, one of the most hyped up fads in the history of IT, and one that its supporters find so difficult to actually explain and justify in any reasonable sort of way.

Therefore, when it comes to Big Data, beyond the surfeit of platitudes, clichés, bluff and bluster, the only things in play are the interests of industry, the patrons, the courtesans and their entourage of the innocent and the beguiled.

One of the biggest deceptions in Big Data is in the misleadingly named ‘success stories’. The thing is that most of these success stories that I have ever read have been:

  • So vague that it’s difficult to know how success is being defined never mind reached.
  • So secretive and obtuse is the avoidance of naming names, locations and other relevant Big Data references that it’s impossible to corroborate if these claims are actually true or not. Disclaimer: I have worked for some of the biggest IT vendors, and in senior roles, and I know what is behind comments such as “the Big Data project is a success, although the client name and project are confidential” and “it’s delivering such major competitive advantages that we are obliged to keep it under wraps”.
  • Stories stolen from elsewhere, such as from Data Warehousing, Business Intelligence, VLDB or Business Application projects.
  • Borderline fantasies and badly contrived technology fan fiction.

However, it doesn’t stop there.

One of the clearest examples of the questionable nature of Big Data evangelism is when it is used to piggyback Big Data hype on simple, tangible and immediately recognisable artefacts or applications that have little in common with Big Data.

This is an extreme illustration, but it works like this: “iPhones are commercially successful, iPhones are part of Big Data, and therefore Big Data is commercially successful.”

As if the mere conjuring up of association, affinity and proximity will convince people of the great and growing value of Big Data.

What I am also referring to are publicity pieces that may as well have been titled:

  • Smith, Galbraith, Mies, Keynes, Homer Simpson and the economic justification of Big Data
  • Lovelace, Babbage, von Neumann, Eckert, Davies, Codd, Knuth, Naur and the technological underpinnings of Big Data
  • Einstein, Freud, Edison, Faraday, Recorde and the intellectual structure of Big Data
  • Socrates, Kant, Hegel, Marx, Adorno and the philosophical correctness of Big Data
  • Great quotes about Big Data, from the Cambrian era to the postmodern époque
  • Great jokes about Big Data, from Mel Brooks to Steve Martin
  • Sportspeople and Big Data, from Lottie Dod and Babe Ruth to Rafa Nadal and CR7
  • Industry support of Big Data, from Henry Ford to Neutron Jack

Do you recognise similarities?

It’s no big deal, just the use of unreliable, misleading and inappropriate fallacies, dressed up as cute, plausible and accessible collateral. People may think that such things are clever and witty, but they aren’t; they are just misleading.

Let’s continue with something simple.

Evasion is, in ethics, an act that deceives by making a true statement that is immaterial or leads to a false deduction. For example, citing events, persons or anecdotes from the history of IT to justify the supposed or imaginary value of Big Data. This is close to the notion of a non sequitur, which of course is an argument whose conclusion does not follow from its premises. It falls short of being full-on sophistry, purely because the simplistic, puerile and superficial arguments put forward in favour of Big Data do not match those of the true sophist, who seeks to reason with clever but fallacious and deceptive arguments. Too many of the Big Data arguments are fallacious and deceptive, but no one equipped with a reasonable capacity for critical thinking should take such ‘arguments’ as valid.

Hold this thought: Big Data hype is a viper’s nest of logical fallacies, white lies and disinformation.

Just when I think things could not get any weirder, they do, and the Big Data ceiling of hyperbole rises even higher, up to the rarer atmosphere of extreme tendentiousness.

There is a growing mass of Big Data hoop-la, hyperbole and flim flam that exceeds all previous bounds of overstatement, solecism and confabulation. This is where the real volumes, varieties and velocities are in Big Data; in hokie.

We live, as Oscar Wilde said in his day, in an age of surfaces. Yes, superficiality, puerility and short-termism are the competing orders of the day. However, I am still amazed – and maybe wrongly so – by what ostensibly professional, experienced and knowledgeable people are willing, able and prepared to accept, especially when it comes to Big Data flim flam sauce.

Here are some examples of the nonsense about Big Data that is taken as gospel by ‘adults’:

Data Warehousing is part of Big Data: No comment.

Big Data will replace Enterprise Data Warehousing: People can’t even explain the features and benefits of Big Data. I try to make it as easy as possible: ‘if you can’t say it, point to it’. But, seriously, people can’t even relate tangible and credible Big Data success stories, never mind show how it will replace Enterprise Data Warehousing, whether that’s the Inmon or Kimball flavour, take your pick.

Everyone and every organisation can benefit from Big Data: If people can’t explain this, and they don’t in terms of tangible benefits, then the claim should remain questionable.

Data Scientists will replace Statisticians: Why is that so? It is claimed that Data Scientists are uniquely equipped to handle massive volumes, varieties and velocities of data – well, as it turns out, this isn’t certain either.

Big Data is in its infancy: I think we may be confusing infancy with lack of real traction, and of time and place utility.

You cannot be serious: Just what are people talking about here? I have read vague, naïve and ill-informed pieces about data management, data architecture, data warehousing, reporting, business intelligence and a plethora of etcetera that have been passed off as observations and commentary on Big Data. So, what makes people recycle hackneyed, misleading and badly conceptualised ‘content’?

In the commentary on one of Bernard Marr’s pieces on LinkedIn (a professional networking site) I observed that no one can adequately explain what Big Data is without falling into contradictions and fancies, and no one seems to be capable or willing to provide tangible success stories.

Bernard responded to this comment by pointing out “the reason for that is that Big Data means different things to different people.”

Fair enough. It’s an explanation.

That said, I have always had more than a tenuous dislike of postmodern thinking, in fact most things ‘postmodern’. Call me old fashioned, jaded or cynical, but to me, the idea that everything can mean anything is an aberration that I prefer to leave to others.

I am at a loss to explain why so many reasonable people are willing to embrace the hype surrounding Big Data and Big Data Analytics, including the attendant surfeit of nonsense, incongruences and contradictions. From my perspective, it defies reason and good sense.

Therefore, I will just end again with a fabulous quote from Ben Goldacre:

“You cannot reason people out of a position that they did not reason themselves into”.

Many thanks for reading.

Contradictions of Big Data – Short

01 Sunday Mar 2015

Posted by Martyn Jones in Big Data, Consider this, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones

Tags

Big Data, data management, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones

Please note: This is an edited version of a previous piece with a similar name, but focusing solely on the three main Vs of Big Data.

What we’ve been told

We’ve been told that business Big Data is the greatest thing since sliced bread, and that its major characteristics are:

  • massive volumes – so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it, and
  • high variety – not only structured data, but also the whole range of digital data, and
  • high velocity – the speed at which data is generated, transmitted and received

Which is a simple and straightforward means of classification. Big Data is about massive volumes, high variety and high velocity. Right?

It’s not about big

I have never bought into the idea that more data is necessarily better data, or that it provides better focus or leads to increased insight. In fact, I have been quite vocal with my contrarian opinion, but now this view is getting some additional support, and from some surprising corners.

In a recent blog piece on IBM’s Big Data and Analytics Hub (Big data: Think Smarter, not bigger), Bernard Marr wrote that “the truth is, it isn’t how big your data is, it’s what you do with it that matters!”

Over at Fierce Big Data it was Pam Baker who stated that “the term big data is unfortunate because it’s really not about the size of the data”. (Big data is not about petabytes, but complex computing).

Elsewhere, SAS echoed similar sentiments on their web site: “The real issue is not that you are acquiring large amounts of data. It’s what you do with the data that counts.”

Well, apparently Big Data isn’t about “massive volumes” of data.

Strike 1!

It’s not about variety

It is claimed that 20% of digital data is structured, it is based on the problematic suggestion that structured data is uniquely relational.

It is also said that unstructured data includes CSV files and XML data, and this makes up far more than the 20% of the data generated. But this definition is wrong.

If anything, CSV data is structured, and XML data is highly structured, and it’s typically regular ASCII data. So it does not add variety, even though it is not structured in the ways that someone might expect, especially if that someone lacks the required knowledge and experience. Simply stated, CSV data is structured. It just lacks rich metadata, and that doesn’t make it unstructured.
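To make the point concrete, here is a minimal sketch in Python. The sample records and the read_rows helper are invented for illustration and are not from any real system. It suggests that a CSV file parses straight into records with named fields, and that what is missing is type metadata rather than structure.

```python
import csv
import io

# Hypothetical sample data, invented purely for illustration.
SAMPLE_CSV = """order_id,customer,amount
1001,Acme,250.00
1002,Globex,99.50
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE_CSV)

# Every record carries the same named fields: that is structure.
field_names = list(rows[0].keys())

# What CSV lacks is type metadata: every value arrives as a string,
# so the consumer has to supply the types.
total = sum(float(row["amount"]) for row in rows)

print(field_names)
print(total)
```

The header row gives each column a name and every record follows it, which is about as structured as tabular data gets. The only thing the reader has to bring is the knowledge that amount is a number.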

“But”, I hear you say, “what about all the non-textual data such as multi-media, and what about the masses of unstructured textual data?”

Take it from me, most businesses will not be basing their business strategies on the analysis of a glut of selfies, juvenile twittering, home videos of cute kittens, or the complete works of William Shakespeare. Almost all business analysis (whether done by a professional statistician or a data scientist) will continue to be carried out using structured data obtained primarily from internal operational systems and external structured data providers.

Variety, Sir? No problem.

Strike two!

It’s not even about velocity

So, if we accept that Big Data isn’t really about massive data volumes or high data variety, then that leaves us with velocity. But if it isn’t about record-breaking VLDB or significant data variety, then for most commercial businesses the management of data velocity becomes either less of an issue or no issue at all.

Even in some extreme circumstances, one can explore the suggestion that data sampling can remove issues with data volume as well as velocity.
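As a rough illustration of that suggestion, the sketch below uses reservoir sampling (the classic “Algorithm R”). This is one common technique and my choice for the example, not something prescribed here. It keeps a fixed-size uniform random sample from a stream of unknown length, reading each item once and holding only k items in memory. The function name, the parameters and the stand-in event stream are all assumptions made for illustration.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of up to k items from an iterable.

    Reads the stream once and never holds more than k items, so memory
    use does not grow with the volume or arrival rate of the data.
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a randomly chosen existing entry.
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = item
    return sample

# Hypothetical usage: a generator standing in for a fast event feed.
events = (n for n in range(10_000))
picked = reservoir_sample(events, k=100, seed=42)
print(len(picked))
```

Whether a sample is good enough depends on the question being asked, of course. For many aggregate business questions it can be, and that is the suggestion worth exploring.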

However, the fact that some software vendors and IT service suppliers set up this ‘straw man’ velocity argument and then knock it down with the ‘amazing powers’ of their products and services is quite another matter.

So, is it really about velocity?

Strike three!

So what is it really about?

Big Data is a dopey term, applied necessarily ambiguously to a surfeit of tenuously connected vagaries, and its time has come and gone. Let’s dump the Big Data moniker, and the 3 Vs along with it, and embrace the fact that data is data, and there will always be more of it.

So, let’s consider ‘all data’, and principally for its time-and-place utility.

If there is something that you are not sure about or have questions about, then please leave a comment below or email me.

Thanks very much for reading.

