DATA WORLD: Friends! Romans! Countrymen! Big Data is not Data Warehousing – 2026/01/29

11 Wednesday Feb 2026

Posted by Martyn Jones in Big Data, Consider this, Data Warehousing, Good Strat, hadoop, hdfs, Martyn Jones

Tags

Analytics, Big Data, books, Data Mart, Data Warehouse, Data Warehousing, enterprise data warehousing, Good Strat, Good Strategy, google, Martyn Jones, Martyn Richard Jones, News, sports, technology

Martyn Rhisiart Jones

29th January 2026

Hold this thought: To paraphrase the great Bob Hoffman, just when you think that if the Big Data babblers were to generate one more ounce of bull**** the entire f****** solar system would explode, what do they do? Exceed expectations.

I am a mild mannered person. However, one thing that irks me is hearing variations on certain themes. These themes include phrases like “Data Warehousing is Big Data.” Another is “Big data is in many ways an evolution of data warehousing.” Lastly, some say “with Big Data you no longer need a Data Warehouse.”

Big Data is not Data Warehousing. It is not the evolution of Data Warehousing. It is also not a sensible and coherent alternative to Data Warehousing. No matter what certain vendors will put in their marketing brochures or stick up their noses.

Continue reading →

Myth-busting: Data Mesh and Data Warehousing – Revisited

25 Thursday Nov 2021

Posted by Martyn Jones in agile, anthropology, Architecture, Best principles, Data Mart, data mesh, Data Supply Framework, Data Warehouse, Data Warehousing, hadoop, Inform, educate and entertain.

≈ 1 Comment

To begin at the beginning

I am quite a fan of many aspects that sit under the Data Mesh umbrella. However, when it comes to a proper fact-based understanding and analysis of the history, place and architecture (business, data and technical) of Data Warehousing, the leading exponents of data mesh have it woefully wrong.

Therefore, the purpose of this blog article is to set the record straight.

The data warehouse as a place to copy OLTP exhaust data to?

Continue reading →

Saving Private Hadoop

07 Friday Jun 2019

Posted by Martyn Jones in Big Data, Big Data 7s, Big Data Analytics, hadoop, The Amazing Big Data Challenge, The Big Data Contrarians


Brussels 7th June 2019

Martyn Richard Jones

By all accounts Hadoop is heading down the pan and bits of big data are not that far behind.

But should we rejoice?

Continue reading →

What if the Hadoop Ecosphere Were a Box of Chocolates

12 Tuesday Dec 2017

Posted by Martyn Jones in Big Data, Big Data 7s, Big Data Analytics, hadoop, Inform, educate and entertain., The Amazing Big Data Challenge, The Big Data Contrarians

≈ 1 Comment

Martyn Richard Jones

Gif-sur-Yvette, 12th December 2017

Just because it looks like a bunch of half-baked crap, based on some well shoddy ideas and made by a surfeit of half-arsed chancers, doesn’t mean that it is, does it…

Even though it is.

Sorry! But, I digress.

If the Hadoop ecosphere were a box of chocolates, it would come in fantastic packaging that would make the grandest of Belgian chocolatiers emerald green with envy.

Continue reading →

How Hadoop Revolutionised IT

05 Saturday Mar 2016

Posted by Martyn Jones in All Data, Ask Martyn, Big Data, Big Data 7s, Big Data Analytics, dark data, data architecture, Data governance, Data Lake, data management, data science, Data Supply Framework, Data Warehouse, Data Warehousing, hadoop, Inform, educate and entertain., Marty does, Martyn does, Martyn Jones, Martyn Richard Jones, pig data, The Amazing Big Data Challenge, The Big Data Contrarians

≈ 1 Comment

This is the story of how the amazing Hadoop ecosphere revolutionised IT. If you enjoy it, then consider joining The Big Data Contrarians.

Before the advent of Hadoop and its ecosphere, IT was a desperate wasteland of failed opportunities, archaic technology and broken promises.

In the dark Cambrian days of bits, mercury delay lines and ferrite cores, we knew nothing about digital. The age of big iron did little to change matters, and vendors made enormous profits selling systems that nobody could use and even fewer people could understand.

Continue reading →

The Princess Diana Memorial Data Lake

11 Wednesday Nov 2015

Posted by Martyn Jones in Big Data, Data Lake, hadoop, Inform, educate and entertain., Martyn Jones


Tags

Big Data, data lake, Martyn Jones

If you enjoy this piece or find it useful then please consider joining The Big Data Contrarians: https://www.linkedin.com/grp/home?gid=8338976

Many thanks, Martyn.

If Princess Diana had been alive during the formative years of the Big Data revolution there would have been a plethora of influential Big Data bullshit babblers issuing their gushingly awful pieces in places like Forbes, the WSJ and professional blogging forums about the Big Data humanitarian causes closest to the heart of the people’s princess. And if tragedy had repeated itself, and had been reported not as paparazzi-driven schmaltz or morbid vulgarity, but as something even more rancid and farcical, we would now have a Princess Diana Memorial Data Lake in some regal park in London or Milton Keynes, powered by Hadoop. Because, as the bullshit babblers would have it, “that is what she would have wanted”.

But, is this entirely fair? Should we view the outpourings of the biggest Big Data bullshit babblers on the entire internet as the inevitable result of free will, or is Big Data a message from God, in the same way that hard drugs are a signal to certain rock stars that they have too much available cash?

Which brings me to another issue. In a recent interview, I was given a list of data-related terms, and was asked which one I preferred. Big Data, Smart Data, Small Data… you know what I mean. Anyway, I went off on a tangent about domestic pets and anthropomorphism. Okay, so it was logical entrapment, but I wanted to make a point. “Don’t you think that ascribing human behavior and thought to pets is a bit weird?” I asked. “No,” came back the reply. It wasn’t the answer I wanted, because the answer I wanted was “Yes, it certainly is”, not “No, that’s what my mum thinks as well”. I wanted to say: see, people who ascribe human characteristics to dogs strike us as being a bit fanciful, but people who do the same for data? How can a bunch of recognizable symbols embody smartness? I mean, data by itself, of itself, is dumber than a rock.

So why do we pretend that the information, knowledge and the smarts are in the data and that data itself, without the need for any intervention (other than Hadoop, Spark or Hive, etc.), is capable of revealing this smartness?

And the only thing I can think is that we are so desperate to sell useless crap that no one needs or wants, that we are even capable of saying the dopiest of things in order to do so.

Anyway, I was at a Big Data conference recently, and every presenter selling a tool made exactly the same type of pitch: the amazing ways that their tools could establish correlations. Some of the examples of the correlations were so contrived, so obviously the creation of PR rather than the outcome of hands-off automated analysis, that it became seriously embarrassing, not as a professional, but as a human being. What’s more, no one mentioned the absent elephantine concept of causation, so everyone who went in clueless stayed happy in their ignorance throughout the whole wham-bam-thank-you-ma’am dog and pony session.

Now, I do think that the sort of data processing associated with Big Data does have a place in the old IT toolkit, but the levels of hype, misappropriation and downright lies are seriously queering the pitch. Just look at some of the Big Data articles in places like Forbes, Information Management and LinkedIn. If you haven’t yet noticed the tendency to use tremendous volumes, varieties and velocities of bullshit to push the Big Data envelope, then you really haven’t been paying enough attention.

Many thanks for reading.


Amazing Data Warehousing with Hadoop and Big Data

26 Sunday Jul 2015

Posted by Martyn Jones in Big Data, Consider this, Data Warehousing, good start, goodstart, hadoop


Tags

Big Data, cloudera, enterprise data warehousing, goodstart, hadoop

Many thanks for reading, and don’t forget, please join The Big Data Contrarians.

Some time back, Bill Inmon, the father of Data Warehousing, took the Hadoop vendor Cloudera to task for putting out some confusing advertising.

In recent times, Cloudera have linked up with Ralph Kimball, who, as some in the data world will know, has been an eternal ‘rival’ of Bill Inmon.

For some, the name of Ralph Kimball has become synonymous with dimensional modelling, and although the Kimball Group once stated that Ralph did not invent the original basic concepts of facts and dimensions, Ralph has contributed much to the development of dimensional modelling and the innovative use of SQL. Subsequently, the Kimball Group reassessed, and are now labelling Ralph as the “Dimensional modelling inventor”.

Kimball and Cloudera have collaborated on a number of initiatives, such as a webinar and slide set, with particular emphasis on the theme of Hadoop and Data Warehousing.

Now, I do not know whether this is intentional or accidental, but this collaboration has produced a lot of disingenuous claims and dubious comparisons, so much so, that I get the impression that building the DW Disinformation Factory is becoming a cottage industry in its own right.

Personally, I can see scenarios in which Big Data complements Enterprise Data Warehousing, and I have explained my vision and possible architectures for these scenarios. However, what some Hadoop vendors are alluding to in the Data Warehousing space is actually quite mischievous and misleading, and is not constructive in the least. In fact, its biggest side effect is to muddy the Big Data and Data Warehousing waters even further. That is not good for the industry, for the customers or, indeed, for the professionals.

In one piece of content from Cloudera, we can read that…

“Dr. Kimball explains how Hadoop can be both:

A destination data warehouse, and also

An efficient staging and ETL source for an existing data warehouse”

On the first point? No, Hadoop will not be replacing Teradata, Oracle, EXASol or any other high-performance relational database management system.

On the second point: Hadoop could support a data source for Data Warehousing, as can many other technologies. However, there is no such animal as an ETL source. There are data sources and data targets, extractions, transformations and loads, and all that cool data management, but ETL is a technology, not a source.

I think Big Data may have a big future; it depends on how deeply the internet development culture pervades enterprise application development. A lot of what Big Data addresses is about making up for shortfalls created by badly architected web applications and shoddy application development, in which data use and data persistence were at best workaround bodges, rather than being well designed and coherent approaches to data management.

Maybe this is why some people have a hard time explaining why they are considering using Hadoop technologies for Big Data. What would a CEO say if it was brought to their attention that Hadoop was being used in their business simply to make up for the fact that their internet applications are really shoddy examples of analysis, design, architecture and management? More to the point, what would the shareholders say if they understood the full ramifications behind the need to use Hadoop?

In many cases, I think that Hadoop can be an indication that your IT organisation did something very wrong in the past, and that in these cases Hadoop is the price one pays when one does not want to bite the bullet and admit to screwing up, big time.

In my opinion, it would make more sense to replace applications built on faulty architectures with robust and well-architected applications, rather than fix a problem by overmedicating the patient. This would mean that data generated and used by these applications could simply dovetail into standard decision-support data platforms, such as the Enterprise Data Warehouse.

As for Cloudera and their bizarre and babbling baloney about Hadoop replacing the Data Warehouse? I suggest they read a book on the subject of Building the Data Warehouse, and maybe buck up their ideas a bit. As Bill Inmon stated, “You would think that the executives of Cloudera would have familiarized themselves with what a data warehouse is.”

As for recognised data professionals and influencers who support such Hadoop tripe? The less said the better. Eh, Ralphie?

That stated, maybe Cloudera, Kimball and the Big Data flim-flam merchants simply don’t care.

So go ahead, “turbocharge your Porsche – buy an elephant.”

Many thanks for reading. Don’t forget, please join The Big Data Contrarians. The best Big Data community on the planet.

Consider this: Big Data is not Data Warehousing

06 Friday Mar 2015

Posted by Martyn Jones in Big Data, Consider this, Data Warehousing, Good Strat, hadoop, hdfs, Martyn Jones

≈ 4 Comments

Tags

Big Data, enterprise data warehousing, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones

I am a mild mannered person, but if there is one thing that irks me, it is when I hear variations on the theme of “Data Warehousing is Big Data”, “Big data is in many ways an evolution of data warehousing” and “with Big Data you no longer need a Data Warehouse”.

Big Data is not Data Warehousing, it is not the evolution of Data Warehousing and it is not a sensible and coherent alternative to Data Warehousing. No matter what certain vendors will put in their marketing brochures or stick up their noses.

In spite of all of the high-visibility screw-ups that have carried the name of Data Warehousing, even when they were not Data Warehouse projects at all, the definition, strategy, benefits and success stories of data warehousing are known, they are in the public domain and they are tangible.

Data Warehousing is a practical, rational and coherent way of providing information needed for strategic and tactical option-formulation and decision-making.

Data Warehousing is a strategy-driven, business-oriented and technology-based business process.

We stock Data Warehouses with data that, in one way or another, comes from internal and optional external sources, and from structured and optional unstructured data. The process of getting data from a data source to the target Data Warehouse involves extraction, scrubbing, transformation and loading (ETL for short).
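For readers who prefer to see the steps written down, the extraction, scrubbing, transformation and loading sequence can be sketched in a few lines of Python. This is only an illustrative sketch and not a description of any real warehouse: the sample rows, the `dim_customer` table, the function names and the use of an in-memory SQLite database as the "target Data Warehouse" are all assumptions made for the example.

```python
import sqlite3

# Hypothetical source rows, standing in for an extract from an operational system.
SOURCE_ROWS = [
    {"customer_id": "  101 ", "name": "alice jones ", "country": "uk"},
    {"customer_id": "102", "name": "Bob Smith", "country": "FR"},
    {"customer_id": "", "name": "No Key", "country": "DE"},  # fails scrubbing
]


def extract(rows):
    """Extraction: read rows from the source (here, an in-memory list)."""
    return [dict(row) for row in rows]


def scrub(rows):
    """Scrubbing: trim whitespace and drop rows without a usable key."""
    cleaned = []
    for row in rows:
        trimmed = {key: value.strip() for key, value in row.items()}
        if trimmed["customer_id"]:
            cleaned.append(trimmed)
    return cleaned


def transform(rows):
    """Transformation: convert types and standardise representations."""
    return [
        (int(row["customer_id"]), row["name"].title(), row["country"].upper())
        for row in rows
    ]


def load(conn, rows):
    """Loading: write the prepared rows into the (assumed) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, source_rows):
    """Run the four steps in order, from source to target."""
    load(conn, transform(scrub(extract(source_rows))))


warehouse = sqlite3.connect(":memory:")
run_etl(warehouse, SOURCE_ROWS)
loaded = warehouse.execute(
    "SELECT customer_id, name, country FROM dim_customer ORDER BY customer_id"
).fetchall()
print(loaded)
```

Note that the source and the target are separate things in the sketch, and ETL is what happens between them.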

Data Warehousing’s defining characteristics are:

Subject Oriented: Operational databases, such as order processing and payroll databases and ERP databases, are organized around business processes or functional areas. These databases grew out of the applications they served. Thus, the data was relative to the order processing application or the payroll application. Data on a particular subject, such as products or employees, was maintained separately (and usually inconsistently) in a number of different databases. In contrast, a data warehouse is organized around subjects. This subject orientation presents the data in a much easier-to-understand format for end users and non-IT business analysts.

Integrated: Integration of data within a warehouse is accomplished by making the data consistent in format, naming and other aspects. Operational databases, for historic reasons, often have major inconsistencies in data representation. For example, a set of operational databases may represent “male” and “female” by using codes such as “m” and “f”, by “1” and “2”, or by “b” and “g”. Often, the inconsistencies are more complex and subtle. In a Data Warehouse, on the other hand, data is always maintained in a consistent fashion.

Time Variant: Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values. Furthermore, they generally maintain this information for no more than a year (and often much less). In contrast, data warehouses contain data that is generally loaded from the operational databases daily, weekly, or monthly, which is then typically maintained for a period of 3 to 10 years. This is a major difference between the two types of environments.

Historical information is of high importance to decision makers, who often want to understand trends and relationships between data. For example, the product manager for a Liquefied Natural Gas soda drink may want to see the relationship between coupon promotions and sales. This is information that is almost impossible – and certainly in most cases not cost effective – to determine with an operational database.

Non-Volatile: Non-volatility means that after the data warehouse is loaded there are no changes, inserts, or deletes performed against the informational database. The Data Warehouse is, of course, first loaded with cleaned, integrated and transformed data that originated in the operational databases.
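To make these characteristics a little more concrete, here is a small Python sketch. It is a toy illustration under stated assumptions, not a warehouse design: the `GENDER_CODES` mapping, the `SnapshotStore` class, the employee records and the dates are all invented for the example. It maps the inconsistent source codes mentioned above to one representation (integrated), keeps every dated load (time variant), offers no update or delete operation between loads (non-volatile), and answers questions by subject rather than by source application (subject oriented).

```python
from datetime import date

# Assumed mapping from each source system's codes to one warehouse code.
GENDER_CODES = {
    "m": "M", "f": "F",  # hypothetical system A
    "1": "M", "2": "F",  # hypothetical system B
    "b": "M", "g": "F",  # hypothetical system C
}


def integrate_gender(source_code):
    """Integrated: map any source representation to one consistent code."""
    return GENDER_CODES[str(source_code).strip().lower()]


class SnapshotStore:
    """Time variant and non-volatile: rows are dated on load, never altered."""

    def __init__(self):
        self._rows = []

    def load(self, load_date, records):
        for record in records:
            # Each load adds a dated snapshot; earlier snapshots are untouched.
            self._rows.append((load_date, dict(record)))

    def history(self, employee_id):
        """Subject oriented: all snapshots held about one employee, oldest first."""
        return [
            (load_date, record)
            for load_date, record in sorted(self._rows, key=lambda row: row[0])
            if record["employee_id"] == employee_id
        ]


store = SnapshotStore()
store.load(
    date(2015, 1, 31),
    [{"employee_id": 7, "gender": integrate_gender("f"), "grade": "A"}],
)
store.load(
    date(2015, 2, 28),
    [{"employee_id": 7, "gender": integrate_gender(2), "grade": "B"}],
)
grades = [record["grade"] for _, record in store.history(7)]
genders = {record["gender"] for _, record in store.history(7)}
print(grades, genders)
```

The operational system would hold only the latest grade; the store keeps both snapshots, which is what makes trend questions answerable.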

We build Data Warehouses iteratively, a piece or two at a time, and each iteration is primarily a result of business requirements, and not technological considerations.

Each iteration of a Data Warehouse is well bound and understood – small enough to be deliverable in a short iteration, and large enough to be significant.

Conversely, Big Data is characterised as being about:

Massive volumes: so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it, and

High variety: not only structured data, but also the whole range of digital data, and

High velocity: the speed at which data is generated, transmitted and received.

These are known as the three Vs of Big Data, and they are subject to significant and debilitating contradictions, even amongst the gurus of Big Data (as I have commented elsewhere: Contradictions of Big Data).

From time to time, Big Data pundits slam Data Warehousing for not being able to cope with the Big Data type hacking that they are apparently used to carrying out, but this is a mistake of those who fail to recognise a false Data Warehouse when they see one.

So let’s call these false flag Data Warehouse projects something else, such as Data Doghouses.

“Data Doghouse, meet Pig Data.”

Failed or failing Data Doghouses fail for the same reasons that Big Data projects will frequently fail. Both will almost invariably fail to deliver artefacts on time and to expectations; there will be failures to deliver value or even simply to return a break even in costs versus benefits; and of course, there will be failures to deliver any recognisable insight.

Failure happens in Data Doghousing (and quite possibly in Big Data as well) because there is a lack of coherent and cohesive arguments for embarking on such endeavours in the first place; a lack of real business drivers; and, a lack of sense and sensibility.

There is also a willing tendency to ignore the advice of people who warn against joining in the Big Data hubris. Why do so many ignore the ulterior motives of interested parties who are solely engaged in riding on the faddish Big Data bandwagon to maximise the revenue they can milk off punters? Why do we entertain pundits and charlatans who ‘big up’ Big Data whilst simultaneously cultivating an ignorance of data architecture, data management and business realities?

Some people say that the main difference between Big Data and Data Warehousing is that Big Data is technology, and Data Warehousing is architecture.

Now, whilst I totally respect the views of the father of Data Warehousing himself, I also think that he was being far too kind to the Big Data technology camp. However, of course, that is Bill’s choice.

Let me put it this way: if Oracle gave me the code for Oracle 3, I could add 256-bit support and parallel processing, give it an interface makeover, and it would be 1000 times better than any Big Data technology currently in the market (and that version of Oracle is from about 1983).

Therefore, Data Warehousing has no serious competing paragon. Data Warehousing is a real architecture, it has real process methodologies, it is tried and proven, it has success stories that are no secrets, and these stories include details of data, applications and the names of the companies and people involved, and we can point at tangible benefits realised. It’s clear, it’s simple and it’s transparent.

Just like Big Data, right?

Well, no.

See what I mean?

Therefore, the next time someone says to you that Big Data will replace Data Warehousing or that Data Warehousing is Big Data, or any variations on that sort of ‘stupidity’ theme, you can now tell them to take a hike, in the confidence that you are on the side of reason.

Many thanks for reading.

More perspectives on Big Data

Aligning Big Data: http://www.linkedin.com/pulse/aligning-big-data-martyn-jones

Big Data and the Analytics Data Store: http://www.linkedin.com/pulse/big-data-analytics-store-martyn-jones

A Modern Manager’s Guide to Big Data: http://www.linkedin.com/pulse/managers-guide-big-data-context-martyn-jones

Core Statistics coexisting with Data Warehousing

Accommodating Big Data

And a big thank you to Bill Inmon (the father of Data Warehousing and of DW 2.0)


GOOD STRATEGY REBELLION

Category Archives: hadoop
