GOOD STRATEGY

~ for every significant challenge

Tag Archives: enterprise data warehousing

Big Data: 10 truths, 10 myths and 10 unwise things

17 Tuesday Nov 2015

Posted by Martyn Jones in 4th generation Data Warehousing, Ask Martyn, Big Data, Data Warehousing, Information Management, Uncategorized

Tags

Big Data, enterprise data warehousing, Martyn Jones, Strategy

Many people come up to me in the street and beg me to write about the truths, myths and unwise things said about Big Data. I am offered gifts of goats, partners and riches beyond the dreams of avarice just to pronounce on such things. I am not in the habit of bowing to such street-pressure, but I have finally come round to doing something, if only to placate the river of rose-petal bearing infants’ tears flowing past my abode.

Big Data is for everyone

Continue reading →

Data Warehousing will save Big Data

11 Wednesday Nov 2015

Posted by Martyn Jones in 4th generation Data Warehousing, All Data, Big Data, Data Lake

Tags

Big Data, enterprise data warehousing

Considering the canvas that is the Pacific Ocean. “How on earth” he thought, “can people die of thirst and polluted water, when we have so much fresh, clean and pristine water on this goddam planet?”

The Data Leviathan, Martyn Jones Continue reading →

Amazing Data Warehousing with Hadoop and Big Data

26 Sunday Jul 2015

Posted by Martyn Jones in Big Data, Consider this, Data Warehousing, good start, goodstart, hadoop

Tags

Big Data, cloudera, enterprise data warehousing, goodstart, hadoop

Many thanks for reading, and don’t forget, please join The Big Data Contrarians.

Some time back, Bill Inmon, the father of Data Warehousing, took the Hadoop vendor Cloudera to task for putting out some confusing advertising.

In recent times, Cloudera have linked up with Ralph Kimball, who, as some in the data world will know, has been an eternal ‘rival’ of Bill Inmon.

For some, the name of Ralph Kimball has become synonymous with dimensional modelling, and although the Kimball Group once stated that Ralph did not invent the original basic concepts of facts and dimensions, Ralph has contributed much to the development of dimensional modelling and the innovative use of SQL. Subsequently, the Kimball Group reassessed, and are now labelling Ralph as the “Dimensional modelling inventor”.

Kimball and Cloudera have collaborated on a number of initiatives, such as a webinar and slide set, with particular emphasis on the theme of Hadoop and Data Warehousing.

Now, I do not know whether this is intentional or accidental, but this collaboration has produced a lot of disingenuous claims and dubious comparisons, so much so, that I get the impression that building the DW Disinformation Factory is becoming a cottage industry in its own right.

Personally, I can see scenarios in which Big Data complements Enterprise Data Warehousing, and I have explained my vision and possible architectures for these scenarios. However, what some Hadoop vendors are alluding to in the Data Warehousing space is actually quite mischievous and misleading, and is not constructive in the least. In fact, the biggest side-effect is to muddy the Big Data and Data Warehousing waters even further. That is not good, either for the industry or for the customers, or indeed, for the professionals.

In one piece of content from Cloudera, we can read that…

“Dr. Kimball explains how Hadoop can be both:

A destination data warehouse, and also

An efficient staging and ETL source for an existing data warehouse”

On the first point? No, Hadoop will not be replacing Teradata, Oracle, EXASol or any other high-performance relational database management system.

On the second point. Hadoop could support a data source for Data Warehousing, as can many other technologies. However, there is no such animal as an ETL source. There are data sources and data targets, extractions, transformations and loads, and all that cool data management, but ETL is a technology, not a source.

I think Big Data may have a big future; it depends on how deeply the internet development culture pervades enterprise application development. A lot of what Big Data addresses is about making up for shortfalls created by badly architected web applications and shoddy application development, in which data use and data persistence were at best workaround bodges, rather than being well designed and coherent approaches to data management.

Maybe this is why some people have a hard time explaining why they are considering using Hadoop technologies for Big Data. What would a CEO say if it was brought to their attention that Hadoop was being used in their business simply to make up for the fact that their internet applications are really shoddy examples of analysis, design, architecture and management? More to the point, what would the shareholders say if they understood the full ramifications behind the need to use Hadoop?

In many cases, I think that Hadoop can be an indication that your IT organisation did something very wrong in the past, and that in these cases Hadoop is the price one pays when one does not want to bite the bullet and admit to screwing up, big time.

In my opinion, it would make more sense to replace applications built on faulty architectures with robust and well-architected applications, rather than fix a problem by overmedicating the patient. This would mean that data generated and used by these applications could simply dovetail into standard decision-support data platforms, such as the Enterprise Data Warehouse.

As for Cloudera and their bizarre and babbling baloney about Hadoop replacing the Data Warehouse? I suggest they read a book on the subject of Building the Data Warehouse, and maybe buck up their ideas a bit. As Bill Inmon stated, “You would think that the executives of Cloudera would have familiarized themselves with what a data warehouse is.”

As for recognised data professionals and influencers who support such Hadoop tripe? The less said the better. Eh, Ralphie?

That stated, maybe Cloudera, Kimball and the Big Data flim-flam merchants simply don’t care.

So go ahead, “turbocharge your Porsche – buy an elephant.”

Many thanks for reading. Don’t forget, please join The Big Data Contrarians. The best Big Data community on the planet.

Data Warehousing Explained to Big Data Friends

20 Monday Jul 2015

Posted by Martyn Jones in Big Data, Big Data Analytics, Consider this, Data Warehousing, good start, Good Strat, goodstart, goodstrat

Tags

Big Data, enterprise data warehousing, good start, Good Strat, goodstart, goodstrat

Okay, before we get started I have to declare the real intent for posting this piece. It is to get you to join The Big Data Contrarians professional group here on LinkedIn.

To apply to join the best Big Data community on the web simply navigate to this address http://www.linkedin.com/grp/home?gid=8338976 (or paste it into your browser) and request membership, the process is quick and painless and well worth the effort.

Now for the rest of the news…

There are many common misconceptions amongst the Big Data collective about Data Warehousing. There are common fallacies that need clearing up in order to avoid unnecessary confusion, avoidable risks and the damaging perpetuation of disinformation.

Big Picture

In the dim and distant past of business IT, the best information that senior executives could expect from their computer systems were operational reports typically indicating what went right or wrong or somewhere in between.  Applied statistical brilliance made up for what data processing lacked in processing power, up to a point, because even heavy lifting statistics requires computing horsepower, which in those days was really a question of serious capital expenditure, which not all companies were willing to commit to.

Then, and curiously coincidentally, people around the world started to posit the need for using data and information to address significant business challenges, to act as input into the processes of strategy formulation, choice and execution. Reports would no longer just be for the Financial Directors or the paper collectors, but would support serious business decision making.

Many initiatives sprang up to meet the top-level decision-making data requirements; they were invariably expensive attempts, with variable outcomes. Some approaches were quite successful, but far too many failed, until the advent of Data Warehousing.

Back then, most of the data that could potentially aid decision-making was in operational systems. Both an advantage and a problem. Data in operational systems was like having data in gaol. Getting data into operational systems was relatively easy, getting it out and moving it around was a nightmare. However, one of the advantages of operational data is that it was generally stored in a structured format, even if data quality was frequently of a dubious nature, and ideas such as subject orientation and integration were far from being widespread.

Of course, data also came in from external sources, but usually via operational databases as well. An example of such data is instrument pricing in financial services.

Therefore, briefly, a lot of Data Warehousing started as a means to provide data to support strategic decision-making. Data Warehousing was not about counting cakes, widgets or people, which was the purview of operational reporting, or about measuring sentiment, likes or mouse behaviour, but about assisting senior executives in addressing the significant business challenges of the day.

Who’s your Daddy?

Bill Inmon, the father of Data Warehousing, defines it as being “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.”

Subject Oriented: The data in the Data Warehouse is organised conceptually (the big canvas), logically (detailing the big picture) and physically (detailing how it is implemented) by subjects of interest to the business, such as customer and product.

The thing to remember about subject areas is that they are not created ad-hoc by IT according to the sentiments of the time, e.g. during requirements gathering, but through a deeper understanding of the business, its processes and its pertinent business subject areas.

Integrated: All data entering the data warehouse is subject to normalisation and integration rules and constraints to ensure that the data stored is consistently and contextually unambiguous.

Time Variant:  Time variance gives us the ability to view and contrast data from multiple viewpoints over time. It is an essential element in the organisation of data within the data warehouse and dependent data marts.

Non-Volatile:  The data warehouse represents structured and consistent snapshots of business data over time. Once a data snapshot is established, it is rarely if ever modified.

Management Decision Making: This is the principal focus of Data Warehousing, although Data Warehouses have secondary uses, such as complementing operational reporting and analysis.

In plain language, if what your business has or is planning to have does not fully satisfy the Inmon criteria then it probably is not a Data Warehouse, but another form of data-store.

The thing to remember about informed management decision making is that it needs to be as good as required but it does not need to achieve technical perfection. This observation underlies the fact that Data Warehousing is a business process, and not an obsessive search for zero defects or the application of so-called ‘leading edge’ technologies – faddish, appropriate or not.

JOIN THE BIG DATA CONTRARIANS: http://www.linkedin.com/grp/home?gid=8338976

Some Basic Terms

Before we delve into the meaning of Data Warehousing, there are a couple of terms that need to be understood first, so, by way of illustration:

Let’s follow the numbers in the simplification of the process.

  1. We gather specific and well-bound data requirements from a specific business area. We gather these requirements by talking to business people and by understanding their needs from a business as well as a data sourcing and data logistics perspective. Here we must remember at all times not to over-promise or to set expectations too high. Be modest.
  2. These business requirements are typically captured in a dimensional data model and supporting documentation. Remember that all requirements are subject to revision at a later date, usually in a subsequent iteration of a requirements gathering to implementation cycle.
  3. We identify the best source(s) for the required data and we record basic technical, management and quality details. We ensure that we can provide data to the quality required. Note that data quality does not mean perfection but data to the required quality tolerance levels.
  4. Data Warehouse data models are modified as required to accommodate any new data at the atomic level.
  5. We define, document and produce the means (ETL) for getting data from the source and into the target Data Warehouse. Here we also pay especial attention to the four characteristics of Data Warehousing. ETL is an acronym for Extract (the data from source / staging), Transform (the data, making it subject oriented, integrated, and time-variant) and Load (the data into the Data Warehouse and Data Mart).
  6. We define, document and produce the means for getting data from the Data Warehouse into the Data Mart. In short, a bit more ETL.
  7. User acceptance testing. NB Users must ideally be involved in all parts of the end-to-end process that involves business requirements, participation and validation.

This is a very simplified view, but it serves to convey the fundamental chain of events. The most important aspect being that we start (1) and end (7) with the user, and we fully involve them in the non-technical aspects of the process.
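By way of illustration only, steps 5 and 6 above can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: the source rows, the field names (`cust`, `amount`, `order_date`) and the functions `extract`, `transform`, `load` and `build_mart` are all hypothetical and invented for this example. They do not come from any particular ETL product or from the methodology described here.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical source rows, standing in for an operational order system.
SOURCE_ORDERS = [
    {"order_id": 1, "cust": " ACME ", "amount": "120.50", "order_date": "2015-07-01"},
    {"order_id": 2, "cust": "acme", "amount": "80.00", "order_date": "2015-07-02"},
    {"order_id": 3, "cust": "Globex", "amount": "15.25", "order_date": "2015-07-02"},
]


@dataclass(frozen=True)
class WarehouseRow:
    """One atomic, immutable row in the (toy) Data Warehouse."""
    customer: str
    order_date: date
    amount: float


def extract(source):
    """Extract: copy rows out of the source without altering the source."""
    return [dict(row) for row in source]


def transform(rows):
    """Transform: make customer names consistent and give values proper types."""
    return [
        WarehouseRow(
            customer=row["cust"].strip().upper(),
            order_date=date.fromisoformat(row["order_date"]),
            amount=float(row["amount"]),
        )
        for row in rows
    ]


def load(warehouse, rows):
    """Load: append to the warehouse; rows already there are left untouched."""
    warehouse.extend(rows)
    return warehouse


def build_mart(warehouse):
    """A bit more ETL: aggregate atomic rows into a sales-by-customer Data Mart."""
    mart = {}
    for row in warehouse:
        mart[row.customer] = mart.get(row.customer, 0.0) + row.amount
    return mart


warehouse = load([], transform(extract(SOURCE_ORDERS)))
sales_mart = build_mart(warehouse)
```

Note that the two spellings of the same customer end up as one subject in the mart. That is the integration part of the transform doing its job, in miniature.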

Business, Enterprise and Technology

Essentially, a Data Warehouse is a business driven, enterprise centric and technology based solution for continual quality improvement in the sourcing, integration, packaging and delivery of data for strategic, tactical and operational modelling, reporting, visualisation and decision-making.

Business Driven

A data warehouse is business centric and nothing happens unless there is a business imperative for doing so. This means that there is no second-guessing the data requirements of the business users, and every piece of data in the data warehouse should be traceable to a tangible business requirement. This tangible business requirement is usually a departmental or process specific dimensional data model produced together in requirements workshops with the business. We build the Data Warehouse over time in iterative steps, based on the criteria that the requirements should be small enough to be delivered in a short timeframe and large enough to be significant.

Typically, a Data Warehouse iteration results in a new Data Mart or the revision of an existing Data Mart.

Enterprise Centric

As we build up the collection of Data Marts, we are also building up the central logical store of data known as the Enterprise Data Warehouse that serves as a structured, coherent and cohesive central clearing area for data that supports enterprise decision making. Therefore, whilst we are addressing specific departmental and process requirements through Data Marts we are also building up an overall view of the enterprise data.

Technology Based

By technology, I mean technology in the broadest sense of techniques, methods, processes and tools, and not just a question of products, brands or badges.

Unfortunately, there is a popular misconception that Data Warehousing is primarily about competing popular and commercially available technology products. It isn’t, but they do play an important role.

Architecture

The following is an example of a very high-level Data Warehouse architecture diagram.

Methodologies

Various methodologies support the building, expansion and maintenance of a Data Warehouse. Here is one example of a professional data integration methodology, produced, maintained and used by Cambriano Energy.

And here is an information value-chain map as used by Cambriano Energy as part of its Iter8 process management. There are alternatives, many of which do a satisfactory job.

Last but not least, this was (from memory) the way that Bill Inmon’s Prism Solutions ETL company used to view the iterative EDW building process.

Keeping it Shortish

At this point, I decided to cut short further explanations of aspects of Data Warehousing. However, if you have any questions then please address them to me and I will do my best (or something close) to answer them.

That’s all folks

Hold this thought for another time: If you think you can replace a Data Warehouse, that is not a Data Warehouse, with another approach to ‘Data Warehousing’ that doesn’t produce a Data Warehouse, for as fast and cheap as one can do it, then you still don’t have a Data Warehouse to show for all of your efforts. That is not a great place to be.

Therefore, you see, Data Warehousing was never about a haphazard approach to providing random structured, semi-structured and unstructured data of various qualities, provenance, volumes, varieties and velocities, to whomever was of a mind to want it.

Many thanks for reading.

If you want to connect then please send a request. If you have any questions or comments then fire them off below. Cheers 🙂

Oh… and one last thing before I go… DON’T FORGET TO JOIN THE BIG DATA CONTRARIANS: http://www.linkedin.com/grp/home?gid=8338976

 

Consider this: Big Data is not Data Warehousing

06 Friday Mar 2015

Posted by Martyn Jones in Big Data, Consider this, Data Warehousing, Good Strat, hadoop, hdfs, Martyn Jones

Tags

Big Data, enterprise data warehousing, Good Strat, Good Strategy, Martyn Jones, Martyn Richard Jones

Hold this thought: To paraphrase the great Bob Hoffman, just when you think that if the Big Data babblers were to generate one more ounce of bull**** the entire f****** solar system would explode, what do they do? Exceed expectations.

I am a mild mannered person, but if there is one thing that irks me, it is when I hear variations on the theme of “Data Warehousing is Big Data”, “Big data is in many ways an evolution of data warehousing” and “with Big Data you no longer need a Data Warehouse”.

Big Data is not Data Warehousing, it is not the evolution of Data Warehousing and it is not a sensible and coherent alternative to Data Warehousing. No matter what certain vendors will put in their marketing brochures or stick up their noses.

In spite of all of the high-visibility screw-ups that have carried the name of Data Warehousing, even when they were not Data Warehouse projects at all, the definition, strategy, benefits and success stories of data warehousing are known, they are in the public domain and they are tangible.

Data Warehousing is a practical, rational and coherent way of providing information needed for strategic and tactical option-formulation and decision-making.

Data Warehousing is a strategy driven, business oriented and technology based business process.

We stock Data Warehouses with data that, in one way or another, comes from internal and optional external sources, and from structured and optional unstructured data. The process of getting data from a data source to the target Data Warehouse, involves extraction, scrubbing, transformation and loading, ETL for short.

Data Warehousing’s defining characteristics are:

Subject Oriented: Operational databases, such as order processing and payroll databases and ERP databases, are organized around business processes or functional areas. These databases grew out of the applications they served. Thus, the data was relative to the order processing application or the payroll application. Data on a particular subject, such as products or employees, was maintained separately (and usually inconsistently) in a number of different databases. In contrast, a data warehouse is organized around subjects. This subject orientation presents the data in a much easier-to-understand format for end users and non-IT business analysts.

Integrated: Integration of data within a warehouse is accomplished by making the data consistent in format, naming and other aspects. Operational databases, for historic reasons, often have major inconsistencies in data representation. For example, a set of operational databases may represent “male” and “female” by using codes such as “m” and “f”, by “1” and “2”, or by “b” and “g”. Often, the inconsistencies are more complex and subtle. In a Data Warehouse, on the other hand, data is always maintained in a consistent fashion.
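The coding example above lends itself to a small sketch. What follows is a hedged illustration in Python, not a description of any real system: the source system names (`orders_db`, `payroll_db`, `school_db`) and the function `integrate_gender` are hypothetical, made up for this example. Only the codes themselves are taken from the text.

```python
# Hypothetical per-source code tables; only the codes come from the text above.
SOURCE_CODES = {
    "orders_db": {"m": "male", "f": "female"},
    "payroll_db": {"1": "male", "2": "female"},
    "school_db": {"b": "male", "g": "female"},
}


def integrate_gender(source_system, raw_code):
    """Map a source-specific code to the single warehouse representation."""
    mapping = SOURCE_CODES[source_system]
    key = str(raw_code).strip().lower()
    if key not in mapping:
        # Refuse to guess: an unmapped code is a data quality issue to resolve.
        raise ValueError(f"unmapped code {raw_code!r} from {source_system}")
    return mapping[key]


records = [("orders_db", "M"), ("payroll_db", 2), ("school_db", "g")]
integrated = [integrate_gender(system, code) for system, code in records]
```

The design choice worth noting is that an unknown code raises an error rather than passing through silently, which is one simple way of keeping the stored data unambiguous.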

Time Variant: Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values. Furthermore, they generally maintain this information for no more than a year (and often much less). In contrast, data warehouses contain data that is generally loaded from the operational databases daily, weekly, or monthly, which is then typically maintained for a period of 3 to 10 years. This is a major difference between the two types of environments.

Historical information is of high importance to decision makers, who often want to understand trends and relationships between data. For example, the product manager for a Liquefied Natural Gas soda drink may want to see the relationship between coupon promotions and sales. This is information that is almost impossible – and certainly in most cases not cost effective – to determine with an operational database.

Non-Volatile: Non-volatility means that after the data warehouse is loaded there are no changes, inserts, or deletes performed against the informational database. The Data Warehouse is, of course, first loaded with cleaned, integrated and transformed data that originated in the operational databases.
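Time variance and non-volatility can be sketched together. The Python below is a toy under stated assumptions: the class `SnapshotStore`, its methods and the sample product data are all hypothetical, invented purely to show the idea of dated snapshots that are added but never changed. It does not describe how any real warehouse is implemented.

```python
from datetime import date


class SnapshotStore:
    """Toy append-only store: snapshots are added, never updated or deleted."""

    def __init__(self):
        self._snapshots = {}  # (subject_key, as_of_date) -> frozen values

    def load(self, subject_key, as_of, values):
        """Add a dated snapshot; refuse to overwrite one already loaded."""
        key = (subject_key, as_of)
        if key in self._snapshots:
            raise ValueError(f"snapshot {key} already loaded; it is not modified")
        self._snapshots[key] = tuple(sorted(values.items()))

    def as_of(self, subject_key, when):
        """Return the latest snapshot on or before `when`, or None."""
        dates = [d for (k, d) in self._snapshots if k == subject_key and d <= when]
        if not dates:
            return None
        return dict(self._snapshots[(subject_key, max(dates))])


store = SnapshotStore()
store.load("product-42", date(2015, 1, 31), {"price": 1.20, "promo": False})
store.load("product-42", date(2015, 2, 28), {"price": 0.99, "promo": True})
```

Asking the store for `product-42` as of mid-February returns the January snapshot, and asking as of March returns the February one. Being able to contrast the two is the whole point of keeping history.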

We build Data Warehouses iteratively, a piece or two at a time, and each iteration is primarily a result of business requirements, and not technological considerations.

Each iteration of a Data Warehouse is well bound and understood – small enough to be deliverable in a short iteration, and large enough to be significant.

Conversely, Big Data is characterised as being about:

Massive volumes: so great are they that mainstream relational products and technologies such as Oracle, DB2 and Teradata just can’t hack it, and

High variety: not only structured data, but also the whole range of digital data, and

High velocity: the speed at which data is generated, transmitted and received.

These are known as the three Vs of Big Data, and they are subject to significant and debilitating contradictions, even amongst the gurus of Big Data (as I have commented elsewhere: Contradictions of Big Data).

From time to time, Big Data pundits slam Data Warehousing for not being able to cope with the Big Data type hacking that they are apparently used to carrying out, but this is a mistake of those who fail to recognise a false Data Warehouse when they see one.

So let’s call these false flag Data Warehouse projects something else, such as Data Doghouses.

“Data Doghouse, meet Pig Data.”

Failed or failing Data Doghouses fail for the same reasons that Big Data projects will frequently fail. Both will almost invariably fail to deliver artefacts on time and to expectations; there will be failures to deliver value or even simply to return a break even in costs versus benefits; and of course, there will be failures to deliver any recognisable insight.

Failure happens in Data Doghousing (and quite possibly in Big Data as well) because there is a lack of coherent and cohesive arguments for embarking on such endeavours in the first place; a lack of real business drivers; and, a lack of sense and sensibility.

There is also a willing tendency to ignore the advice of people who warn against joining in the Big Data hubris. Why do so many ignore the ulterior motives of interested parties who are solely engaged in riding on the faddish Big Data bandwagon to maximise the revenue they can milk off punters? Why do we entertain pundits and charlatans who ‘big up’ Big Data whilst simultaneously cultivating an ignorance of data architecture, data management and business realities?

Some people say that the main difference between Big Data and Data Warehousing is that Big Data is technology, and Data Warehousing is architecture.

Now, whilst I totally respect the views of the father of Data Warehousing himself, I also think that he was being far too kind to the Big Data technology camp. However, of course, that is Bill’s choice.

Let me put it this way, if Oracle gave me the code for Oracle 3, I could add 256 bit support, parallel processing and give it an interface makeover, and it would be 1000 times better than any Big Data technology currently in the market (and that version of Oracle is from about 1983).

Therefore, Data Warehousing has no serious competing paragon. Data Warehousing is a real architecture, it has real process methodologies, it is tried and proven, it has success stories that are no secrets, and these stories include details of data, applications and the names of the companies and people involved, and we can point at tangible benefits realised. It’s clear, it’s simple and it’s transparent.

Just like Big Data, right?

Well, no.

See what I mean?

Therefore, the next time someone says to you that Big Data will replace Data Warehousing or that Data Warehousing is Big Data, or any variations on that sort of ‘stupidity’ theme, you can now tell them to take a hike, in the confidence that you are on the side of reason.

Many thanks for reading.

More perspectives on Big Data

Aligning Big Data: http://www.linkedin.com/pulse/aligning-big-data-martyn-jones

Big Data and the Analytics Data Store: http://www.linkedin.com/pulse/big-data-analytics-store-martyn-jones

A Modern Manager’s Guide to Big Data: http://www.linkedin.com/pulse/managers-guide-big-data-context-martyn-jones

Core Statistics coexisting with Data Warehousing

Accommodating Big Data

And a big thank you to Bill Inmon (the father of Data Warehousing and of DW 2.0)

Consider this: Big Data in Context

21 Wednesday Jan 2015

Posted by Martyn Jones in Big Data, Consider this, Data Warehouse, Data Warehousing

Tags

Big Data, business intelligence, Core Statistics, DW 3.0, enterprise data warehousing, information management, information supply framework, statistics

Big Data, together with Cloud computing and the Internet of Things, are topics that are very much to the fore in contemporary trends in Information Management. Continue reading →

Consider this: Big Data and the Analytics Data Store

19 Monday Jan 2015

Posted by Martyn Jones in Analytics, Big Data, Consider this, statistics

Tags

Analytics, Big Data, Data Marts, enterprise data warehousing, statistics

To begin at the beginning

Hold this thought: If Data Warehousing was Tesco then Big Data would be the “try something different”.

Since the publication of the article Aligning Big Data, which basically laid out a draft view of DW 3.0 Information Supply Framework and placed Big Data within a larger framework, I have been asked on a number of occasions recently to go into a little more detail with regards to the Analytics Data Store (ADS) component. This is an initial response to those requests. Continue reading →

The World’s Best Data Quotes… Including Big Data quotes

17 Saturday Jan 2015

Posted by Martyn Jones in Analytics, Architecture, Big Data, Business Intelligence, Consider this, Data Warehousing, statistics

Tags

Analytics, aspiring tendencies in IM, Big Data, business intelligence, Core Statistics, enterprise data warehousing, Quotes

A Random walk down Data Street

If you enjoy, abhor or are simply bored with the massive surfeit of hype surrounding Big Data, Data Warehousing and Analytics, then you might just hate these less than faithful quotes as well.

If you enjoy one or two of the quotes, well, then that’s an acceptable bonus too.

So, to begin at the beginning…

Data Sources

“My data sources are unreliable, but their information is fascinating.” – Ashleigh Brilliant

“I give no data sources, because it is indifferent to me whether what data I have sourced has already been sourced before me by another.” – Ludwig Wittgenstein

“In the kitchen of a great Data Warehouse, the data source chef is a soloist.” – Fernand Point

“It is better to be hated for what data sources you have than to be loved for what data sources you do not have.” – André Gide

“In England, there are sixty different types of Data Warehouse and only one data source.” – Attributed to Voltaire

“It is a capital mistake to theorize before one has data sources. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” – Arthur Conan Doyle, Sherlock Holmes

“From such a gentle thing, from such a source of all data, my every pain is born.” – Michelangelo

“Noise free data is a source of great strength.” – Lao Tzu

“In three words I can sum up everything I’ve learned about data: it goes on.” – Robert Frost

“Data enrichment improves a mighty fine data source” – Anonymous

Big Data

“Junk food, empty calories and carbs are the Big Data of the masses” – Karl Marx

“We live, I regret to say, in an age of Big Data hype.” – Oscar Wilde

“We are not rich by the Big Data we possess but by what Big Data we can do without.” – Immanuel Kant

“He who has Big Data hype on his side has no need of proof.” – Theodor Adorno

“The religion of Big Data sets itself the goal of fulfilling man’s unattainable desires, but for that very reason ignores her attainable needs.” – Ludwig Feuerbach

“The flesh endures the storms of the present alone; the mind in our social network interactions, those of the past and future as well as the present. Big Data is a covetousness of the mind.” – Thomas Hobbes

“Big Data is negative and dialectical, because it resolves the determinations of the understanding of things into nothings.” – Georg Wilhelm Friedrich Hegel

“I am trapped in this Big Data, and there is nothing I can do about it.” – Dudley Moore

“And remember, never take the ruby case off your iPad for a moment, or you will be at the mercy of the Big Data Witch of the West.” – The Wizard of Oz

“Imagine there’s no Big Data…” – John Lennon

Data Transformation

“Analysis does not transform data.” – Jiddu Krishnamurti

“I live in a data landscape, which every single day of my life is enriching data.” – Daniel Day-Lewis

“Data opportunities multiply as the data is transformed” – Sun Tzu

“He who integrates data badly is lost.” – Theodor Adorno

“Today we transform the data; tomorrow, the whole enchilada” – Leon Trotsky

“Well, it’s all about the ETL law of the transformation of data quantity into data quality, and vice versa. Innit!” – Friedrich Engels

“The management consultants have only interpreted the business data, in various ways. The point, however, is to transform it.” – Karl Marx

“Hey! What’s going down here in the Hollyweird of data?” – Joe McCarthy

“The Big Data alchemists in their transformational search for gold discovered much data of greater value.” – Arthur Schopenhauer

“That Schopenhauer yolk was a bit of an old Big Data ‘procurer’ wasn’t he now Rodge?” – Pádraig Judas O’Leprosy

Business Intelligence

“The trouble with the world is that the cocksure have Big Data and that Data Science and Business Intelligence are all sexed up.” – Bertrand Russell

“If people never did silly things no Business Intelligence would ever get done.” – Ludwig Wittgenstein

“The best Business Intelligence user is intelligent, well-educated and a little drunk.” – Alben W Barkley

“The Master said, ‘If your conduct is determined solely by considerations of Business Intelligence and profit you will arouse great resentment.’” – Confucius

“That’s cricket, Harry, you get these sort of things in Business Intelligence” – Frank Bruno

“Business Intelligence without ambition is a bird without wings.” – Salvador Dali

“I would prefer a Business Intelligence hell to a Big Data paradise.” – Blaise Pascal

“Many much-learned business men have no Business Intelligence.” – Democritus

“We should not only use the brains we have, but all that we can borrow.” – Woodrow Wilson

“The reason we have Business Intelligence is so we don’t have to think all the time” – Homer Simpson

Data Warehousing

“The study of Data Warehousing, like the Nile, begins in Inmon and ends in magnificence.” – Charles Caleb Colton

“Big Data wins games, but Data Warehousing wins championships.” – Michael Jordan

“Big Data is no substitute for Data Warehousing.” – Frank Herbert

“It’s in me blood, Clive, without Data Warehousing I’d be nothing,” – Alan Latchley

“The trouble with the world is that the cocksure have Big Data and that Data Science is all sexed up.” – Bertrand Russell

“If people never did silly things no Business Intelligence would ever get done.” – Ludwig Wittgenstein

“The best Business Intelligence user is intelligent, well-educated and a little drunk.” – Alben W Barkley

“You can catch all the whales in the ocean and stack them together and they still do not make a minnow.” – Ralph Wiggum

“Well, the smarter I practice Inmon Data Warehousing, the luckier I get.” – Gary Player

“Well, I’ve cleaned up facts and dimensions in a star-schema ‘data warehouse’. That was pretty terrible. But I can’t complain because I’m sure other people have done worse.” – Cee Lo Green

“You can give a person a bowl of Big Data Gruel and feed them for a day, or teach them Inmon Data Warehousing and feed them for a lifetime.” – Proverb

“A Data Warehouse is like a tea bag; you never know how strong it is until you are in hot water.” – Eleanor Roosevelt

“I know the number of the grains of sand and the measures of the sea; I understand the mute and hear the one who does not speak. The smell has reached my senses of a hard-shelled tortoise boiling in bronze together with the flesh of lambs; bronze is laid beneath it, and bronze covers it.” – An Oracle to Croesus of Lydia

That’s all folks!

Well, now that that’s done I can always ask for forgiveness. Not that I will of course.

Many thanks for reading.

Martyn Jones

Founder and CEO, Cambriano Energy


File under: Good Strat, Good Strategy, Martyn Richard Jones, Martyn Jones, Cambriano Energy, Iniciativa Consulting, Iniciativa para Data Warehouse, Tiki Taka Pro

Continue reading →

Consider this: Did Big Data Kill The Statistician?

03 Wednesday Dec 2014

Posted by Martyn Jones in consider, Consider this, data science, statistics

≈ 20 Comments

Tags

Big Data, BS, Consider this, data analysts, data science, Data Warehouse, enterprise data warehousing, statisticians, statistics

Blue sky data

Hold this thought: ‘There are big lies, damn big lies and big data science’.

Statistics is a science. Some argue that it is the oldest of sciences. It can be traced back in history to the days of Augustus Caesar, and before.

In 1998, Lynne Billard, in a paper that laid out the role of the Statistician and Statistics, wrote that “no science began until man mastered the concepts and arts of counting, measuring, and weighting”.[1]

Continue reading →

Artefacts for Enterprise Information Management

01 Monday Dec 2014

Posted by Martyn Jones in Ask Martyn

≈ Leave a comment

Tags

enterprise data warehousing, information management

Artefacts for Enterprise Information Management

Key Data Warehousing Reference Architecture Components

Martyn Richard Jones

As an effective business process paradigm and a powerful mixture of technology, engineering and pragmatic design, Enterprise Data Warehousing (EDW) is arguably unmatched in its capacity to address complex and essential information requirements in Management Reporting, Business Intelligence (BI) and Data Analytics. In spite of the occasional outbreak of anecdotal ruminations to the contrary, key indications point to a bright future for EDW. Indeed, the breadth and richness of the nascent applications that the EDW model can adequately support will drive an expansion in its utility, acceptance and advancement. For instance, a significant and practical way of maximising the value of innovation and investment in Enterprise Data Warehousing is to reuse and extend the Inmon paradigm to construct the enterprise information hub of the future.

With more imagination in our perception and interpretation of the EDW paradigm, and a greater sense of purpose in the way we align data demand and supply factors with the underlying analysis, development and delivery mechanisms, we can reach a better understanding of how a variety of information requirements can be satisfied through innovative uses of the EDW model. With a broader vision of the role of EDW in organisations, we can identify additional business requirements that EDW is more than adequately capable of accommodating. Indeed, if we look at the broadened market of potential EDW clients we can encounter applications in domains such as next generation Customer Relationship Management (CRM), Master Data Management (MDM), Service Oriented Architectures (SOA), Business Performance Management (BPM), Customer Data Integration (CDI) and Advanced Data Correlation (ADC).

Now, in order for the EDW model to continue to be successful, and especially in the contemporary areas of applicability, it must meet these new and evolving requirements, simultaneously and visibly, whilst satisfactorily continuing to support its central role in the field of consistent, reliable and timely data analytics and reporting. In this respect, the EDW model provides the required coherency, flexibility and usability in a framework that permits the symbiotic coexistence of the iterative requirements of the traditional users of EDW (management reporting, departmental data marts, etc.) with the provisioning of data for new services. At the same time, this approach can satisfactorily address the requirements that are emerging from the disparate and pressing needs for low-latency and high-agility data integration in commercial and governmental organisations.

So, what does this mean in practical terms? Probably a simple way of summarising the approach is by stating that the best practices, principles and technologies of the Enterprise Data Warehousing paradigm have further applicability in effectively addressing a variety of new requirements, and in ways that help organisations obtain, retain and reinforce cohesion and coherence in enterprise information management.

Simply stated, you have this great approach for building data warehouses, but now you can use and extend this great approach to deliver requirements beyond the boundaries of traditional EDW. The inherent advantages accruable from this set of happy coincidences are necessarily tangible to a broad spectrum of the organisation, from EDW architect, EDW PM and EDW developer through to CIO, CFO and CEO.

This is all very well and good in principle, but how can an organisation kick-start its way into EDW futures? One practical way to ensure that the core framework for EDW will support certain classes of emerging applications – as well as continuing to support the existing and growing demands on the EDW’s core competence – is by leveraging the best principles and practices of EDW itself. To this end, the usefulness of the concept is most clear in four predominant groups of artefacts used to assist enterprises in the successful implementation and evolution of their EDW:

  1. Reference architectures
  2. Development methodologies
  3. Industry models and solution frameworks
  4. Construction components

The Reference Architecture

According to IBM, a reference architecture is, in essence, a predefined architectural pattern, or set of patterns, possibly partially or completely instantiated, designed, and proven for use in particular business and technical contexts, together with supporting artefacts to enable their use. Often, these artefacts are products of lessons learned from previous projects.

Alternatively, for companies such as IBM, Teradata and Oracle/Sun Microsystems, reference architectures primarily consist of storage, servers and software. What is common to almost all reference architectures is that they are solutions that have worked satisfactorily in a given set of circumstances.

Reference architectures for Enterprise Data Warehousing should be comprehensive models of all required components and artefacts, and should explicitly state how the pieces of the architectural whole fit together, how the data flows through the architecture and how the architecture sits in the current and future IT landscape of the organisation. What does this mean in practice? Think of a house. A comprehensive architecture should not stop at identifying rooms, functions of rooms, doors, windows, walls and the roof. This is a good start, and helps us to get an idea of the house we are conceptualising, but that is only where its usefulness starts: we would not want to build a house simply by following a sketchy outline. The same goes for the data warehouse. Although at a trivial level a reference architecture looks like a series of boxes and arrows, it does not stop there.
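To make the "boxes and arrows" point a little more tangible, here is a minimal Python sketch of an architecture written down as declared components plus explicit data flows, with a check that every arrow connects two declared boxes. The component names (source_systems, staging, and so on) and the check_architecture helper are purely illustrative assumptions, not part of any vendor's reference architecture.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical components (the "boxes") of a simple EDW landscape.
components = {"source_systems", "staging", "data_warehouse", "data_marts", "reporting"}

# Hypothetical data flows (the "arrows"), written as (upstream, downstream).
flows = [
    ("source_systems", "staging"),
    ("staging", "data_warehouse"),
    ("data_warehouse", "data_marts"),
    ("data_marts", "reporting"),
]


def check_architecture(components, flows):
    """Return the components in data-flow order, or raise if the sketch is inconsistent."""
    # Every arrow must start and end at a declared box.
    unknown = {name for flow in flows for name in flow} - components
    if unknown:
        raise ValueError(f"flows reference undeclared components: {sorted(unknown)}")

    # Map each component to the set of components that feed it.
    graph = {name: set() for name in components}
    for upstream, downstream in flows:
        graph[downstream].add(upstream)

    # static_order() raises graphlib.CycleError if the flows loop back on themselves.
    return list(TopologicalSorter(graph).static_order())


order = check_architecture(components, flows)
print(" -> ".join(order))
```

A real reference architecture carries far more than this (interfaces, service levels, ownership, technology choices), which is the point of the house analogy; the sketch only shows that even the outline can be stated explicitly enough to be checked.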

Development methodologies

It is imperative that we learn to build the EDW using development methodologies specifically designed for iterative development project cycles. Indeed, trying to use a methodology, such as the System Development Lifecycle (SDLC), in an EDW project, is a damaging, costly and senseless exercise in vanity.

There are public domain iterative methodologies that we can use as the basis for our approach to EDW project iterations. Alternatively, it is possible to license data warehousing methodologies from a number of vendors and consulting service providers. One of the most comprehensive enterprise data warehousing methodologies available today is IBM’s Iterations process model. It has the best pedigree of all Data Warehousing methodologies. Bill Inmon’s company Prism Solutions released the first commercial version of Iterations to the market in the mid-nineties.

The key development phases, common to a number of methodologies, are:

  • Management – Ensures and plans for training, support, project management, change management, internal marketing and ongoing administration of the data warehouse or data mart.
  • Analysis – The assessment, scoping and modelling of potential target database solutions, source system solutions, data availability, cleanliness and completeness. The provision of high-level technical recommendations.
  • Design – Designing the data environment, data access environment, data extraction environment, maintenance-processing environment and the detailed technical environment.
  • Construction – The building and unit testing of the data extraction solutions, data access solutions, maintenance-processing solutions, the technical environment and the development of end-user training.
  • Testing – The performance of various levels of data warehouse testing, including system testing, integration testing and user acceptance testing.
  • Implementation – The process of making the BI infrastructure accessible. The process of putting a developed iteration into the production environment.

So, what is the purpose of using development methodologies?

There are many explanations of the role of development methodologies, but one definition that is simple, clear and comprehensive describes a methodology as “a means of improving the management and control of the software development process, structuring and simplifying the process, and standardizing the development process and product by specifying activities to be done and techniques to be used”[1]. In practice, the benefits include:

  • Shortening the time to realize return-on-investment
  • Minimizing the risk of project failure by getting it right the first time
  • Gaining executive sponsorship and endorsement early in the project
  • Setting realistic end-user expectations
  • Identifying critical resources
  • Providing a roadmap to simplify project management
  • Allowing the organization to focus on managing processes; not inventing them
  • Mitigating the impact of team attrition by providing clear and comprehensive documentation
  • Enabling efficient management reporting

Industry Data Models and Solution Frameworks

So, what is the purpose of EDW data models and EDW frameworks and templates? In addition, what are industry data models?

Industry models (such as the leading models from IBM) combine deep expertise and industry best practice in a usable ‘blueprint’ form for both business and IT communities to allow them to accelerate the requirements-to-execution development span of industry solutions. The best of these industry models draw on the experience of hundreds of implementations and decades of development.

A sound way to appreciate the potential usefulness of an industry data model is by first understanding the core reasons behind its structures, the components and facets of the structures themselves and the logic of how it can be used to store, map, classify, relate, select and retrieve data.

To this end, vendors aim to provide “comprehensive data models containing data warehouse design models, business terminology model and analysis templates to accelerate the development of business intelligence applications. Second, best practice business process models with supportive service definitions for development of a service-oriented architecture (SOA)”. When applied correctly, these models should accelerate Data Warehouse and Data Integration project iterations and reduce risk.

Construction Components

We could consider Data Marts, an Operational Data Store and the Data Warehouse itself as construction components. However, this section is principally about the componentisation of Data Quality and ETL processes.

I do not intend to dwell too much on this aspect, primarily because, of all the facets I have covered here, it is the one most people will be familiar with, and they will be aware of the options available.

ETL

We can componentise a significant proportion of Data Warehouse development, especially the parts related to ETL. Many of these components (which take the forms of widgets, patterns, templates or applets) are available from ETL vendors. Useful ETL templates and widgets are also available in the public domain.

ETL job stream or project components include Source, Transformation, Lookup, Staging, Destination and Loader components.

  • Source components – transport data for a transformation stream. An ETL job stream or project normally starts with one or more source components. Source components have at least one output port.
  • Transformation components, Lookup components, and Staging components – apply specific transformations to the data in the transformation stream. These types of components have both input ports and output ports.
  • Target components or Destination components (also called data sinks) – write data to specific targets. Target components have one input port and no output ports.
  • Loader components – extract and load data from a source database or file into the database, without performing any transformation. Oracle’s SQL*Loader is an example of a loader component.
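The component types above can be sketched in a few lines of Python. This is only an illustration of the port idea (sources emit, transformations and lookups consume and emit, destinations only consume); the class names, the run_job helper and the sample rows are all hypothetical and do not correspond to any particular ETL tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator

Row = dict  # one record flowing through the job stream


@dataclass
class Source:
    """Source component: no input port, one output port."""
    rows: list

    def output(self) -> Iterator[Row]:
        yield from self.rows


@dataclass
class Transformation:
    """Transformation component: one input port, one output port."""
    fn: Callable[[Row], Row]

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield self.fn(row)


@dataclass
class Lookup:
    """Lookup component: enriches each row from a reference table."""
    key: str
    table: dict
    target: str

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, self.target: self.table.get(row[self.key])}


@dataclass
class Destination:
    """Destination component (data sink): one input port, no output port."""
    written: list = field(default_factory=list)

    def write(self, rows: Iterable[Row]) -> None:
        for row in rows:
            self.written.append(row)


def run_job(source: Source, steps: list, destination: Destination) -> Destination:
    """Wire the output port of each component to the input port of the next."""
    stream = source.output()
    for step in steps:
        stream = step.process(stream)
    destination.write(stream)
    return destination


# Illustrative job stream: tidy a name, then look up a country name.
source = Source([
    {"customer": " alice ", "country": "ES"},
    {"customer": "BOB", "country": "FR"},
])
steps = [
    Transformation(lambda r: {**r, "customer": r["customer"].strip().title()}),
    Lookup(key="country", table={"ES": "Spain", "FR": "France"}, target="country_name"),
]
sink = run_job(source, steps, Destination())
print(sink.written)
```

Because each step only sees an iterable of rows, components can be reordered or reused across job streams, which is the reuse argument made for componentisation here.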

Data Quality

In terms of Data Quality, there are a number of components available for cleansing data and checking its veracity. For example, Data Quality Components for Microsoft’s SSIS is a suite of custom data cleansing transformation components for Microsoft SQL Server Integration Services (SSIS) to “standardize, verify, correct, consolidate and update all your contact data for effective communications”. Similar component features are available from other Data Integration vendors, such as Informatica, Oracle, SAP, IBM and Talend.

One interesting offering that I would like to highlight is the InfoSphere Warehouse Packs from IBM, which cover aspects such as Customer Insight, Market and Campaign Insight, and Supply Chain Insight.
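To give a feel for what a contact-data cleansing component does, here is a small Python sketch that standardises, verifies and consolidates records. It is not the API of any of the products named above; the function names, the deliberately crude e-mail pattern and the sample contacts are assumptions made for illustration only.

```python
import re

# Deliberately simple pattern for illustration; real verification is far stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardise(record):
    """Normalise whitespace and casing so equivalent records compare equal."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
    }


def is_valid(record):
    """Verify that a standardised record has a name and a plausible e-mail address."""
    return bool(record["name"]) and bool(EMAIL_RE.match(record["email"]))


def cleanse(records):
    """Return (clean, rejected): clean rows are standardised and de-duplicated by e-mail."""
    clean, rejected, seen = [], [], set()
    for raw in records:
        rec = standardise(raw)
        if not is_valid(rec):
            rejected.append(raw)       # keep the original row for review
        elif rec["email"] not in seen:
            seen.add(rec["email"])     # consolidate duplicates on e-mail
            clean.append(rec)
    return clean, rejected


contacts = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "No Email", "email": "not-an-address"},
]
clean, rejected = cleanse(contacts)
print(clean)
print(rejected)
```

Keeping rejected rows in their original form, rather than silently dropping them, leaves a trail for whoever has to correct the source data.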

Component Questions

What is the purpose of the construction components? 

They provide plug, configure, program and play options to speed development, testing, acceptance and deployment.

What are the EDW construction components?

They can be widgets, templates, patterns, apps or macros.

What is the purpose of components? 

Speed and ease development. Leverage reuse of work already done, tested and proved.

Conclusions

In summary, we will see that organisations that adopt, nurture and evolve the principles, practices and technologies contained in proven EDW reference architectures, industry data models and construction components, will come to benefit from an evolving EDW paradigm that is both useful and usable. In this way, the most responsive of organisations will be able to verify, for themselves, that the EDW paradigm has a great future as a determining factor in the iterative improvement of productivity, resilience and cost effectiveness in the provision of agile collaboration, usable innovation and process accuracy in the formulation of options and the execution of decisions.

What does that mean in plain English?

  • Commit to Data Warehousing only if there is a sustainable business imperative.
  • Stick to the core DW fundamentals as defined by Bill Inmon. This approach does not clash with the super-advanced business-oriented dimensional modelling of Ralph Kimball, or the Data Vault approach of Dan Linstedt.
  • Use a proven iterative methodology specifically designed for Data Warehousing and Data Integration.
  • Leverage anything and everything that will help you to reliably, cost-effectively and coherently analyse, build, test and deliver your Data Warehouse iterations.
  • Use outsourcing and offshoring for purely non-Agile technical aspects of Data Warehousing described exhaustively and unambiguously in development documentation.
  • Continually test your Data Warehouse to ensure that you are not at risk of straying into the twilight zone of Information Management failure.
  • Data Warehousing, done right, will give IT a good name amongst business people.

About the author: Martyn Richard Jones is an Enterprise Data Warehousing (EDW) and Business Intelligence (BI) Project Manager and Architect with over 25 years’ experience in information integration, knowledge management and management information systems, gained primarily in the financial services and telecommunications business sectors. Martyn can be contacted by email: martyn.jones@cambriano.es

[1] Nancy L. Russo, The Use and Adaption of System Development Methodologies


