A data warehousing superhero is something to be
Not all that glitters is Big Data, and Big Data has a long way to go before it can deliver anything like the same satisfying results, tangible benefits and organisational agility that a properly implemented Inmon Enterprise Data Warehouse can provide.
Therefore, I have a question for you.
Do you want to win friends and influence people in the world of data architecture and management? Do you want to do something in IT that atypically will bring kudos and credibility? Do you want to enjoy what you are doing because you are actually doing the right thing right for an appreciative audience?
Okay, the recipe that I will now reveal has the power to turn you into not only a data hero, but a 4th generation enterprise data warehousing superhero, with Big Data bells and whistles attached. Even more amazingly, it is offered for nothing, gratis, and for keeps.
Yes, you read it right. I am feeling generous, and, rare animal though it is, there is such a thing as a free lunch. In this instance, the free lunch takes the form of a cookbook for successful data sourcing, warehousing and provisioning, one that will turn you into a truly modern-day digital superhero.
Follow the suggestions to the letter and it will be hard to fail. However, drop any magic ingredient from the mix and expect, eventually, to run out of luck (or ‘Donald Duck’, as the rhyming slang has it down my way). Almost as important, please apply your own criteria of good sense at every step of the way.
The craft of data
The craft of data includes temporary-permanence in exploitation, revolution and institution.
When Sun Tzu was talking about the Art of War, he was also talking about the craft of data.
In the 21st century the highest expression of the craft of data in an organisation, whether public, private or military, is the enterprise data warehouse.
These are some of the key rules and guidelines for ensuring that you prevail and not your adversaries. The items are necessarily terse, but should provide a sound basis for further research, thought and strategic practice.
So without further ado, let us get to the crux of the matter.
- This is the first piece of advice, and it’s a little bit of a ‘downer’, but you may just thank me for it later. The business sponsor of any significant Data Warehouse initiative or iteration cannot be the CIO, CTO or any member of the IT organisation. When this unfortunately happens, and it happens far too often, you should know that this particular data warehouse project is dead before it even gets off the ground – guaranteed. If you can afford to walk away from such a project, then do so. Now for the more positive aspects.
- All data in the data warehouse must be subject-oriented.
- We must integrate all data before it enters into the data warehouse.
- All data in the data warehouse must be time-variant or specifically indeterminate.
- Data in the data warehouse must be non-volatile – within periods of explicit and implicit snapshot coverage.
- Data in the data warehouse is primarily used to feed into management decision making (by order of importance: strategic, tactical and then operational).
- We build the data warehouse iteratively and over time. We never build the data warehouse using a ‘big bang’ approach.
- We base each build iteration of the data warehouse on a specific set of well-bound departmental-oriented requirements, deliverable in a short and specific timeframe. We never try to build the data warehouse using a ‘boil the ocean’ approach.
- We never run more concurrent iterative developments in a data warehouse programme than we would in any other agile environment. This means that for a mature data warehousing setup, we run a maximum of five concurrent developments. The more immature the organisation, the fewer the concurrent iterations.
- We use a contemporary two-tier approach to the data-warehousing super-component. A well-architected, designed and engineered third normal form (3NF) database that supports true historicity and time-variance modelling forms the basis of the decision support database of record.
- We build departmental and process-centric data marts on top of the data warehouse layer, as the end-user-centric semantic-layer of the data warehouse.
- We use 3NF for the data warehouse data model. We typically use dimensional modelling for the data mart models, although other modelling options are also valid. Target use cases will inform the decisions we make regarding the choice of data mart model.
- Never trust anyone who claims that we can service the strategic data needs of a complex and volatile enterprise by implementing a faux data warehouse built using a collection of conformed dimensions and facts. This approach may initially appear to work; however, it is a massive strategic, tactical and operational mistake, which will eventually involve costly reengineering, loss of valuable data, organisational disruption and dissatisfied clients.
- We store transaction data in the data warehouse at the lowest possible level of granularity. We store transaction and fact data in the data marts at the aggregation levels appropriate to the target audience.
- Based on use cases and performance needs, we will accordingly aggregate data in the data marts. If, in the future, lower level data granularity is required in the data mart then we can easily provide that by reconstructing the data mart from atomic level data stored in the data warehouse.
- We should never second-guess business requirements. No business imperative means no requirement. You’re aiming to be a successful data superhero; keep that goal in mind. Don’t be beguiled into doing the wrong things even when accosted by ‘right-sounding reasons’.
- Data warehousing is about the permanent incremental development and redefinition of minimum viable products and a minimum viable service. Iteratively grow the data warehouse and ignore those who claim that Inmon is about ‘big bang’, ‘bottom up’ and ‘boil the ocean’.
- Avoid pork-barrel political games in data warehouse programmes. You should not use a data warehouse programme as a means to leverage a raft of other related data, operational and DevOps projects in the organisation. For example, Corporate Data Governance, Data Quality and Disaster Recovery/Business Continuity should not be packed into the data warehousing programme, at any level. Again, this is a massive strategic, tactical and operational mistake.
- We ensure, as a minimum, that data in the data warehouse is as reliable as the data at source. Simply stated, we do not allow unnecessary entropy to affect the data in the journey from source systems to the target data warehouse or data marts.
- No data is ‘corrected’ or ‘cleaned’ in the data warehouse without the explicit, verifiable and express consent of the fiduciary duty holder with respect to that data. If the data warehouse is to act as a system of record then it must also hold metadata relative to any ‘cleaning’ that has been applied to that data, and should also hold ‘before’ and ‘after’ states of corrected data – for auditing purposes.
- We secure all data in the data warehouse in accordance with prevailing legislation and corporate rules and guidelines. In any conflict between corporate rule and legal jurisdiction, the current laws prevail.
- Ensure that competent and independent design authorities, with the support of the Data Warehouse architect, are ultimately responsible for all data-warehouse architectural, process and design decisions.
- Architectural and process choices govern the selection of methodology, product and partner. Always remember mens sana in corpore sano. Prejudice, speculation and opinion generally lead to very bad data-warehouse acquisition decisions, and can potentially lead to strategic, tactical and operational mistakes.
- Data warehousing iterations have clear top-level phases: start-up; DW management phase; analysis phase; design phase; build phase; testing phase; and, implementation phase. We complement these phases with data warehousing tracks: project management track; user track and requirements; data track; technical track; and, metadata track. This approach is used by a number of data warehousing methodologies, including the Cambriano methodology for data warehousing, information management and data integration.
- To conclude, I would like to reiterate some of the reasons why we should follow an Inmon-based approach to the building of a Data Warehouse. The Inmon approach is very much based on:
  - Iteratively solving specific business challenges, iteration by iteration. This is not just a flippant excuse for spending other people’s money. The Inmon DW is not about ‘boiling the ocean’, ‘bottom up’ or ‘big bang’. Neither is it an insistence that one can build a whale by carefully configuring a collection of minnows. There’s a ‘little bit more’ to it than that.
  - Delivering perceived and visible value within a reasonable timeframe.
  - Achieving high returns on investment.
  - Meeting or exceeding expectations.
  - Meeting user requirements, first time and every time.
  - Delivering a quality data-warehouse solution on schedule, within budget, whilst effectively utilising the resources available.
  - The rational and economic need to minimise the impact that any strategic data initiative will have on operational systems and the organisation.
  - The goal of maximising information availability and analytical capabilities throughout the organisation and even to stakeholders and clients, if we so wish.
  - Designing towards maximum flexibility to ensure that we can accommodate many future decision support needs immediately, and that we address new requirements swiftly and coherently.
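For those who like to see an idea in code, here is a minimal sketch of the granularity guidance in the list above: atomic transaction data lives in the warehouse layer, and each data mart is simply an aggregation that we can rebuild at a finer grain whenever the business asks for it. The table contents, the department names and the `build_mart` helper are all hypothetical, invented purely for illustration; a real warehouse would do this in its database and ETL tooling rather than in Python lists.

```python
from collections import defaultdict
from datetime import date

# Hypothetical atomic-level transactions as held in the warehouse layer:
# (transaction date, department, amount in minor currency units).
WAREHOUSE_TRANSACTIONS = [
    (date(2015, 9, 1), "sales", 12000),
    (date(2015, 9, 1), "sales", 3050),
    (date(2015, 9, 2), "sales", 9900),
    (date(2015, 9, 2), "finance", 40000),
    (date(2015, 10, 5), "sales", 7500),
]


def build_mart(transactions, grain):
    """Aggregate atomic transactions to the requested grain ('month' or 'day').

    Returns a dict keyed by (period, department) holding the summed amount.
    """
    if grain not in ("month", "day"):
        raise ValueError("grain must be 'month' or 'day'")
    totals = defaultdict(int)
    for tx_date, department, amount in transactions:
        if grain == "month":
            period = tx_date.strftime("%Y-%m")
        else:
            period = tx_date.isoformat()
        totals[(period, department)] += amount
    return dict(totals)


# The mart is first delivered at monthly grain for its target audience.
monthly_mart = build_mart(WAREHOUSE_TRANSACTIONS, "month")

# Later, a finer grain is requested: we rebuild from the same atomic data,
# with nothing lost because the warehouse never stored only the aggregates.
daily_mart = build_mart(WAREHOUSE_TRANSACTIONS, "day")
```

The point of the sketch is only that the rebuild is possible because the lowest level of detail was kept in the warehouse; had we stored monthly totals alone, the daily view could not be recovered.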
Now that I’ve given out a wealth of valuable information and indications, you may be asking, ‘And now what?’
This is the next step, dear budding data superhero:
- Take each of the items mentioned above and study them to the best of your ability. Do lots of research, and start to fit together the pieces of the jigsaw.
- Invent scenarios, or better still, ask other people for scenarios and hypothetical challenges, and then work through how you would go about responding to those scenarios and challenges.
- If you have any questions that you cannot research and answer yourself, then I will be glad to help, provided the request concerns a particular aspect of data warehousing or data management. Please email me your questions at firstname.lastname@example.org. Please send one email per question (e.g. if you have three questions, send three emails), so that I can prioritise the questions and manage the time I can set aside to respond to them.
The subtle evolution of Inmon’s definitive Data Warehousing
What I have described are elements and requisites of a solid, coherent and cohesive approach to fourth generation Enterprise Data Warehousing, a proven approach to the provision of quality data for management decision support. The approach is the evolution of the classic Inmon approach, which has evolved over the intervening decades, thanks to Bill Inmon himself, and those who adopted and developed his approach to cohesive, coherent and comprehensive data warehousing.
Many thanks for reading
So, that’s it. Many thanks for reading this piece and I sincerely hope you found it of interest.
Do keep in touch. You can connect with me via LinkedIn and you can also keep up to date with my activities on Twitter (User handle @GoodStratTweet) and on my personal blog http://www.goodstrat.com (GoodStrat.com)
I am the manager of The Big Data Contrarians group on LinkedIn. Consider joining that group, if only for the critical thinking that it could potentially provoke.
You may also be interested in some other articles I have written on the subject of Data Warehousing.
Data Warehousing explained to Big Data friends – https://goodstrat.com/2015/07/20/data-warehousing-explained-to-big-data-friends/
Stuff a great data architect should know – https://goodstrat.com/2015/08/16/stuff-a-great-data-architect-should-know-how-to-be-a-professional-expert/
Big Data is not Data Warehousing – https://goodstrat.com/2015/03/06/consider-this-big-data-is-not-data-warehousing/
What can data warehousing do for us now – http://www.computerworld.com/article/3006473/big-data/what-can-data-warehousing-do-for-us-now.html
Looking for your most valuable data? Follow the money – http://www.computerworld.com/article/2982352/big-data/looking-for-your-most-valuable-data-follow-the-money.html
Martyn Richard Jones
Palma de Mallorca
23rd September 2015