Artefacts for Enterprise Information Management
Key Data Warehousing Reference Architecture Components
Martyn Richard Jones
As an effective business process paradigm and a powerful mixture of technology, engineering and pragmatic design, Enterprise Data Warehousing (EDW) is arguably unmatched in its capacity to address complex and essential information requirements in Management Reporting, Business Intelligence (BI) and Data Analytics. In spite of the occasional outbreak of anecdotal ruminations to the contrary, key indications point to a bright future for EDW. Indeed, the breadth and richness of nascent applications, that the EDW model can adequately support, will drive an expansion in its utility, acceptance and advancement. For instance, a significant and practical way of maximising the value of innovation and investment in Enterprise Data Warehousing is by reusing and extending the Inmon paradigm to construct the enterprise information hub of the future.
With more imagination in our perception and interpretation of the EDW paradigm, and a greater sense of purpose in the way we align data demand and supply factors with the underlying analysis, development and delivery mechanisms, we can reach a better understanding of how a variety of information requirements can be satisfied through innovative uses of the EDW model. With a broader vision of the role of EDW in organisations, we can identify additional business requirements that EDW is more than adequately capable of accommodating. Indeed, if we look at the broadened market of potential EDW clients we can encounter applications in domains such as next generation Customer Relationship Management (CRM), Master Data Management (MDM), Service Oriented Architectures (SOA), Business Performance Management (BPM), Customer Data Integration (CDI) and Advanced Data Correlation (ADC).
Now, in order for the EDW model to continue to be successful, and especially in the contemporary areas of applicability, it must meet these new and evolving requirements, simultaneously and visibly, whilst satisfactorily continuing to support its central role in the field of consistent, reliable and timely data analytics and reporting. In this respect, the EDW model provides the required coherency, flexibility and usability in a framework that permits the symbiotic coexistence of the iterative requirements of the traditional users of EDW (management reporting, departmental data marts, etc.) with the provisioning of data for new services. At the same time, this approach can satisfactorily address the requirements that are emerging from the disparate and pressing needs for low-latency and high-agility data integration in commercial and governmental organisations.
So, what does this mean in practical terms? Probably a simple way of summarising the approach is by stating that the best practices, principles and technologies of the Enterprise Data Warehousing paradigm have further applicability in effectively addressing a variety of new requirements, and in ways that help organisations obtain, retain and reinforce cohesion and coherence in enterprise information management.
Simply stated, you have this great approach for building data warehouses, but now you can use and extend this great approach to deliver requirements beyond the boundaries of traditional EDW. The inherent advantages accruable from this set of happy coincidences are necessarily tangible to a broad spectrum of the organisation, from EDW architect, EDW PM and EDW developer through to CIO, CFO and CEO.
This all very well and good in principle, but how can an organisation can kick-start its way into EDW futures? One practical way to ensure that the core framework for EDW will support certain classes of emerging applications – as well as continuing to support the existing and growing demands on the EDW’s core competence – is by leveraging the best principles and practices of EDW itself. To this end, the usefulness of the concept is most clear in four predominant groups of artefacts used to assist enterprises in the successful implementation and evolution of its EDW:
- Reference architectures
- Development methodologies
- Industry models and solution frameworks
- Construction components
The Reference Architecture
According to IBM, a reference architecture is, in essence, a predefined architectural pattern, or set of patterns, possibly partially or completely instantiated, designed, and proven for use in particular business and technical contexts, together with supporting artefacts to enable their use. Often, these artefacts are products of lessons learned from previous projects.
Alternatively, for companies such as IBM, Teradata and Oracle/Sun Microsystems, etc. reference architectures primarily consist of storage, servers and software. What is common to almost all reference architectures is that they are solutions that have worked satisfactorily in a given set of circumstances.
Reference architectures for Enterprise Data Warehousing should be comprehensive models of all required components and artefacts, and should explicitly state how the pieces of the architectural whole, fit together, how the data flows through the architecture and how the architecture sits in the current and future IT landscape of the organisation. What does this mean in practice? Think of a house, a comprehensive architecture should not stop at identifying rooms, functions of rooms, doors, windows, walls and the roof – this is good start, and helps us to get an idea of the house we are conceptualising, that is where its usefulness starts. Just as we would not want to build a house, simply be following a sketchy outline. The same goes for the data warehouse, and although at a trivial level a reference architecture looks like a series of boxes and arrows, it does not stop there.
It is imperative that we learn to build the EDW using development methodologies specifically designed for iterative development project cycles. Indeed, trying to use a methodology, such as the System Development Lifecycle (SDLC), in an EDW project, is a damaging, costly and senseless exercise in vanity.
There are public domain iterative methodologies that we can use as the basis for our approach to EDW project iterations. Alternatively, it is possible to license data warehousing methodologies from a number of vendors and consulting service providers. One of the most comprehensive enterprise data warehousing methodologies available today is IBM’s Iterations process model. It has the best pedigree of all Data Warehousing methodologies. Bill Inmon’s company Prism Solutions released the first commercial version of Iterations to the market in the mid-nineties.
The key development phases, common to a number of methodologies, are:
- Management – Ensures and plans for training, support, project management, change management, internal marketing and ongoing administration of the data warehouse or data mart
- Analysis – The assessment, scoping and modelling of potential target database solutions, source system solutions, data availability, cleanliness and completeness. The provision of high-level technical recommendations
- Design – Designing the data environment, data access environment, data extraction environment, maintenance-processing environment and the detailed technical environment.
- Construction – The building and unit testing of the data extraction solutions, data access solutions, maintenance-processing solutions, the technical environment and the development of end-user training.
- Testing – The performance of various levels of data warehouse testing, including systems, integration testing and user acceptance testing.
- Implementation – The process of making the BI infrastructure accessible. The process of putting a developed iteration into the production environment.
So, what is the purpose of using development methodologies?
There are many explanation of the role of development methodologies, but one definition that is simple, clear and comprehensive defines it as as “a means of improving the management and control of the software development process, structuring and simplifying the process, and standardizing the development process and product by specifying activities to be done and techniques to be used”.
- Shortening the time to realize return-on-investment
- Minimizing the risk of project failure by getting it right the first time
- Gaining executive sponsorship and endorsement early in the project
- Setting realistic end-user expectations
- Identifying critical resources
- Providing a roadmap to simplify project management
- Allowing the organization to focus on managing processes; not inventing them
- Mitigating the impact of team attrition by providing clear and comprehensive documentation
- Enabling efficient management reporting
Industry Data Models and Solution Frameworks
So, what is the purpose of EDW data models and EDW frameworks and templates? In addition, what are industry data models?
Industry models (such as the leading models from IBM) combine deep expertise and industry best practice in a usable ‘blueprint’ form for both business and IT communities to allow them to accelerate the requirements-to-execution development span of industry solutions. The best of these industry models have been using the experience of hundreds of implementations, and decades of development.
A sound way to appreciate the potential usefulness of an industry data model is by first understanding the core reasons behind its structures, the components and facets of the structures themselves and the logic of how it can be used to store, map, classify, relate, select and retrieve data.
To this end, vendors aim to provide “comprehensive data models containing data warehouse design models, business terminology model and analysis templates to accelerate the development of business intelligence applications. Second, best practice business process models with supportive service definitions for development of a service-oriented architecture (SOA)”. Which, when applied correctly, should result in the acceleration of Data Warehouse and Data Integration project iteration time-frames and also reduced risk.
We could consider Data Marts, an Operational Data Store and the Data Warehouse itself as construction components. However, this section is principally about the componentisation of Data Quality and ETL processes.
I do not intend to dwell too much on this aspect; primarily this is the facet out of all of those I have covered here, that most people will be familiar with and will be aware of the options available.
We can componentise a significant proportion of Data Warehouse development. Especially components related to ETL. Many of these components (which take the forms of widgets, patterns, templates or applets) are available from ETL vendors. Useful ETL templates and widgets are also available in the public domain.
ETL job stream or project components includes Source, Transformation, Lookup, Staging, Destination, and Loader components.
- Source components – transport data for a transformation stream. An ETL job stream or project normally starts with one or more source components. Source components have at least one output port.
- Transformation components, Lookup components, and Staging components – apply specific transformations to the data in the transformation stream. These types of components have both input ports and output ports.
- Target components or Destination components (also called data sinks) – write data to specific targets. Target components have one input port and no output ports.
- Loader components – extract and load data from a source database or file into the database, without performing any transformation. Oracle’s SQL*Loader was an example of a loader component.
In terms of Data Quality, there are a number of components available for cleaning and quality checking the veracity of data. For example, Data Quality Components for Microsoft’s SSIS is a suite of custom data cleansing transformation components for Microsoft SQL Server Integration Services (SSIS) to “standardize, verify, correct, consolidate and update all your contact data for effective communications”. Other similar component features are available from other Data Integration vendors, such as Informatica, Oracle, SAP, IBM and Talend.
One interesting offer that I would like to highlight are the InfoSphere Warehouse Packs from IBM, which cover aspects such as Customer Insight, Market and Campaign Insight, and, Supply Chain Insight.
What is the purpose of the construction components?
They provide plug, configure, program and play options to speed development, testing, acceptance and deployment.
What are the EDW construction components?
They can be widgets, templates, patterns, apps or macros.
What is the purpose of components?
Speed and ease development. Leverage reuse of work already done, tested and proved.
In summary, we will see that organisations that adopt, nurture and evolve the principles, practices and technologies contained in proven EDW reference architectures, industry data models and construction components, will come to benefit from an evolving EDW paradigm that is both useful and usable. In this way, the most responsive of organisations will be able to verify, for themselves, that the EDW paradigm has a great future as a determining factor in the iterative improvement of productivity, resilience and cost effectiveness in the provision of agile collaboration, usable innovation and process accuracy in the formulation of options and the execution of decisions.
What does that mean in plain English?
- Commit to Data Warehousing only if there is a sustainable business imperative.
- Stick to the core DW fundamentals as defined by Bill Inmon. This approach does not clash with the super-advanced business-oriented dimensional modelling of Ralph Kimball, or the Data Vault approach of Dan Linstead.
- Use a proven iterative methodology specifically designed for Data Warehousing and Data Integration.
- Leverage anything and everything that will help you to reliably, cost-effectively and coherently analyse, build, test and deliver your Data Warehouse iterations.
- Use outsourcing and offshoring for purely non-Agile technical aspects of Data Warehousing described exhaustively and unambiguously in development documentation.
- Continually test your Data Warehouse to ensure that you are not at risk of straying into the twilight zone of Information Management failure.
- Data Warehousing, done right, will give IT a good name amongst business people.
About the author: Martyn Richard Jones is an Enterprise Data Warehousing (EDW) and Business Intelligence (BI) Project Manager and Architect with over 25 years experience in information integration, knowledge management and management information systems, gained primarily in the financial services and telecommunications business sectors. Martyn can be contacted by email: email@example.com
 Nancy L. Russo, The Use and Adaption of System Development Methodologies
File under: Good Strat, Good Strategy, Martyn Richard Jones, Martyn Jones, Cambriano Energy, Iniciativa Consulting, Iniciativa para Data Warehouse, Tiki Taka Pro