I have worked in data architecture and management for three decades and have become a recognised expert in my field. As a result, I have become almost oblivious to the fads, fancies and fashions that pass through IT. However, being an expert in a field also means that from time to time we are oblivious to the difficulties that some people may have when trying to understand issues and concepts that we simply take for granted – because one simply knows. This is the case with data.
With the polemic and resourcefulness surrounding buzz-words such as Big Data, Cloud and the Internet of Things, one may be forgiven for assuming that there has been a massive inflection point in the generation, variety, understanding and use of data. This isn’t strictly accurate. Outside the handful of speculative and high-visibility projects of social media and networking, data collators and indexers, search engines and online volume ad-sellers, there has been scant publicised evidence of significant ‘data revolutions’ elsewhere.
Things haven’t changed significantly simply because a handful of companies are making money with other people’s data. In the more traditional organisation, its own data remains one of its most important assets. So what’s really out there in the world of data? First, let’s look at some broad-brush classes of data, namely:
- Enterprise Operational Data
- Enterprise Process Data
- Enterprise Information Data
Enterprise Operational Data – This is data that is used in applications that support the day to day running of an organisation’s operations. Typical data items in this space are sales transactions, purchase transactions, product information, client and contact information. Enterprise Operational Data may also include complexly structured data, such as contracts and other business documents. Applications in this space may include production control, logistics and stock control, as well as purchase order, supply chain management, management accounting and human resource modules.
Enterprise Process Data – This is measurement and management data collected to show how the operational systems are performing. In the past, the recording of events went down only to the level of a completed transaction, with a start and an end and nothing in between. As transactions were kept as simple as possible, to maximise performance and throughput and minimise the risk of failure, very little process data was captured. Now, especially with the advent of Business Process Management and Web Logs, we collect a whole array of transaction and process performance data that was never previously captured.
Enterprise Information Data – This is primarily data which is collected from internal and external data sources, the most significant source being typically Enterprise Operational Data. Other sources for this aspect of data include Enterprise Process Data and data provided by 3rd party data providers. In this data space we find Enterprise Data Warehousing, Operational Data Stores, Data Marts and Special Project Data Stores. Applications in this space include support for strategic and tactical decision making, formal statistical analysis, speculative ad-hoc analysis, data mining, business intelligence and reporting (also called Management Information reporting), and qualitative as well as quantitative analysis.
As we can see, there are interdependencies and synergies between each of these broad areas of data generation and use. Of course, data in each of these areas can be augmented and enriched by new sources of data, whether that data is richer market data, competitive data or weather data. Now, this is a very simplistic view of data, but for the purposes here it is both coherent and cohesive.
Fig. 1 – Data Made Simple
In the above diagram I have identified an area labelled as ‘Data Transitioning’. This is usually referred to as ETL (Extract, Transform and Load). In more and more cases, however, instead of extracting data directly from source systems (Enterprise Operational Data Management), we are capturing data sent via enterprise message queueing. In most cases this drip-feed approach allows us to maximise the time available for data loading and analysis.
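To make the drip-feed idea concrete, here is a minimal sketch, not a description of any particular product. It assumes sales messages arrive as JSON on a queue and are loaded a few at a time into an information store. The in-process `queue.Queue` stands in for an enterprise message queue, the in-memory SQLite database stands in for the data store, and the `sales` table, its columns and the function names are all hypothetical.

```python
import json
import queue
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the information data store."""
    store = sqlite3.connect(":memory:")
    store.execute(
        "CREATE TABLE sales (order_id TEXT PRIMARY KEY, product TEXT, amount REAL)"
    )
    return store


def transform(message: str) -> tuple:
    """Parse one JSON message and tidy it into a row (hypothetical schema)."""
    record = json.loads(message)
    return (
        record["order_id"],
        record["product"].strip().upper(),  # normalise product names
        float(record["amount"]),            # amounts may arrive as text
    )


def drip_feed_load(source: queue.Queue, store: sqlite3.Connection) -> int:
    """Load whatever messages are queued right now; return how many were loaded."""
    loaded = 0
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            break  # nothing more waiting; the next call picks up later arrivals
        store.execute(
            "INSERT INTO sales (order_id, product, amount) VALUES (?, ?, ?)",
            transform(message),
        )
        loaded += 1
    store.commit()
    return loaded


if __name__ == "__main__":
    messages = queue.Queue()
    store = make_store()
    messages.put(json.dumps({"order_id": "A1", "product": " widget ", "amount": "19.99"}))
    messages.put(json.dumps({"order_id": "A2", "product": "gadget", "amount": 5}))
    print("rows loaded:", drip_feed_load(messages, store))
```

Calling `drip_feed_load` repeatedly through the day spreads the loading work out, rather than concentrating it in a single batch extraction window.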
Another important point to note is that although Enterprise Information Data Management has been associated with Relational Database platforms, such as Oracle, Teradata and DB2, in this IT domain we are also using a wide range of ‘databases’, from the humble ‘flat file’ to powerful column-oriented database engines, such as EXASOL, Teradata and Vertica, to provide information and analysis to business stakeholders. As more and more ‘exotic’ data formats and sources are incorporated into our Enterprise Information Data platforms, we will also witness the evolution of tools, technologies and techniques to meet those new requirements.
Last of all, there will be no revolution in data management, because in most cases data architectures are built around sound data engineering principles, constrained and governed by the limitations of mainstream hardware, the operating systems that bring them to life and the competencies of those charged with designing, building or using them.
In subsequent blog pieces I will be sharing my views on the evolution of information management in general, and the incorporation of ‘Ad hoc Speculative-Predictive Analytics’ into well-architected mainstream information supply frameworks for primarily strategic and tactical objectives.
File under: Good Strat, Good Strategy, Martyn Richard Jones, Martyn Jones, Cambriano Energy, Iniciativa Consulting, Iniciativa para Data Warehouse, Tiki Taka Pro