To begin at the beginning
This is the first in a series of collections of talking points on the processing of very large data sets by non-relational or pseudo-relational means; on speculative data analytics with these large data sets, which are typically non-operational data and social media data obtained from internet sources; and on how usable outcomes, if any are derived, can be integrated into strategic, tactical and operational decision support.
Currently this area is parked under the misleadingly named ‘Big Data’ umbrella, but I predict that in the near future this niche will be merged into the more recognisable and business-oriented areas of data warehousing, data architecture and business intelligence, and rebadged to avoid further confusion.
Each issue in this series will address seven talking points.
Here are the first seven, which deal with aspects of primal mass data processing, speculative analytics, and the persistence and association of outcomes and results.
Keep it simple
The leading and continuous mantra for all ‘Big Data’ initiatives should be simplicity.
Simplicity means identifying a well-bounded speculative opportunity and then focussing on it, whilst not allowing scope creep until the work is done and a following iteration is defined.
Simplicity means taking the data that is needed, along with the useless baggage data that it is unfortunately bundled with, and then reducing the data to the essentials at the earliest possible moment.
Simplicity means trying to move the data reduction problem upstream, preferably to the point where the data is actually generated and stored.
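As a rough illustration of reducing data to the essentials at the earliest possible moment, here is a minimal Python sketch. The field names and the reduce_early function are hypothetical, invented purely for illustration; the only idea being shown is that each record sheds its baggage as it is read, before anything downstream has to carry it.

```python
from typing import Iterable, Iterator

# Hypothetical field names, chosen only for illustration.
ESSENTIAL_FIELDS = ("user_id", "timestamp", "topic")


def reduce_early(records: Iterable[dict],
                 keep: tuple = ESSENTIAL_FIELDS) -> Iterator[dict]:
    """Yield only the essential fields of each record.

    Records that lack any essential field are skipped, so the
    baggage never travels further than the point of ingestion.
    """
    for record in records:
        if all(field in record for field in keep):
            yield {field: record[field] for field in keep}


# Made-up sample input: one usable record and one that is all baggage.
raw = [
    {"user_id": 1, "timestamp": "2015-01-05T10:00:00", "topic": "energy",
     "avatar_url": "unused", "bio": "unused"},
    {"user_id": 2, "bio": "no timestamp or topic"},
]

reduced = list(reduce_early(raw))
```

Because reduce_early is a generator, it can sit directly on top of whatever reads the source, and nothing further down the line ever sees the discarded fields.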
Simplicity means not flannelling business people about the supposed benefits of ‘Big Data’. It means avoiding patronising language akin to “Just do Big Data, because everyone will have to be doing it, and don’t worry your pretty little head about what it’s actually doing under the bonnet”. It means being frank, open and earnest about ‘Big Data’.
Hold this thought: You cannot bullshit simplicity.
Appropriate is good
The great economist John Kenneth Galbraith once observed that “The real accomplishment of modern science and technology consists in taking ordinary men, informing them narrowly and deeply and then, through appropriate organization, arranging to have their knowledge combined with that of other specialized but equally ordinary men. This dispenses with the need for genius. The resulting performance, though less inspiring, is far more predictable.”
Appropriateness is one of the more important aspects of supplying data for strategic, tactical and operational decision support, and it is data that must by its very nature be appropriate.
Appropriateness addresses the need for the right data.
Hold this thought: Appropriateness is good
Adequate is sufficient
Another important aspect is adequacy. Adequacy means that enough data is supplied to meet the requirements for that data. It addresses the need for the right volume of the right data, at the right levels of abstraction.
I know that people in IT find it tempting to second-guess requirements and to pile up unasked-for feature additions like they were going out of fashion, but in the lean and iterative age of agile we can no longer afford to be so reckless in how we manage requirements, projects and resources, especially those assigned to ‘Big Data’ projects.
Just hold this thought: Adequate really is enough
Timeliness kills the competition
Another important aspect of the ‘Big Data’ field is the timely provision of data and the fast delivery of usable outcomes. But this requires not only ‘Big Data’ but also big data management smarts.
Timeliness addresses the need to get appropriate and adequate data to decision makers on time and every time, in order to maximise the possibilities for its use and therefore to increase the chances of it having some business value.
Hold this thought: Speed kills the competition.
Integration makes sense
If, after running speculative analysis (diagnostic, predictive, etc.), you are lucky enough to actually end up with something tangible and useful, you may also want to consider linking or integrating the outcomes into mainstream, quality-assured strategic and tactical decision support and analysis data.
This is where the Data Warehouse concept of Bill Inmon comes into its own, because Enterprise Data Warehousing (and especially DW 3.0) provides a conceptual data architecture and data management protocols to support the adequate, appropriate and timely scaling of data set sizes from gigabytes to terabytes and then to petabytes – and beyond, if that is really what is needed.
Hold this thought: Integrate without losing essence
Big Data Science name change
There has been so much misleading, unreliable and unrepresentative puff built up around Big Data that it seems like an appropriate time to give it a ‘legal, decent and honest’ makeover, and to also change its name to something more appropriate such as Janus Data Analytics (JDA for short) or New Wave Punk Data.
I believe that Janus Data Analytics may be a good name for this niche technology field because it accurately reflects what it is and at the same time it is intrinsically linked to beginnings and transitions, to gates, doors, doorways, passages and endings. Janus Data Analytics looks into the future and into the past, and presides over the beginning and ending of conflict, war and peace.
There is also a certain attraction in the term New Wave Punk Data. It sends a strong and uncompromising signal to business. It deftly and simply describes the two key aspects of what is currently being touted as ‘Big Data’. New Wave Punk Data reflects the rapid, sharp-edged and primal slicing, dicing and reduction of very large data sets, together with short-term speculation and stripped-down analytics, often with opinionated and alternative drivers. It embraces a DIY ethic; many businesses that lead the movement (Yahoo, Google, Facebook, etc.) started with self-developed ‘Big Data’ tools (often initially as simple variations on the Unix power-chord themes of parallel grep, awk and cat) and shared them through open source channels.
The third option is to simply place the data aspect of ‘Big Data’ under the data architecture and data management umbrella as a facet of Data Warehousing and to place the ‘data science’ aspect of ‘Big Data’ under the statistics and data analytics umbrella, with a close association with the sub-class known as business intelligence. The true data mining and machine learning aspects of ‘Big Data’ can sensibly continue under the umbrella of Artificial Intelligence.
Hold this thought: A rose by any other name
Keep it legal, decent and honest
Potentially there are methods, technologies and techniques under the ‘Big Data’ big-top that could be used to accrue real business value; however, those benefits are being put at risk by the quality and quantity of puff in the environment, which was alluded to in the previous talking point.
The point is this. Banging on about the same nebulous futures of ‘Big Data’, rather than being specific, clear and verifiable about what is really going on, is, to state it simply, going to ‘queer the pitch’ for everyone: the good, the bad and the ugly… but especially the good.
Therefore I would suggest that we all take an additional New Year’s resolution on ‘Big Data’, and in future refer to the application and benefits of ‘Big Data’ and ‘Big Data’ analytics only in terms that could be construed as legal, decent and honest.
Hold this thought: “If you are not a better person tomorrow than you are today, what need have you for a tomorrow?” – Rebbe Nachman of Breslov
That’s all folks
So, that is all from me in the first of what I hope will be many issues in the series Big Data 7s.
I would like to leave you with this fabulous quote from James Carville… just because.
“Sometimes the right thing gets done for the wrong reason and sometimes, unfortunately, the wrong thing gets done for the right reason”.
As always, many thanks for reading.
File under: Good Strat, Good Strategy, Martyn Richard Jones, Martyn Jones, Cambriano Energy, Iniciativa Consulting, Iniciativa para Data Warehouse, Tiki Taka Pro