Stuff a really great Data Architect should know

If you enjoy this piece or find it useful then please consider joining The Big Data Contrarians:

Join The Big Data Contrarians here: https://www.linkedin.com/grp/home?gid=8338976

Many thanks.

As a child, I had a great love of stories of Spain, of the idea of travelling through the Iberian Peninsula and of mastering, and not just learning, the classical Spanish guitar. One of the phrases that stuck with me from those days was this oft-repeated quote of uncertain origin: “amateurs practice until they get it right; professionals practice until they can’t get it wrong.”

In my professional working life, I have striven to identify those things that I want to be sufficiently competent at and those things that I consider a fundamental part of my professional competence, and then to make a clear distinction between the two.

As those who know me will be aware, a significant part of my professional life has been dedicated to the architecture and management of data, information and structured intellectual capital. Therefore, in the light of this fact and with reference to the previous bit of whimsical fancy, I will address a question posed to me some time ago: what makes a great Data Architect truly great?

What follows is by no means an exhaustive list of essential elements, but it should give you a flavour of what a great Data Architect is.

ONE – Establish a clear, cohesive and communicable idea of the theoretical, technical, philosophical and practical nature of data and information. Learn it inside out, upside down and back to front. Then learn it well.

To put it another way, a great Data Architect should be able to answer the question “what is data?” from almost any viewpoint, and give a simple, precise and understandable reply.

The internet abounds with content on ‘data’ and ‘information’. You may be familiar with the way Wikipedia describes ‘data’, and you may even agree with it, even though (in its form as of 3/8/2015[1]) it is a naïve, sloppy and circular definition, one that serves only as an example of how not to define data.

TWO – Know your audiences, understand their motivations, have empathy with them, and develop a keen ability to spot what the audience wants and then sell that back to them as if it were their own idea.

One of the greatest architects of the 20th century, Ludwig Mies van der Rohe, had this to say about the relationship between an architect and a client: “Never talk to a client about architecture. Talk to him about his children. That is simply good politics. He will not understand what you have to say about architecture most of the time. An architect of ability should be able to tell a client what he wants. Most of the time a client never knows what he wants.”

THREE – Learn to communicate clearly, simply and effectively, and remember who the most important members of your audience are at any given moment and speak mainly to them.

The mantra of ‘keep it simple’ is what separates a great Data Architect from the swathe of sycophantic worriers, software train-spotters and smart-ass wannabes that make up much of the world of IT. So do not even try to appeal to that segment; they don’t matter. Speak to those who do matter.

The job of the Data Architect is not to impress colleagues, get likes on Facebook or be the manager’s pet. A great Data Architect uses language that is appropriate for the occasion, not to flaunt their extensive knowledge and experience but to communicate ideas, concepts and architectures in a language and manner that the listener can immediately grasp. A Data Architect who aspires to greatness does not need to prove themselves to their peers, but just needs to strive to be a true professional, and the greatness will come along in its own good time.

The eighteenth-century English theologian, dissenter, philosopher and scientist Joseph Priestley wrote, “The more elaborate our means of communication, the less we communicate”. With such influences in mind, I try to encourage my team members and other collaborators to use appropriate channels of communication, and one of the ways I get this message across is with a ranked list of options. I find that doing this early on can really simplify things and bring a greater degree of clarity to the table. However, as with many other aspects of life, one has to be flexible and realistic with this approach too, and allow for the selection of the most appropriate option according to the circumstances. My preference list is:

  1. Face-to-face
  2. Video conference
  3. Telephone/mobile
  4. Post-it note – or similar
  5. Texting/SMS/Wassup
  6. Email
  7. Smoke signals
  8. Social Media

FOUR – Be a great listener. Data Architects must nurture and hone effective listening skills; otherwise, they place themselves at a serious disadvantage.

Here are the five listening aspects that a Data Architect should aspire to master:

  1. Cultivate a self-awareness of the importance of listening.
  2. Understand the barriers to listening and learn how to overcome them.
  3. Identify poor listening habits and practices that you have adopted – ask people about how they see your listening skills.
  4. Improve your own responsive listening skills.
  5. Treat this as an open-ended, continuous improvement programme.

To put it another way: as a leader you might be the most amazing talker this side of the Rockies, but if you can’t listen effectively you would be like Nadal, Federer or Djokovic with a world-class serve but a cultivated inability to read the play accurately or to return any difficult shot.

FIVE – Understand how data is generated; why it is generated; who or what triggers the generation; how it flows; how it is used; who uses it and why. Understand the life-cycles of data and information.

A great Data Architect must understand the public and private life of data before actually trying to do anything with it.

I’ll cut to the chase on this topic and leave you with a comment on The Social Life of Information by John Seely Brown and Paul Duguid.

“To see the future we can build with information technology, we must look beyond mere information to the social context that creates and gives meaning to it. For years, pundits have predicted that information technology will obliterate the need for almost everything—from travel to supermarkets to business organizations to social life itself. Individual users, however, tend to be more sceptical. Beaten down by info-glut and exasperated by computer systems fraught with software crashes, viruses, and unintelligible error messages, they find it hard to get a fix on the true potential of the digital revolution.”

That’s just another indication of what we have to learn to avoid.

On the upside, “The Social Life of Information gives us an optimistic look beyond the simplicities of information and individuals. It shows how a better understanding of the contribution that communities, organizations, and institutions make to learning, working and innovating can lead to the richest possible use of technology in our work and everyday lives.”

SIX – Get a great understanding of all the data-oriented vices and bad data architecture practices that go on in the IT application world, and most especially in the world of web application development.

Some of the most atrocious examples of bad data architecture, engineering and management are found in web applications. Learn from them, and learn how not to repeat such gross and wilful incompetence in your own Data Architecture work. Treat them as extreme examples of lessons learned, i.e. how not to do it.

SEVEN – Cultivate a well-developed sixth sense for the appreciation of the intrinsic values of data and information.

No, I am not arguing that all data has value; that extreme notion is clearly absurd, though fortunately it has few adherents. However, I am saying that we should develop a ‘nose’ for what data could be of value, and for measuring, in qualitative and pseudo-quantitative terms, what that value actually represents.

I would also encourage people to check out the Wikipedia piece on Infonomics (https://en.wikipedia.org/wiki/Infonomics), a term coined by Gartner’s Doug Laney and based on work carried out at Bill Inmon’s Prism Solutions, which, incidentally, is one of my former employers.

Here’s a snippet:

“Infonomics is the theory, study and discipline of asserting economic significance to information. It provides the framework for businesses to value, manage and wield information as a real asset. Infonomics endeavors to apply both economic and asset management principles and practices to the valuation, handling and deployment of information assets.”

When you are a Data Architect, you should really be aware of such stuff, and at least be able to hold a reasonable conversation about it.

EIGHT – Strive to be the best data modeller you are ever going to meet in your entire life.

I say that I’m a lean data modeller. What does that mean?

The first things I model are the data flows.

Then I will create the conceptual, logical and physical models.

Then I will repeat until I get consensus, or until I become the Data Dictator – this usually occurs when the Portfolio Director demands closure and delivery.


But not so fast. You will also need to know how to design physical data models for OLTP as well as for Enterprise Data Warehousing, and no, they are not the same, even if they are similar in many respects.

Not only will a great Data Architect have polished skills in the art of data modelling according to the divine tenets of Codd and Date, as later extended by blasphemers and acolytes alike, but they must also be comfortable designing dimensional models.
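To make the OLTP versus dimensional distinction a little more concrete, here is a minimal sketch of a dimensional (star) model, using Python's built-in sqlite3 module purely for convenience. The table names (dim_date, dim_product, fact_sales), the columns and the sample rows are all hypothetical. A normalised OLTP design for the same business would typically record individual orders and order lines and keep product categories in their own table; the dimensional design instead folds descriptive attributes into dimension tables, so that a typical analytical query reduces to joining the fact table to its dimensions and aggregating.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20150801
    calendar_date TEXT NOT NULL,
    month         TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL          -- denormalised into the dimension
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20150801, "2015-08-01", "2015-08"),
    (20150802, "2015-08-02", "2015-08"),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Classical guitar", "Instruments"),
    (2, "Nylon strings", "Accessories"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20150801, 1, 1, 450.0),
    (20150801, 2, 3, 30.0),
    (20150802, 2, 2, 20.0),
])


def sales_by_category(month):
    """Typical dimensional query: join facts to dimensions, filter, aggregate."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        WHERE d.month = ?
        GROUP BY p.category
        ORDER BY p.category
    """, (month,)).fetchall()


print(sales_by_category("2015-08"))  # [('Accessories', 50.0), ('Instruments', 450.0)]
```

The point is not the toy data but the shape: the fact table holds the measures, and everything a business user wants to slice by lives in the dimensions.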

Other knowledge that will separate the great Data Architect from the merely competent one is a working knowledge of the Hierarchical database model; the Network model; the Object model; the Document model; and the Entity–attribute–value model. It would also be of interest to have a passing acquaintance with the Inverted index; flat file usage; the Associative model; the Multidimensional model; the Multivalue model; the Semantic model; the XML database; the Named graph; and the Triplestore. Knowing stuff about stuff like this is where the killer skills differentiator comes into play.
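By way of illustration of one of those models, the sketch below stores the same hypothetical customer twice, once in a conventional relational table and once in an entity–attribute–value (EAV) table, again using Python's sqlite3 module. All table and column names are invented for the example.

```python
import sqlite3

# The same facts about a hypothetical customer, stored two ways.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conventional relational table: one column per attribute, typed and constrained.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    country     TEXT NOT NULL
);
-- Entity-attribute-value table: one row per (entity, attribute) pair.
CREATE TABLE customer_eav (
    entity_id INTEGER NOT NULL,
    attribute TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (entity_id, attribute)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ana', 'ES')")
conn.executemany("INSERT INTO customer_eav VALUES (?, ?, ?)", [
    (1, "name", "Ana"),
    (1, "country", "ES"),
])


def customer_from_eav(entity_id):
    """Pivot EAV rows back into one record; the query must know the attribute names."""
    return conn.execute("""
        SELECT entity_id,
               MAX(CASE WHEN attribute = 'name'    THEN value END),
               MAX(CASE WHEN attribute = 'country' THEN value END)
        FROM customer_eav
        WHERE entity_id = ?
        GROUP BY entity_id
    """, (entity_id,)).fetchone()  # None if the entity has no rows


relational = conn.execute(
    "SELECT customer_id, name, country FROM customer WHERE customer_id = 1"
).fetchone()

print(relational)            # (1, 'Ana', 'ES')
print(customer_from_eav(1))  # (1, 'Ana', 'ES')
```

The trade-off is visible even at this scale: the EAV table can take new attributes without a schema change, but it gives up per-attribute typing and NOT NULL constraints, and every read needs a pivot query that knows the attribute names in advance.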

I have been fortunate in that I can name some of the greatest data people of all time as my own personal mentors, and I appreciate that for many (well, for everyone now) this is not an option. However, there are other ways.

There is some great material out there about data modelling; unfortunately, there is an awful lot of crap as well. If you are unsure how to tell the difference, then ask an expert. There are a number of data experts commenting in the data-related groups on LinkedIn.

In the old days it was quite easy to spot a data pro: slightly dishevelled look, tweed jacket with patches on the sleeves, and a pipe, matches and tobacco in one of the pockets, Doctor Watson style, etc. But now, in the virtual and aseptic worlds, it’s not so obvious who is who. What a pity those days have passed, but such is life.

Lastly, consider this quote from Ove Arup. “Engineering is not a science. Science studies particular events to find general laws. Engineering design makes use of the laws to solve particular practical problems. In this it is more closely related to art or craft.”

NINE – Understand the database and data related technologies and products out there, and the pros and cons of using them. Also, strive to be technology agnostic.

This is probably the one aspect of the life of the Data Architect that most people will be familiar with: the tools and technologies. Probably for this reason alone, there are recruiting agencies that cannot tell the difference between a technology product and the entire vast field of data architecture and management, or between the importance of knowing the version of a piece of software and that of knowing how to competently manage the Data Architecture of a global business.

Nonetheless, it’s good to have a grasp of the vast array of data related technologies and products out there, and to keep that knowledge as up to date as possible.

Therefore, this list is more for the aspiring Data Architect than for the experienced professional. Nevertheless, make sure you have a handle on these:

  1. RDBMS products – Oracle, Teradata, DB2, SQL Server, Sybase, Exasol, etc. A more exhaustive list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_relational_database_management_systems
  2. Distributed file system and querying products – such as HDFS, Lustre and GPFS. Wikipedia has a comparison here: https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems (see also https://en.wikipedia.org/wiki/Clustered_file_system#Distributed_file_systems)
  3. ETL and Enterprise Data Integration technologies – https://en.wikipedia.org/wiki/Extract,_transform,_load
  4. Master Data Management – https://en.wikipedia.org/wiki/Master_data_management

Please also note that there is a surfeit of data products in addition to those mentioned or referenced above.

TEN – Absolutely dominate the subject of Data Governance. Make Data Governance one of your master subjects, and be ready to bring it into play at a moment’s notice.

Take heed of the wise words of Sun Tzu: “If you know your enemies and know yourself, you will not be imperiled in a hundred battles… if you do not know your enemies nor yourself, you will be imperiled in every single battle.”

The DAMA Dictionary of Data Management defines Data Governance as “The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets.”  DAMA has identified 10 major functions of Data Management in the DAMA-DMBOK (Data Management Body of Knowledge). Data Governance is identified as the core component of Data Management, tying together the other 9 disciplines, such as Data Architecture Management, Data Quality Management, Reference & Master Data Management, etc., as shown in the following diagram:

Whilst we are at it, I would encourage everyone with a professional interest in Data Architecture to check out ‘Data Architecture: A Primer for the Data Scientist’. Here is a bit of the blurb from the Amazon site:
“Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data. And everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within the existing systems (data warehousing systems). Taking a look at the larger picture into which Big Data fits gives the data scientist the necessary context for how pieces of the puzzle should fit together. Most references on Big Data look at only one tiny part of a much larger whole. Until data gathered can be put into an existing framework or architecture it can’t be used to its full potential. Data Architecture a Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist.”

That’s all folks

Now, the clock on the wall is really telling me that I should wrap up this baby, warts and all, accidents, omissions and typos included, and put it to bed.

This is far from being an exhaustive list of the things which a Data Architect should cultivate, hone and excel in. And yes, I know I ‘missed a bit, there’ as well. And yes, I know I started a new sentence with an ‘And’, and, and, and. And yes I… But, anyways… hey ho! upwards and onwards!

Nonetheless, I hope this little piece was informative or entertaining, or even both. At some level of abstraction or another.

If you spot any glaring errors in this piece then please let me know in the comments section below and I will revise as necessary. Thanks in advance for that.

I will leave you with the words of one of my favourite contemporary architects, Zaha Hadid**:

“I started out trying to create buildings that would sparkle like isolated jewels; now I want them to connect, to form a new kind of landscape, to flow together with contemporary cities and the lives of their peoples”

Many thanks for reading.

In subsequent blog pieces I will be sharing my views on the evolution of information management in general, and on the incorporation of novel and innovative techniques, technologies and methods into well-architected mainstream information supply frameworks, primarily for strategic and tactical objectives.

As always, please reach out and share your questions, views and criticisms on this piece using the comment box below. I frequently write about strategy, organisational, leadership and information technology topics, trends and tendencies. You are more than welcome to keep up with my posts by clicking the ‘Follow’ link, and perhaps you will even consider sending me a LinkedIn invite if you feel our data interests coincide. Also feel free to connect via Twitter, Facebook and the Cambriano Energy website.

For more on this and other topics, check out some of my other posts:

Big Data, the promised land where ‘smart’ is the new doh!: https://www.linkedin.com/pulse/big-data-promised-land-where-smart-new-doh-martyn-jones?trk=prof-post

Absolutely Fabulous Big Data Roles: https://www.linkedin.com/pulse/absolutely-fabulous-big-data-roles-martyn-jones?trk=prof-post

Not banking on Big Data?: https://www.linkedin.com/pulse/banking-big-data-martyn-jones?trk=prof-post

10 amazing reasons to join The Big Data Contrarians: https://www.linkedin.com/pulse/10-amazing-reasons-join-big-data-contrarians-martyn-jones?trk=prof-post

Amazing Data Warehousing with Hadoop and Big Data: https://www.linkedin.com/pulse/cloudera-kimball-dw-building-disinformation-factory-martyn-jones?trk=prof-post

The Big Data Contrarians: The Agora for Big Data dialogue: https://www.linkedin.com/pulse/big-data-contrarians-agora-dialogue-martyn-jones?trk=mp-reader-card

The Big Data Shell Game: https://www.linkedin.com/pulse/big-data-shell-game-martyn-jones?trk=mp-reader-card

Aligning Data Warehousing and Big Data: https://www.linkedin.com/pulse/aligning-data-warehousing-big-martyn-jones?trk=mp-reader-card

Big Data Luddites: https://www.linkedin.com/pulse/big-data-luddites-martyn-jones?trk=mp-reader-card

Data Warehousing Explained to Big Data Friends: https://www.linkedin.com/pulse/data-warehousing-explained-big-friends-martyn-jones?trk=mp-reader-card

Big Data, a promised land where the Big Bucks grow: https://www.linkedin.com/pulse/big-data-promised-land-where-bucks-grow-martyn-jones-6023459994031177728?trk=mp-reader-card

The Big Data Contrarians: https://www.linkedin.com/pulse/big-data-contrarians-martyn-jones?trk=mp-reader-card

Is big data really for you? Things to consider before diving in: https://www.linkedin.com/pulse/big-data-really-you-things-consider-before-diving-martyn-jones?trk=mp-reader-card

Big Data Explained to My Grandchildren: https://www.linkedin.com/pulse/big-data-explained-my-grandchildren-martyn-jones?trk=mp-reader-card

[1] Wikipedia’s definition at the time: “Data is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information.”

**Now sadly departed. https://en.wikipedia.org/wiki/Zaha_Hadid