Friday, August 25, 2006

Applications of a Canonical Data Model

A little more than a year ago, NS (the Dutch railway company) started initiatives to implement a better, up-to-date system integration platform. In the beginning we were mainly exploring the realm of Enterprise Application Integration (EAI). Because IT flexibility was one of the main goals back then (and it still is!), we were also aiming to use a canonical data model (CDM) to loosely couple our applications on the data aspect. As time passed and system integration articles piled up on our desks (and our throats got sore!), we managed to detach the idea of a CDM from its original use as a loose coupling mechanism. This opened doors to new worlds: new ways to apply a CDM popped up, and a better understanding of its nature arose. This second posting gives a brief overview of some of our results…

Applications of a CDM
The big insight, at least to me, was that a CDM is a data model, and although it's a rather special type of data model, it's NOT limited to one specific use. In our early, enthusiastic days of systems integration, we learned from all sorts of literature, in books and on the Internet, that a CDM was a data model to be used as an intermediate data standard to loosely couple applications. A picture like the one below is probably well known to you (if you're an integration idiot like we are!). IMHO, there is a lot more to gain from using a CDM in your organization! And, BTW, would you like to create and maintain a separate data model specialized for each type of application you're interested in? Think of systems integration, business intelligence, application development and the like… Probably not! And imagine trying to keep these multiple, overlapping models consistent with each other! Good heavens, no!

Picture: Vision of a CDM*, taken from Enterprise Service Bus by David Chappell (great book!), O'Reilly, ISBN 0-596-00675-6
* In this picture, the CDM can be found in what is called the Canonical Message Format. As far as I know, most companies that use a message format this way do the data modeling inside the message format itself, usually within some XSD schema document, and not in a separate data model like we do. So in those cases, the canonical message format plays the role of a CDM as well as the role of a canonical message format proper (a Canonical Message Model, or CMM, in our terminology).

It is my conviction that you should develop and maintain your CDM so that it supports a large number of possible applications. It will be hard enough to create just ONE enterprise-wide model!

When you work this out a little further, you will probably find that the requirements of these applications are very much alike. It's 'just' proper data modeling that forms the basis; a few types of metadata will have to be added for specific uses. I will publish more on this later. So, here's my list:

If set up appropriately, a CDM can be used as:
1. an intermediate data standard to achieve loose coupling on the data aspect
2. a commonly accepted business language for improving any kind of communication process
3. a data catalogue that supports data sharing
4. a data model catalogue to improve reuse of data models
5. a thermometer in your IT environment to find weaknesses and opportunities to improve your IT
6. a tracing tool for making impact analyses
7. a link between the data aspect and the organizational aspect of your organization to help set up registrations for authorization, data ownership, data maintenance and so on
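To make applications 1 and 6 a little more tangible, here is a minimal Python sketch. It assumes, purely for illustration, that the CDM records for each canonical attribute which local field represents it in which application; the class `CanonicalModel`, the applications `TicketShop` and `Planning`, and all attribute and field names are hypothetical, not taken from our actual model. Translating through canonical names means neither application needs to know the other's field names, and the very same mapping doubles as a tracing table for impact analysis.

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalModel:
    # Hypothetical CDM fragment:
    # canonical attribute name -> {application name: local field name}
    mappings: dict[str, dict[str, str]] = field(default_factory=dict)

    def to_canonical(self, app: str, message: dict) -> dict:
        """Rename an application's local fields to canonical attribute names."""
        result = {}
        for canonical, per_app in self.mappings.items():
            local = per_app.get(app)
            if local is not None and local in message:
                result[canonical] = message[local]
        return result

    def from_canonical(self, app: str, message: dict) -> dict:
        """Rename canonical attribute names to an application's local fields."""
        result = {}
        for canonical, value in message.items():
            local = self.mappings.get(canonical, {}).get(app)
            if local is not None:
                result[local] = value
        return result

    def impacted_applications(self, canonical: str) -> set[str]:
        """Impact analysis: which applications represent this canonical attribute?"""
        return set(self.mappings.get(canonical, {}))


# Usage with made-up applications and field names
cdm = CanonicalModel({
    "Train.number": {"TicketShop": "trainNr", "Planning": "TRN_ID"},
    "Train.departureStation": {"TicketShop": "from", "Planning": "DEP_STN"},
})

ticket_msg = {"trainNr": 3512, "from": "Utrecht Centraal"}
canonical_msg = cdm.to_canonical("TicketShop", ticket_msg)
planning_msg = cdm.from_canonical("Planning", canonical_msg)
print(planning_msg)  # {'TRN_ID': 3512, 'DEP_STN': 'Utrecht Centraal'}
print(sorted(cdm.impacted_applications("Train.number")))  # ['Planning', 'TicketShop']
```

In a real integration platform the translation would of course live in the integration layer and cover types and structure, not just field names; the sketch only shows why a single shared model can serve both uses.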

I plan to discuss these applications with you further in the near future. Although some of them may sound a little exotic, the first three in particular are already of great interest to us right now.

Thursday, August 24, 2006

Definition of Canonical Data Model

By way of introduction, this first posting gives you a rough idea of my view of what a canonical data model (or 'CDM' for short) is.

Definition of a CDM
As I see it, and as its name suggests, a CDM is a data model. Hence, it should give a view, in data modeling terms, of a specific domain of interest. It could, for example, list:

· what kinds of things are perceived as relevant to the domain at hand (and what we call them and how we define them)
· what sorts of information about these things are of interest to the domain
· how and where these information types are represented in our IT systems

In addition to this, a CDM might tell you, among other things:
· who in your organization should be allowed to do what with these information types (and who should not!)
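As a rough illustration of the kinds of content listed above, here is a minimal Python sketch of how a single CDM entry might be structured. This is an assumption for illustration only: the classes `Concept` and `InformationType`, their fields, the roles and the system/table names are all hypothetical, and a real CDM would normally live in a modeling tool rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class InformationType:
    """Something we want to know about a concept, e.g. a station's name."""
    name: str
    definition: str
    # Where it is represented in IT systems: system -> table/field (hypothetical)
    representations: dict[str, str] = field(default_factory=dict)
    # Who may do what with it: role -> allowed actions (hypothetical)
    permissions: dict[str, set[str]] = field(default_factory=dict)

    def is_allowed(self, role: str, action: str) -> bool:
        # Roles without an entry are allowed nothing
        return action in self.permissions.get(role, set())


@dataclass
class Concept:
    """A kind of thing relevant to the domain, with its agreed name and definition."""
    name: str
    definition: str
    information_types: list[InformationType] = field(default_factory=list)


# Usage with made-up content
station_name = InformationType(
    name="Station.name",
    definition="Official name of a station.",
    representations={"Planning": "STATIONS.NAME"},
    permissions={"timetable editor": {"read", "update"}, "ticket clerk": {"read"}},
)
station = Concept(
    name="Station",
    definition="A place where trains stop to let passengers board or alight.",
    information_types=[station_name],
)

print(station_name.is_allowed("ticket clerk", "read"))    # True
print(station_name.is_allowed("ticket clerk", "update"))  # False
```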

The exact content of a CDM is of course strongly influenced by the applications you have in mind (see Applications of a CDM), but it seems to me that the above-mentioned information is fairly basic.

However, not every data model that contains all of this information about a domain is a CDM. The addition of the term 'canonical' indicates that the model has a special status within an organization, or at least that it is expected to have such a status. Whether it actually does is another matter entirely ;-)

This special status means that the model, or more specifically its content, is to a large degree accepted as a common data standard within the organization. There are, however, a lot of pitfalls to this simple statement, enough for a series of blog postings of its own and plenty of discussion.

For now it suits me to state that a CDM is a data model that is intended to be as commonly accepted as possible at any moment in time, within the limits of an accepted level of ambition and the resources available (time, for instance).

This simply means that, as time progresses, your canonical model will grow not only in size but also in quality and 'acceptability as a common standard'. It seems very unrealistic to me to expect any model to be commonly accepted from its inception, not even a small one.

Lastly, IMHO, a CDM is intended to eventually cover the complete organization and become an enterprise-wide data model.