Friday, December 22, 2006

Bridging Time and Space needs Semantic Description

Recently I have been occupying myself with the questions of why we store our data, what kinds of data storage we have available these days (see Where do you live and what do you do? -Why, does it matter?), and what the consequences of storing data are. I wasn’t only thinking of storage places like databases or more fluid memory structures. Messages would fit the idea just as well, or paper forms.

A teacher of mine once taught me that data registrations are meant to bridge time and space. Without them, our data would vanish before we knew it. We could only pass our data between its users in a very immediate fashion, like throwing a hot potato that must not fall on the ground.

I realized that our data need not be held anywhere only if two conditions are met: we can reliably deliver the data created by some process to the processes that need them, immediately after they are created, and the receiving processes in turn can use the data in their execution immediately thereafter. If we can’t do all of this, we certainly need data storage. Big thing, right?

This would be like calling a function or procedure in a computer program with a number of parameters. If the data in the parameters are of no use at all except for these direct function calls, then there is no need for storing them anywhere. We would never want to retrieve them then, would we?
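A minimal sketch of this “hot potato” situation in Python. The function names and the invoice example are hypothetical, chosen only for illustration: one process creates a value and hands it straight to the only process that needs it, so nothing is ever stored.

```python
def create_order_total(prices):
    """Producing process: derives a fact (a total) from its inputs."""
    return sum(prices)


def format_invoice_line(total):
    """Consuming process: uses the fact immediately, then forgets it."""
    return f"Total due: {total:.2f}"


# The total is passed directly from producer to consumer as a parameter.
# It is never written to a database, a file, or a message queue.
line = format_invoice_line(create_order_total([10.0, 5.5]))
print(line)  # prints "Total due: 15.50"
```

As soon as a second consumer wants the same total tomorrow, this direct hand-off no longer suffices and some form of storage is needed.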

In real situations outside the realm of computer programs, this rarely occurs. Data are often needed for a number of different processes, some of which may not even be known at the time the data are created. Business Intelligence processes are an example of this. Can you foresee what ad hoc analysis and reporting to expect for the next few years, or even months, or weeks? I certainly can’t!

Moreover, it’s common to have processes that can’t be executed immediately after the data they need become available. Sometimes a process needs data from multiple sources that do not create them simultaneously, because those sources don’t run at the same time.
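The case of multiple sources that deliver at different moments can be sketched as follows. This is an illustration under assumptions, not a prescribed design: the class name `Joiner` and the source names `"order"` and `"payment"` are made up. The point is that the first arrival has to be remembered somewhere until the second one shows up.

```python
class Joiner:
    """Combines data from two sources that do not deliver simultaneously."""

    REQUIRED_SOURCES = {"order", "payment"}  # hypothetical source names

    def __init__(self):
        # This dictionary is the data store: it bridges the time gap
        # between the first source delivering and the last one delivering.
        self._pending = {}

    def receive(self, key, source, value):
        """Remember a value; return the full record once it is complete."""
        record = self._pending.setdefault(key, {})
        record[source] = value
        if self.REQUIRED_SOURCES <= record.keys():
            return self._pending.pop(key)
        return None  # still waiting for the other source


joiner = Joiner()
first = joiner.receive(42, "order", "3 books")   # nothing to process yet
second = joiner.receive(42, "payment", "paid")   # now the record is complete
```

Here `first` is `None`, while `second` holds the combined record that the waiting process can finally use.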

So, we do indeed need to be able to bridge time. We need to remember facts that are discovered in some process, in order to feed them as data to one or more other processes at a later moment. We often do this with structured data, but it is no different with unstructured data. Think of any kind of writing, such as e-mails, web pages, text documents, and so on. And text strings often should really be thought of as having unstructured content as well, shouldn’t they? All of these things get stored for later use.

Secondly, not all processes can be executed in the same physical place. Hence the need to bridge space with our data stores: we need to transmit our data between physical places.

I believe most people will agree with me that bridging space is a form of communication. But what about time? Imagine we only want to use our data later on in the very same process that created them, in the same physical place, so bridging only time, not space. Couldn’t we say we were communicating with ourselves then? Communicating with our future selves, so to speak? Or is this a strange thought coming from a weird mind? Should we say, maybe, that this would not be ‘one process’ then?

My conclusion from all this is that data registrations are always a matter of communication. They are used for communicating data between two communicators, likely bridging time and space, or between two different roles of one single communicator, at least bridging time, but maybe space as well (where will you be next week?).

Let me come to my point now. I guess you are aware of the dangers of misinterpreting data that are exchanged in any kind of communication. If you are, wouldn’t you agree that whenever bridging time, bridging space, or both could be involved in a current, planned, or not even anticipated application of a data store, some mechanism should be provided to help any possible data user make the right interpretations, regardless of the applications she has in mind?

Well, I know that different situations may not all require the same ambitious solution. But I recommend describing the semantics of the data in question formally, in a conceptual data model, at least in two cases: when different users or applications have been identified, are planned, or are even conceivable, or when a user uses, plans to use, or could think of using her data, say, a week or more after creation. This should become ‘good practice’ for anyone in the business. It shouldn’t be IT where this desire comes from. In most cases it still is, however, probably because for some reason some IT people gained the insights first.

Have you seen a book on the relevance of data semantics on the business bookshelves lately? Well, here’s a classic, a very good start, although not coming from the business bookshelf: Data and Reality, by William Kent, second edition (2000, ISBN 1-58500-970-9). It was originally written in the 1970s, but it’s probably more relevant than ever before. Reading chapter 1, for instance, which I actually did twice, almost brought tears to my eyes. Not because of a romantic plot in the book, but because of the worrisome situation in Data Land that it describes, directing us to the work that lies ahead.
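To make the recommendation to describe semantics formally slightly more tangible, here is a small Python sketch. It is not a conceptual data model in the full sense, only an illustration under assumptions: the classes `AttributeDefinition` and `Fact` and the invoice attribute are hypothetical. The idea is that a stored value travels together with a definition of what it means, so a later reader does not have to guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDefinition:
    """Describes what a stored value means, independently of any one user."""
    name: str
    meaning: str
    unit: str


@dataclass(frozen=True)
class Fact:
    """A stored value together with the definition needed to interpret it."""
    definition: AttributeDefinition
    value: float

    def describe(self):
        d = self.definition
        return f"{d.name} = {self.value} {d.unit} ({d.meaning})"


# Hypothetical definition: without it, a bare 100.0 could be read as
# dollars or euros, with or without VAT.
NET_AMOUNT = AttributeDefinition(
    name="net_amount",
    meaning="invoice amount excluding VAT",
    unit="EUR",
)

fact = Fact(NET_AMOUNT, 100.0)
print(fact.describe())  # prints "net_amount = 100.0 EUR (invoice amount excluding VAT)"
```

A reader who picks up this fact next week, or in another place, gets the intended interpretation along with the number.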


You may not be able to think of it right now, but sooner or later, someone will come up with an interesting new application for your data. Or you yourself will throw a hungry look at someone else’s data. We are only beginning to experience what data can do for us, whether it be our own or someone else’s. Marvellous world, isn’t it!?