Saturday, November 20, 2010


The current technological landscape is obsessed with data. Open data, data APIs, walled gardens, data silos, data stores, the semantic web - the list is endless. Gov 2.0 is all about getting access to government data: the US has, the UK assigned Tim Berners-Lee to kick off, and similar efforts are underway elsewhere. Data, data, data. Indeed, Tim O'Reilly says the internet OS is a data OS. In reality, all operating systems are data operating systems, and the internet OS is no different.

Data or Process?
So what's wrong? Well, we seem to be confused: we seem to be separating data from the processes which operate on that data. Open data and open software are separate topics right now. Inert data as the next internet frontier is being heralded as a profound observation, and that's a mistake. Sometimes boiling things down so that they're simple and concise shows a superior grasp of both the subject matter and the communication medium. Sometimes it just means you've missed something important.

The processes which operate on data are data. If you look at the bits and bytes of your hard drive, it is impossible to distinguish between Photoshop the application and the Photoshop files. They're just data. Look at it another way - when a developer saves a code file, the code is data to the development environment. And the code for the development environment is data to whatever was used to develop it. Even more philosophically - which came first - data or process? A simple demonstration of how much easier this makes things: transparency in government - we don't just want census 'data' to be made available, we also want the process of census taking to be open. In fact, the latter has significantly greater implications for our ability to participate in the government machine.
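To make the "process is data" point concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the `source` string, the `payload` string and the `fingerprint` helper are hypothetical, not taken from any real system. It stores a tiny program and a tiny dataset as plain text, hashes both with the same function, and only then hands the program text to the interpreter so that it becomes a process acting on the data.

```python
import hashlib

# Hypothetical "process": source code held as an ordinary string.
source = "def double(x):\n    return x * 2\n"

# Hypothetical "data": an ordinary comma-separated payload.
payload = "1,2,3\n"


def fingerprint(text: str) -> str:
    """Hash any text as raw bytes; code and data are handled identically."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# To the storage layer both are just bytes, so the same function handles both.
code_digest = fingerprint(source)
data_digest = fingerprint(payload)

# The bytes only become a process once an interpreter executes them.
namespace = {}
exec(compile(source, "<source>", "exec"), namespace)
double = namespace["double"]

# Apply the process (which was data a moment ago) to the data.
result = [double(int(n)) for n in payload.strip().split(",")]
print(result)
```

Nothing about `fingerprint` knows or cares which string is the program. The distinction only appears at the moment something chooses to execute one of them.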

From this perspective, the internet OS is just like any other - a magical structure that bootstraps itself from a singularity and delivers a universe of complexity and beauty. How we managed to use the term operating system and forget process is a mystery. We can observe the damage that is caused quite plainly - what would an OS that didn't appreciate process look like? All the applications would be completely different: they would each require separate logins, have different controls, non-standard interfaces, install differently, fail differently, report differently, vary significantly in quality, and fail to integrate in most cases, or integrate in an ad-hoc manner in a few. We'd have silos and lack of transparency, lack of trust, poor resource usage, lock-in... what a nightmare! Oh wait... that's the internet - an OS that's way too focussed on a concept of inert data. We are starting to see the open data discussion extend to things like 'who should maintain this data?', 'how should this data be analysed?', 'what means were used to collect this data?' - Oops! Did we forget something? Time to apply our understanding of how an OS really works. Time to reboot with a new kernel version that better understands process.
