Schema evolution



Jon and I had a pretty heated discussion today over serialized objects. The background is as follows:

Using Prevayler, you will have a file on your disk with your serialized object graph. This stores the state of your application but, very importantly, also the implementation of that state. When deserializing the object graph, you do not retrieve properties that are then set on “new” objects; you retrieve the objects themselves. Like, duh.
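To make that concrete, here is a minimal sketch, assuming the snapshot is written with standard Java serialization. `Customer` and `SnapshotPeek` are made-up names for illustration, not anything from Prevayler. It only shows that the byte stream carries the class name and field names along with the values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical domain class, used only for this illustration.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    Customer(String name) {
        this.name = name;
    }
}

class SnapshotPeek {
    // Serializes an object with plain Java serialization and returns the raw bytes.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if the given text (a class or field name, say) appears in the stream.
    static boolean mentions(byte[] stream, String text) {
        return new String(stream, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) {
        byte[] stream = serialize(new Customer("Jon"));
        System.out.println("class name in stream: " + mentions(stream, "Customer"));
        System.out.println("field name in stream: " + mentions(stream, "name"));
    }
}
```

Rename `Customer` or its field in the next release and the old stream still refers to the old names, which is exactly the coupling described above.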

This introduces a problem when changing the object schema. If you update your application and the new runtime has a different set of classes, you’re screwed. Sure, changing the name of a property or class can easily be done, but what if relations are changed?

Now to my thoughts: The problem is that the serialized object holds too much info in the upgrade scenario. We don’t want to know the implementation then, we only want the state. We thus need to transfer our snapshot from a serialized form to a form with less info. If we can do this, we get away from what Joshua Bloch in “Effective Java” describes as making our private code a part of the API. We can then feed this data into an upgrade program and set up the new object graph. This is not ideal but it works. (Agree so far, Jon? :-)
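A rough sketch of that idea, under my own assumptions: the old runtime dumps its state into a plain key/value form, and an upgrade step in the new runtime rebuilds the graph from it. `OldEmployee`, `NewEmployee`, `exportState` and `fromState` are all invented names, and the example deliberately changes a relation (one department becomes many) rather than just a property name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Old schema: an employee belongs to exactly one department.
class OldEmployee {
    final String name;
    final String department;

    OldEmployee(String name, String department) {
        this.name = name;
        this.department = department;
    }

    // Dumps state only: plain keys and values, no class information.
    Map<String, String> exportState() {
        Map<String, String> state = new LinkedHashMap<>();
        state.put("name", name);
        state.put("department", department);
        return state;
    }
}

// New schema: the relation changed, an employee can be in several departments.
class NewEmployee {
    final String name;
    final List<String> departments;

    NewEmployee(String name, List<String> departments) {
        this.name = name;
        this.departments = departments;
    }

    // The "upgrade program": builds a new-schema object from the neutral state.
    static NewEmployee fromState(Map<String, String> state) {
        return new NewEmployee(
                state.get("name"),
                Collections.singletonList(state.get("department")));
    }
}

class UpgradeDemo {
    public static void main(String[] args) {
        Map<String, String> state = new OldEmployee("Jon", "R&D").exportState();
        NewEmployee upgraded = NewEmployee.fromState(state);
        System.out.println(upgraded.name + " works in " + upgraded.departments);
    }
}
```

The point is that only the map crosses the version boundary, so the new runtime never needs the old classes on its classpath.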

One good thing about a relational database is that it stores a bare minimum of information regarding system state. This means that the dynamic implementation, your runtime, can change and the static, stored state can be changed with minimal effort. This is truly Good.

But that is not really what is good about a relational database, and neither is the tabular data model as such. The same data could just as well be described in another format, such as any homegrown XML format. So what is good about relational databases?

I can’t believe I’m writing this, but the good part about relational databases is, argh!, SQL. It is the standard way of retrieving information that is the catch. It is Crystal Reports and Excel. People want this. Not me. Other people. I wish we could just do an XML serialization, which is all we need for updates, but I know customers will want the SQL support so that they can use their tools. Anyone know of any XML tools with SQL support? (Probably very common, I haven’t looked yet.)

A long time ago somebody suggested that Java should provide a versioning system, so that a class is defined not only by its name but also by its version. I suddenly realize that this is the new number 1 on my “Things I want included in Java” list. This would solve the schema evolution problem, since every new version of a class could have a constructor that took the old version of the class as an argument.
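Java has no such feature, so the sketch below fakes it by putting the version in the class name. `PersonV1` and `PersonV2` are hypothetical, and the name-splitting rule is just an assumption to give the upgrade constructor something to do.

```java
// Version 1 of the class: stores the full name in a single field.
class PersonV1 {
    final String fullName;

    PersonV1(String fullName) {
        this.fullName = fullName;
    }
}

// Version 2 of the class: splits the name into two fields.
class PersonV2 {
    final String firstName;
    final String lastName;

    PersonV2(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Upgrade constructor: takes the old version of the class as its argument.
    // Splits on the first space; a name without a space becomes the first name.
    PersonV2(PersonV1 old) {
        int space = old.fullName.indexOf(' ');
        this.firstName = space < 0 ? old.fullName : old.fullName.substring(0, space);
        this.lastName = space < 0 ? "" : old.fullName.substring(space + 1);
    }
}

class VersionDemo {
    public static void main(String[] args) {
        PersonV2 upgraded = new PersonV2(new PersonV1("Joshua Bloch"));
        System.out.println(upgraded.firstName + " / " + upgraded.lastName);
    }
}
```

With real class versioning the two classes would share one name and the runtime would pick the constructor for you; here both versions simply have to live side by side.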

As I see it, persistence is needed in two different ways.

  1. In the running application where the data is strongly coupled with the code and performance is very important.
  2. Snapshots for data analysis and updates. This data needs to be “flexible” but performance is generally not an issue.

To me, an RDBMS solves number 2 very well. I used to think it was half decent at number 1 as well.

OK, to sum it up: I want to be able to convert the Prevayler snapshot to an XML equivalent. I want that to be readable through SQL. I want Java to handle versions. I want World Peace and free ice cream.