When I saw an eWEEK story late last month on new business continuity and disaster recovery services, it made me think about the problem of persistent objects. Let me explain that leap.
Suppose that all of the people in your firm suddenly vanished: no damage to your offices or factories, no loss of critical data, but nothing left that had been only in people's heads when they disappeared. “To do” items, knowledge of work in progress, thoughts about resource availability for pursuit of present and future opportunities: all gone.
For at least a decade, we've been urging people to move beyond the Stone Age IT model of data in one place and behaviors (that is, code) in another. There are all sorts of things that are easy to do wrong when any piece of code can get at any piece of data: Unless every single piece of code knows about every single data representation, inconsistencies are easy to create and difficult to detect (let alone to repair).
Objects are at least a Bronze Age improvement: Data carefully enclosed in the shell of the only code that's allowed to directly change it. As we move toward distributed applications with shared object populations, inconsistency remains a problem, but ideally our applications will live in a world model that represents every kind of entity that they need to manipulate or understand. The problem of managing a multi-user object-based environment is, however, far from simple.
What happens when our goal of object representation in our IT world has to coexist with our goal of defense against the things that can go wrong in the real world? Yes, the core of an object instance may be a set of data values, and those can be backed up, but the object's shell of code may have its own state as well. What does it mean to have a backup object stored at a remote hot site?
I'm reminded of the essential problem of Star Trek transporter devices: If you can't tell me the difference between a live human being and a still-warm corpse, how do you preserve that difference while you're re-creating the body at another location?
The transporter problem is still some years away; today, though, we can serialize objects into byte streams, putting them into a form in which they can be transmitted and stored and reconstructed at some future time. We have to be constructively paranoid about this, and again I wind up thinking of science fiction scenarios such as the problem of someone waking up my backup personality and putting it into someone else's body: Do I want the copy to know everything about me? A naïvely serialized object may contain information, such as password values, that I don't want roaming around and which I therefore need to exclude from the serialized byte stream (with a facility such as the “transient” keyword in Java).
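To make the “transient” point concrete, here is a minimal sketch of Java's default serialization skipping a marked field. The class and field names (TransientDemo, Account, user, password) are hypothetical, chosen only for illustration, and the byte stream stays in memory rather than going to a remote site.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {

    // Hypothetical example class; not taken from any real system.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;

        String user;
        // transient: default serialization leaves this field out of the byte stream.
        transient String password;

        Account(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    // Serialize to a byte array and read it back, standing in for
    // storage or transmission to another location.
    static Account roundTrip(Account original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Account) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = roundTrip(new Account("alice", "s3cret"));
        System.out.println(copy.user);      // prints "alice"
        System.out.println(copy.password);  // prints "null"
    }
}
```

The restored copy comes back with its password field set to null, so whatever code wakes the object up has to obtain that value again from some trusted source. Marking a field transient only affects what goes into the serialized stream; it does nothing to protect the value while the original object is live in memory.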
It's common for object-based systems to create many temporary objects, a resource and performance issue in the base case, but a real complication if we start storing objects in lots of places, since now we need to do garbage collection across those persisted copies. If we're not careful, we'll wind up using far too much of our storage for garbage, or far too much of our bandwidth for messages asking, in effect, “Are you finished with that?”
XML, with its inherently serial character and its ease of identifying and manipulating individual elements of a collection, is a natural candidate for persistent object storage, but making this choice opens the door to many next-level choices. Should XML files be stored in native file systems, or in a generic database management system, or in an XML-specific data management environment?
These are fun questions to ask, since they give us an excuse to evaluate new technology, but Cameron Laird makes an important point about the need to avoid letting the choice of a tool change the definition of the problem: “You start to wonder, ‘What does it take to maximize XML performance?’ The answer: You don't want to maximize your XML performance. You need to meet engineering requirements.”
The purpose of all this effort is to build applications quickly, with high return on investment, based on choices that maximize our future ability to select from the best of the next-generation technologies. Let's persist, if you'll pardon the expression, in staying focused on that goal.
Tell me if persistent objects are old news, new news, or too-news.