Tuple Data Model Faces Real World

Distributed-systems model takes on IT roles

A tuple is neither an exotic fungus nor an adults-only entertainment. Defined with misleading simplicity as "a series of typed values," the tuple can be to distributed computing what a base pair is to a molecule of DNA: Tuples carry information and provide their own form of organization—in a manner that may seem inefficient—but they enable adaptation to situations not foreseeable when a system was conceived.

A simple tuple might be something like "John Doe, 2/25/1980, 123-45-6789": a series comprising a character string, a date and a Social Security number. Pattern-matching algorithms readily match such "tuple signatures," with or without mechanisms for "don't care" values or for binding pieces of tuples to the variables in a computer program.
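The matching just described can be sketched in a few lines of Python. This is an illustration only, not the API of any product: the names ANY, matches and bind are hypothetical, and a template field is assumed to be a type (match any value of that type), the ANY marker (a "don't care" value) or a literal (match by equality).

```python
from datetime import date

ANY = object()  # hypothetical "don't care" marker; not taken from any real library


def matches(template, tup):
    """Return True if every template field accepts the corresponding tuple field."""
    if len(template) != len(tup):
        return False
    for want, have in zip(template, tup):
        if want is ANY:
            continue  # "don't care": accept whatever is there
        if isinstance(want, type):
            if not isinstance(have, want):
                return False  # field has the wrong type for this signature
        elif want != have:
            return False  # literal field must be equal
    return True


def bind(template, tup):
    """Return the values found at the template's open (type or ANY) positions,
    or None if the tuple does not match at all."""
    if not matches(template, tup):
        return None
    return tuple(
        have
        for want, have in zip(template, tup)
        if want is ANY or isinstance(want, type)
    )


record = ("John Doe", date(1980, 2, 25), "123-45-6789")

by_signature = (str, date, str)      # any tuple with this signature of types
by_name = ("John Doe", ANY, ANY)     # literal name, two "don't care" fields
wrong_signature = (str, str, str)    # second field is a date, so no match

# Bind the open fields of a matching tuple to program variables.
birth_date, ssn = bind(("John Doe", date, str), record)
```

Here by_signature and by_name both match the record, wrong_signature does not, and the final line binds the date and the Social Security number to ordinary variables.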

Tuples, combined with pattern-based retrieval, give IT systems the same flexibility as a large unsorted pile of papers—exploiting, to some degree, the decline of hardware costs compared with human costs during the first several decades of IT development. A costly data administrator or library scientist can try to anticipate the exact criteria for retrieving information at some future time, creating an administrative burden of classification and sorting; alternatively, a less expensive clerk can simply put everything in one pile and search, in response to ad hoc requests, for "bills due this week" or "bills with past-due amounts" or even "bills with green logos at the top." This is content-based retrieval, as opposed to the more rigid order- or address-based retrieval schemes traditionally imposed on IT architects.
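The "one pile, ad hoc search" idea can be made concrete with a short Python sketch. The bills, dates and field layout below are invented purely for illustration; the point is only that nothing about the pile is sorted or indexed in advance, and each request supplies its own criterion at search time.

```python
from datetime import date, timedelta

# Hypothetical bills as tuples: (vendor, due date, past-due amount, logo color).
pile = [
    ("Acme Power", date(2001, 3, 14), 0.0, "green"),
    ("City Water", date(2001, 4, 2), 35.5, "blue"),
    ("TelCo", date(2001, 3, 16), 12.0, "green"),
]


def search(pile, predicate):
    """Content-based retrieval: scan the whole pile and keep whatever matches."""
    return [bill for bill in pile if predicate(bill)]


today = date(2001, 3, 12)  # fixed "today" so the example is repeatable
week_end = today + timedelta(days=7)

due_this_week = search(pile, lambda bill: today <= bill[1] < week_end)
past_due = search(pile, lambda bill: bill[2] > 0)
green_logo = search(pile, lambda bill: bill[3] == "green")
```

Each search costs a full scan of the pile, which is the hardware-for-labor trade described above: no one had to predict the "green logo" query when the bills were filed.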

Depending on the application, a process might generate a tuple for inspection by other processes (as when publishing a request for data or service); might inspect, without altering, a tuple from another process (as when accessing shared data); might locate, read and destroy a tuple (as when granting a request that will no longer need attention from any other process); or might search for a tuple with particular features, waiting until such an entry becomes available before proceeding (as when offering a service and awaiting requests from clients).
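The four kinds of interaction listed above map onto a very small interface. The following Python sketch is a toy, in-process stand-in for a tuple space, not the JavaSpaces or TSpaces API; the method names write, read and take are chosen for illustration, and None is assumed to act as a wildcard in a template.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space: publish, inspect, consume, and wait."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        # A template field of None is a wildcard; any other field must be equal.
        return len(template) == len(tup) and all(
            want is None or want == have for want, have in zip(template, tup)
        )

    def _find(self, template):
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def write(self, tup):
        """Publish a tuple for inspection by other processes."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()  # wake anyone waiting for a match

    def read(self, template, timeout=None):
        """Return a matching tuple without removing it (shared data)."""
        return self._wait(template, remove=False, timeout=timeout)

    def take(self, template, timeout=None):
        """Return a matching tuple and remove it (request has been handled)."""
        return self._wait(template, remove=True, timeout=timeout)

    def _wait(self, template, remove, timeout):
        # Block until a match exists; timeout=None waits indefinitely.
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            if not found:
                return None  # timed out with no matching tuple
            tup = self._find(template)
            if remove:
                self._tuples.remove(tup)
            return tup


space = TupleSpace()


def doubling_service():
    # Offer a service: wait for a request, consume it, and publish a reply.
    _, value = space.take(("request", None))
    space.write(("reply", value * 2))


worker = threading.Thread(target=doubling_service)
worker.start()                      # the service starts out blocked, awaiting work
space.write(("request", 21))        # a client publishes its request
reply = space.take(("reply", None), timeout=5)
worker.join()
```

A real tuple space would sit behind a network address and persist its contents; the sketch keeps everything in one process so that the four operations stay visible.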

If one process produces tuples with actual data values, while another process uses a template that will match any tuple with that signature of value types, then these processes can engage in procedure-call interaction without regard to where (or when) each process is running.
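A rough sketch of that procedure-call pattern follows, again in Python and again with invented names (call_sqrt, serve_once, result_of) and a plain list standing in for the shared space. The caller writes a tuple of actual values; the server's template names only the types it expects. Note that the request is written before the server runs at all, which is the "when" half of the decoupling.

```python
import math

space = []  # stand-in for a shared, network-addressable tuple space


def write(tup):
    space.append(tup)


def take(template):
    """Remove and return the first matching tuple, or None.
    A template field that is a type matches any value of that type;
    any other field must be equal."""
    for i, tup in enumerate(space):
        if len(tup) == len(template) and all(
            isinstance(have, want) if isinstance(want, type) else want == have
            for want, have in zip(template, tup)
        ):
            return space.pop(i)
    return None


def call_sqrt(call_id, x):
    # Caller side: publish the "arguments" as a tuple of actual values.
    write(("sqrt", call_id, float(x)))


def serve_once():
    # Server side: the template gives only the signature of value types.
    request = take(("sqrt", int, float))
    if request is None:
        return False
    _, call_id, x = request
    write(("sqrt-result", call_id, math.sqrt(x)))
    return True


def result_of(call_id):
    # Caller side: collect the "return value" addressed to this call.
    reply = take(("sqrt-result", call_id, float))
    return None if reply is None else reply[2]


call_sqrt(1, 16.0)            # request is written before any server has run
early = result_of(1)          # nothing has answered yet
served = serve_once()         # a server shows up later and handles the request
answer = result_of(1)
```

Neither side holds a reference to the other: the only thing they share is the space and an agreed tuple signature.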

Tuple spaces (network-addressable repositories of tuples, shared by cooperating processes) thus provide a framework for distributed computing in environments, such as mobile and wireless networks, that don't satisfy the crucial assumptions (such as fast, persistent, synchronous links) built into traditional IT models.

Crucially, the tuple-space model conceals underlying data representation and database architecture decisions from the applications that use the repository. An application might initially be supported by a simple data model that does not scale well with size or might use a single server that creates a single point of failure, but these initial choices could be replaced by more robust technology without affecting the applications' flow of operation.
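To illustrate the kind of substitution meant here, the Python sketch below puts two hypothetical storage strategies behind the same write/take interface: a plain list that is scanned linearly, and a store that buckets tuples by their first field. Both classes and the drain_jobs function are invented for this example, tuples are assumed to be non-empty, and None is assumed to be a wildcard. The application code does not change when the store does.

```python
from collections import defaultdict


def _matches(template, tup):
    # None in a template is a wildcard; any other field must be equal.
    return len(template) == len(tup) and all(
        want is None or want == have for want, have in zip(template, tup)
    )


class ListStore:
    """Simplest possible backing store: one list, scanned on every request."""

    def __init__(self):
        self._items = []

    def write(self, tup):
        self._items.append(tup)

    def take(self, template):
        for i, tup in enumerate(self._items):
            if _matches(template, tup):
                return self._items.pop(i)
        return None


class IndexedStore:
    """Same interface, but tuples are bucketed by first field so that a
    template with a concrete first field skips unrelated tuples."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def write(self, tup):
        self._buckets[tup[0]].append(tup)

    def take(self, template):
        if template[0] is None:
            buckets = list(self._buckets.values())  # wildcard: search everything
        else:
            buckets = [self._buckets.get(template[0], [])]
        for bucket in buckets:
            for i, tup in enumerate(bucket):
                if _matches(template, tup):
                    return bucket.pop(i)
        return None


def drain_jobs(store):
    """Application logic: written once, unaware of which store it is given."""
    payloads = []
    while (job := store.take(("job", None))) is not None:
        payloads.append(job[1])
    return payloads


results = {}
for store_class in (ListStore, IndexedStore):
    store = store_class()
    store.write(("job", 1))
    store.write(("log", "started"))
    store.write(("job", 2))
    results[store_class.__name__] = drain_jobs(store)
```

The same reasoning would apply to replacing a single server with a replicated one; that is beyond what a short sketch can show, but the application-facing interface would stay the same.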

However, flexibility invariably comes at a cost, and the tuple-space model should not be perceived as a practical alternative for conventional database tasks. Relational databases, for example, effectively encode design-time knowledge about the nature and meaning of records and tables to enable powerful and general query operations. Tuple spaces don't attempt to match this strength; what they provide is persistent storage of data with unpredictable structure and a relatively short useful lifetime.

Object databases offer their own distinctive advantages in representing objects with a complex structure, potentially incorporating hierarchical relationships (containers of multiple objects, some of which might themselves be containers) with fewer complex queries than might be needed by relational systems.

However, a tuple space, although likewise based on object-oriented matching of data types and encapsulation of behaviors as well as data, will not typically be designed for such complex relationships and will not replace an object database as a transparent, persistent extension of an application's transient object pool.

Sun Microsystems Inc.'s JavaSpaces (at java.sun.com/products/javaspaces) and IBM's TSpaces (at www.almaden.ibm.com/cs/TSpaces) are IT-oriented implementations of tuple-space communication, offering enterprise developers new freedom to explore distributed technologies without first devising cumbersome (and possibly dead-end) infrastructures.

TSpaces technology is already available for license from IBM and is featured in the company's mobile technology demonstrator: a fully wired Ford Explorer, dubbed the alphaWorks TechMobile, which debuted at IBM's Solutions conference in San Francisco earlier this month.

A moving vehicle presents a constantly changing resource environment, with varying quality of network service and with vehicle occupants entering and leaving the zone of its wireless connections. Tuple spaces enable dynamic matching of user needs with local vehicle systems and remote network services.

TSpaces serves as the TechMobile's "soft backbone" middleware, integrating voice recognition, Bluetooth wireless links and even eye-motion tracking input for vehicle control.

IBM's Enterprise TSpaces, scheduled to appear next month, will extend TSpaces with data replication, automatic state recovery and dynamic partitioning of tuple spaces to move this technology up the food chain from small-scale networks and pilot projects to larger production systems.