XML Databases Offer Greater Search Capabilities

Extensible Markup Language is emerging not only as a Web page markup standard, but as a database technology with the potential to simplify and speed future Web operations.

With databases that store whole documents in their native XML format, an archive becomes easier to search by title, author, keywords or other attributes. The development will broaden information that is available over the Web and make speedy content serving more practical, database experts said.

The World Wide Web Consortium (W3C) last week released its XML Schema specification, which defines how to describe the structure and data types of XML documents. XML is a more extensible tagging language than HTML, which preceded it on the Web. At the same time, pioneering efforts to build database systems for managing XML documents are gaining steam.

Software AG leads the field with its Tamino XML Database, and 9-month-old start-up Ipedo announced its own XML Database System last week. In the meantime, relational database vendors IBM, Oracle and Sybase continue to upgrade their products to give them more XML-handling capabilities.

"I see a lot more people are treating XML not just as a document format, but as a data format," said Tim Matthews, president of Ipedo. An XML database "gives you a nice performance advantage over traditional databases," he added. Ipedo, based in Redwood City, Calif., received $7 million in funding from venture capital firm Draper Fisher Jurvetson on March 19.

Both Ipedo and Software AG implement their own versions of the W3C's proposed specification for the XML Query language, now known as XQuery for short. The XQuery draft specification was released Feb. 16. Once it becomes a final specification, the use of XML documents and XML databases will proliferate, experts predicted.

Ipedo is trying to capitalize on speed by urging its customers to equip their database servers with a gigabyte or more of memory. The Ipedo XML Database System dispenses with many of the time-consuming input/output operations of traditional databases by having the database engine and much of the data it works with reside in main memory. The move adds $1,500 or more to the cost of the server on which the database resides, but augments the speed already inherent in serving XML documents from an XML database, Matthews said.
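The in-memory approach Matthews describes can be illustrated with a toy sketch. This is not Ipedo's implementation; the class name `InMemoryXmlStore`, its methods and the sample document are all hypothetical. It only shows the general idea of parsing documents once and answering later lookups from main memory rather than re-reading them from disk.

```python
import xml.etree.ElementTree as ET


class InMemoryXmlStore:
    """Toy store: parse each document once, then serve lookups from RAM."""

    def __init__(self):
        self._docs = {}  # doc_id -> parsed root element

    def put(self, doc_id, xml_text):
        # Parsing cost is paid once, at load time.
        self._docs[doc_id] = ET.fromstring(xml_text)

    def text_at(self, doc_id, path):
        # No file read or re-parse happens here; only an in-memory lookup.
        root = self._docs.get(doc_id)
        return None if root is None else root.findtext(path)


store = InMemoryXmlStore()
store.put("doc-1", "<report><title>Q1 Summary</title><body>Draft</body></report>")
print(store.text_at("doc-1", "title"))
```

The trade-off in a design like this is the one the article notes: memory is spent up front so that each request avoids input/output work.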

Software AG of Darmstadt, Germany, has sold 300 copies of its mainframe-style Tamino product since the system was launched in 1999. "Content delivery is one of our greatest strengths," said John Taylor, Software AG's director of product marketing for Tamino. He conceded that customers wouldn't buy an XML database primarily to manage large financial accounts.

On the other hand, Taylor added, emerging query languages such as XQuery, which was co-authored by IBM and Software AG, will make it possible to query the XML database using "keys" and retrieve related information from a variety of documents. Just as Structured Query Language queries the relational database, pulling out data related to a primary key or identifier, XQuery will be able to query a large set of documents based on the name of an author, date filed, subject or keywords in the document, Taylor said.
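The kind of lookup Taylor describes can be sketched in a few lines. The example below is not XQuery and does not use Tamino or Ipedo; it relies on Python's standard xml.etree.ElementTree module to show the same idea of selecting documents by author, filing date or keyword. The sample documents, their element names (title, author, filed, keyword) and the helper functions are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents. The element names (title, author, filed,
# keyword) are assumptions for illustration, not a schema from any product.
DOCUMENTS = [
    "<document><title>Sales Tax Filing Guide</title><author>Lee</author>"
    "<filed>2001-03-12</filed><keyword>tax</keyword><keyword>filing</keyword></document>",
    "<document><title>Quarterly Presentation</title><author>Chen</author>"
    "<filed>2001-04-02</filed><keyword>sales</keyword></document>",
    "<document><title>Property Tax Overview</title><author>Lee</author>"
    "<filed>2001-05-20</filed><keyword>tax</keyword><keyword>property</keyword></document>",
]


def load(documents):
    """Parse each XML string into an element tree root."""
    return [ET.fromstring(text) for text in documents]


def titles_where(collection, field, value):
    """Titles of documents whose <field> child has exactly this text."""
    return [doc.findtext("title") for doc in collection
            if doc.findtext(field) == value]


def titles_with_keyword(collection, keyword):
    """Titles of documents that carry the given <keyword> element."""
    return [doc.findtext("title") for doc in collection
            if any(k.text == keyword for k in doc.findall("keyword"))]


collection = load(DOCUMENTS)
print(titles_where(collection, "author", "Lee"))
print(titles_with_keyword(collection, "sales"))
```

Here the author name plays the role of the "key": one value pulls back every document that shares it, much as a primary key pulls related rows from relational tables.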

California's State Board of Equalization, which oversees the state's property tax and sales tax collection, has implemented a Tamino system so sales tax payers may submit their payments electronically, filing related tax documents at the same time.

"We decided to use an XML database because it is the interchange format of the future," said Larry Hanson, data architect at the board. He said he chose Tamino because it had the stability and recovery features he wanted, much like a mainframe relational system.

Sales tax payers around California can go to a third-party access provider, such as Nationtax Online, fill out an XML form with added notes and comments, and submit their payment electronically with a validated electronic identification, Hanson said. While Hanson didn't want a flood of users during the shakedown period, he predicted as many as 700,000 sales tax payers may eventually use the system because it eliminates paperwork.

A prospective Ipedo XML Database System user, John Case, senior developer at iMedium, said his firm is an application service provider in the sales force automation and marketing area. He plans to implement the Ipedo XML Database System because he wants to be able to retrieve all documents related to a given customer after they have been captured in XML format. For example, he wants his customers to be able to refer back to specific presentations after hundreds have been recorded.