XML Comes Home to the Database

By Lisa Vaas  |  Posted 2005-12-15

XML Comes Home to the Database

Editors Note: This story is part of , a continuing series of stories from the reporters and editors of Ziff Davis Internet. Instead of the usual mile-high look at the year ahead, these articles examine particular technologies and markets in transition, including whats in store for them.

Used to be, friends didnt let friends use stored procedures and triggers. Application logic should happen in the application tier, the wisdom went, and databases were just those boring, clunky old things used to crunch numbers and process basic text.

Well, all thats changing, because this is the year when XML comes home.

Home, as in, to its rightful home—roosting right where database administrators want it: in the relational database.

"Database people have been waiting for this to happen for a decade or two," said Peter OKelly, an analyst with Burton Group. "Theyve been waiting for everybody to realize that everything should be stored in the DBMS [database management storage system]."

Big database vendors are just at the start of a new stage of innovation and competition with regards to wrangling both structured, i.e., relational, data, and unstructured data, as represented by XML, in the same box.

Microsoft Corp. arguably beat IBM out the door with support for XQuery and a native XML data type in SQL Server 2005, which became available in November, while Oracle Corp. boasts of being the first to support XQuery.

But IBM is distinct in packing a revolutionary concept with its dual-storage approach to handling XML, OKelly said. This approach is now at the open beta stage in IBMs next version of DB2, code-named Viper.

All the big relational database vendors are shying away from traditional means of handling XML in the relational database. Up until very recently, non-native ways of handling data have included shredding or parsing, wherein data is assigned to a particular tag in a column in a relational table, and putting "blobs" or "CLOBS" of data into relational fields.

But shredding XML means you lose the fidelity, or the hierarchy, of the XML document itself. Blobs retain XML fidelity, but they prevent you from searching on data thats put into fields.

Click here to read an eWEEK Labs review of Microsoft SQL Server 2005.

The most serious problem with these approaches is performance, according to Thore Thomassen, senior enterprise architect for Storebrand, a major supplier of insurance and financial services based in Oslo, Norway. Storebrand is an early user of Viper.

"Because most relational databases are normalized, the shredding gets very complex, and some of the most advanced SQL we have in Storebrand represents mappings between XML and a relational model," he wrote in an e-mail interview with Ziff Davis Internet News. "In some cases this mapping is not possible or the performance is too bad for it be used (we had one query that took minimum 10 minutes to complete, 30 max)."

In such cases, Storbrand stores XML as a CLOB. The problem there is you can neither access nor query the XML content.

The other big problem with shredding, Thomassen wrote, is that Storebrand loses most of the flexibility offered by XML.

"The result of trying to solve this problem is complex, flexible relational models, which are very hard to maintain," he wrote. "Examples of this are tables with 70 columns where only 10 are used or complex metadatabases where the data model is defined on the fly."

Bernie Spang, director of Database Servers for IBM, has said that Viper gives the best of both worlds, as its XML-handling capabilities allow users to retain the hierarchical, searchable form of XML. Vipers native XML technology provides support for XQuery, an emerging standard language that extends XPath and is specially designed for processing XML data.

With Viper, applications can use XQuery, standard SQL or both to retrieve documents from either or both underlying storage formats.

SQL Server 2005 keeps XML and relational data in one storage engine. Microsoft has set it up so you can insert SQL queries into XQueries and vice versa—a technique that one SQL Server 2005 user said gains more power for the developer.

"… It allows the developer to cross over the means of programming (XQuery and TSQL) and also helps the engine, using indexing and query optimization over the whole data," said Gerald Schinagl, diplomized engineer and project manager for the Sports Database at Austrian Broadcasting Corp. Radio & Television, in an e-mail exchange.

Next Page: Upgrading for XML support.

Upgrading for XML Support

ORF needed to upgrade its 2GB relational database of sporting stats to get support for XML and to better integrate with mobile devices as it delivered entertainment such as its extensive coverage of Austrian winter sports.

ORF is in the process of moving off of SQL Server 2000 and onto SQL Server 2005 running on Windows Server 2003 Standard Edition.

The broadcaster is already finding 70 percent faster responses with XML data types and has found it up to 90 percent faster to create XML-based applications.

Of course, moving off of one flavor of SQL Server and onto the next is something of a no-brainer.

But Schinagl has reasons for preferring SQL Servers approach to XML over that of either IBM or Oracle: "[Microsoft] gives you the best of both worlds … if an XQuery is underperforming, you still have the option of reverting to relational techniques"—what Schinagl called "relationalizing" part of the data back into property tables.

Read details here about virtualization in IBMs DB2 "Viper."

"The IBM way seems to me like a mixed approach (two different engines), more the type of an interim release (seems like they are looking [to see] if it makes sense to use one engine)," he wrote.

Thomassen said he finds Vipers dual-engine approach to be well integrated. "IBMs solution is, in our experience, very robust and for any practical use it looks like one single engine for a user," he wrote.

As far as Oracles XML handling goes, Schinagl is one of many who dismiss the database giants XML handling as amounting to—heres the dirty word again—shredding. "Oracle does not have … really, XML handling," Schinagl wrote. "Its a type of shredding the content."

Not fair, said Willie Hardy, Oracles vice president of database product marketing, adding that Oracle 10g R2 introduced true support for XQuery and XTable, and Oracle has been actively engaged in meeting customers XML handling needs with a choice of storage options that are tuned to serve individual customers.

"The Were more native than you are is a way to create a differentiator thats irrelevant," said Mark Drake, an Oracle product manager. "Its not one is better than another, one is more native than the other. Each one solves different problems."

Oracles XML-handling abilities in 10g have their fans. Oracle published a case study about Starwood Hotels and Resorts Worldwide, for example, that pointed to XQuery being a key reason for the hotel chain to upgrade to 10g R2.

"XQuery support will be great for our content management system, since the content is stored in XML format," Arup Nanda, Starwoods director of database engineering and operations, is quoted as saying in the case study. "The flexibility simply increases tremendously for the developer community."

Native XQuery support does indeed mean that developers can query XML, relational, object-relational and repository data using this industry-standard XML querying language.

For Don Chamberlain, an IBM fellow of query languages, all this boils down to an exciting time to be in the database business. "For many years weve had relational systems very good at what they do," he said. "Theyre highly optimized for structured business data. … They do it very well and theyre not going away."

But theres a lot of data in the world that doesnt fit into that paradigm, Chamberlain said. Theres a need for self-describing data, and XML fits that need.

Thus we find ourselves headed into 2006 poised on the edge of some important new languages coming along. Similar to the way SQL served as a focal point for the relational database industry years ago, the new language of XQuery may well serve as the next focal point.

And squarely at the center of that focal point is the relational database. The thing to watch out for now, Chamberlain said, is how quickly the marketplace soaks up the new technology and what kinds of applications it spawns.

Users, for their part, need more XML advances from the database vendors. Thomassen, for example, said a big issue is the need to control data quality.

"In todays version you have the ability to do schema validation at insertion point, but it is [not] forcing the XML to be validated," he wrote. "It is also no validation of the XML once it is stored in the database. It is a big dilemma because we want the flexibility and ability to store anything but at the same time we want to have a database with quality data."

Check out eWEEK.coms for the latest database news, reviews and analysis.

Rocket Fuel