IBM has let “Viper” out of the bag, unleashing the next-generation DB2 9 data server with XML handling capabilities that Big Blue claims will turn our data handling ways on their head.
IBM on June 8 announced the ship date for the new data server, which is the culmination of five years of development. That effort had a lofty goal: to blow the dust off of static, traditional relational databases and transform them into beasts that can chew through all types of information—documents, audio and video files, images, Web pages, you name it.
At the heart of what IBM hopes will be a revolution in data handling sit Viper’s XML capabilities. The data server includes patented “pureXML” technology that IBM Vice President of Data Servers Bob Picciano called “one of the big breakthroughs” to occur in the past 20 years.
“Nothing monumental has happened in the database space since we invented the relational database,” said Picciano, in Somers, N.Y. “When we think of the impact XML has on the information management marketplace, this is the most fundamental thing that’s happened” since then, he said.
New XML-specific goodies include an XML data type that allows users to store well-formed XML documents in their hierarchical form within columns of a table. This in itself is a leap forward from relational databases’ inelegant XML handling to date. Relational databases have relied on shredding or parsing XML data and putting data assigned to a particular tag into a column in a relational table. Alternatively, traditional relational databases have put “blobs” of data into relational fields.
Both nonnative ways of handling XML are deficient. Shredding XML harms the fidelity of the XML document itself. For example, if XML data comes from a Web application that includes an electronic signature that’s associated with part of a form, it’s contained in the XML hierarchy. But if you parse the XML content in rows across a relational table, that hierarchy is lost, and you’re unable to pull that exact structure back out. Blobs retain XML fidelity, but you lose the ability to search on data that’s put into fields.
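To make the trade-off concrete, here is a minimal Python sketch of the two nonnative approaches next to a query that needs the hierarchy. It is an illustration only and says nothing about how DB2 stores XML internally; the form document, the element names and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical form document: the signature covers only the <payment> part.
DOC = """
<form>
  <applicant><name>Ada</name></applicant>
  <payment>
    <amount>120.00</amount>
    <signature signer="Ada">abc123</signature>
  </payment>
</form>
"""


def shred(xml_text):
    """Flatten leaf elements into one relational-style row keyed by tag name."""
    root = ET.fromstring(xml_text)
    return {el.tag: el.text for el in root.iter() if len(el) == 0}


def store_as_blob(xml_text):
    """Keep the document intact, but only as an opaque byte string."""
    return xml_text.encode("utf-8")


def signed_section(xml_text):
    """With the hierarchy available, find the element the signature sits under."""
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        # find() only looks at direct children, so this locates the true parent.
        if parent.find("signature") is not None:
            return parent.tag
    return None


row = shred(DOC)            # searchable columns, but no record of nesting
blob = store_as_blob(DOC)   # full fidelity, but not searchable as stored
section = signed_section(DOC)
```

The shredded row still holds the signature value, but nothing in it says the signature belongs to the payment section. The blob keeps that information, yet it has to be parsed again before any field can be searched.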
Viper brings support for the XML data type in SQL statements and SQL/XML functions as well as support for the W3C’s new XQuery language. It also allows users to invoke the XQuery language directly, calling functions that extract XML data from DB2 tables and views.
In addition, Viper arrives with new tools such as XQuery builder to create queries against XML data; support for indexing over XML data, which improves the efficiency of queries issued against XML documents; and access and management of XML data by the DB2 data server.
Existing DB2 tools such as the Control Center, CLP (command line processor), db2look command and Visual Explain are enhanced to support XML data as well.
Viper also includes XML support in SQL and external procedures. Support for XML in many DB2-supported programming languages enables applications to combine XML and relational data access and storage.
What this all means is that users won’t have to store XML separately from relational data. It will all be under one roof, allowing for tightened security, more efficient administration and management, and easier regulatory compliance for organizations that otherwise would have their data spread across the organization, Picciano said.
It remains to be seen whether the market is ready to jump on the blended XML/relational bandwagon. Phillip Howard, an analyst at Bloor Research, questioned how many people want to build serious business applications that use both XML and relational data. “The jury’s out on that,” said Howard, in Bath, England. “My personal guess is that people will start to come up with all sorts of ways to do it.”
Indeed, Picciano said, large ISVs (independent software vendors) such as Nextance and Justsystems have shifted to using XML as an internal representation format over the past year or two. “As we’ve introduced Viper to them, they’ve said, ‘This is exactly what we were hoping somebody would step up and do,’” he said.
If people’s interest in XML has flagged, it’s not because XML isn’t out there; rather, it’s the fault of inelegant XML handling databases, Picciano said. “Because today’s generation of XML handling databases has been woefully inefficient in handling XML data, many customers have kept it separate,” he said. “It’s been spread out across disk systems, not centrally managed as a data asset. If you talk to some people in IT, they’re unaware of the amount of XML their organizations are using. … But I really have yet to go to a client and talk to them about XML as a data technology and have them say, ‘We don’t have any need for that.’”
Viper’s XML power has gotten all the press. But two other biggies catch the eye of Bloor Research’s Howard: new compression technologies, brought with Viper’s “Venom” technology, and the fact that Viper is IBM’s stake in the ground in the data warehousing space.
He pointed to IBM’s Data Warehousing BCU (Balanced Configuration Unit) as being the first positive move to be taken by a data warehousing company in the face of appliance vendors starting to muscle in. “Appliance vendors are starting to get a lot of traction,” he said. “It’s hurting Teradata, [and] it will hurt Oracle. IBM is the only mainstream vendor taking steps to compete with the likes of DATAllegro … and Netezza.”
The appliance companies offer bundled data warehouse appliances, delivered as preinstalled software on hardware platforms, that practically eliminate administrative costs, Howard said.
To deliver the BCU, IBM took its experience with data warehousing and created a set of best practices, multidimensional clustering, summary tables and more to preinstall DB2 in a data warehousing environment.
It’s a turnkey solution, Howard said, and it could help IBM grab market share from its arch competitor, Oracle. “[IBM is trying] to minimize the management overhead,” he said. “I don’t think that goes the whole way to answer the threat of Netezza, but Oracle hasn’t moved at all to compete with [that threat]. IBM is bundling BCU with a hardware platform, [whereas] Oracle doesn’t have a hardware platform. This is potentially a threat to Oracle.”
Independent analyst and eWEEK contributing columnist Charlie Garry disagreed with Howard, however, saying that the BCU is, in fact, another instance of IBM playing catch-up.
“In the past, IBM simply had not standardized on a subset of hardware and storage for their warehouse implementations with DB2 on Unix/Win/Linux,” Garry wrote in an e-mail exchange. “This meant a great deal of configuration on-site and delayed successful implementations of DB2 as a warehouse database. The BCU is simply a way for IBM to sell a bundled set of hardware and software that they have great experience with and can more accurately predict performance for across a range of workloads. This helps to speed up implementations.”
IBM is not, in fact, the first vendor to come up with such a solution, Garry said—Teradata got there first, and IBM is, wisely enough, following suit. “Teradata has always operated in this fashion,” Garry wrote. “It should be pointed out that Teradata competitors spread FUD about this approach, claiming the proprietary nature of Teradata systems. No one could argue with the fast time to value, however, and the share Teradata has taken over the past five years. Now IBM is doing it and it is a good approach, but it is not in any way an answer to the warehouse appliance vendors. We are not talking about a single piece of hardware containing server, storage, and software that you create tables on and load data. To put it more succinctly, a warehouse appliance could be up and running in a matter of hours after delivery while an IBM BCU or a Teradata system would take much longer. The point is that the BCU is not an answer to the appliance vendors.
“[It] is certainly not a turn-key solution,” Garry wrote. “But then again, no data warehouse is, appliance or not.”
The separate XML storage engine is an interesting approach, similar to the modular approach that MySQL has taken with its storage engines. While IBM will attempt to convince customers they need this, it is more likely that this technology is in DB2 to support IBM’s own content management and data integration strategy. I see DB2 becoming an increasingly embedded solution for IBM in the future versus a stand-alone database offering.
Many of the features are old relative to most other databases as pointed out in the article. Multidimensional clustering was known as a clustering index back in the day when I supported DB2 on the mainframe. Range partitioning has been in the mainframe version for perhaps 15 years. Now IBM trumpets the combination of those technologies with the hash partitioning they already sold as the data partitioning offering for DB2 on Unix/Win/Linux. These things have improved performance and the reason we know this is because other databases (even DB2 on zSeries) have used them before. The key to success will be in the Design Advisor which helps administrators make physical design decisions after the fact. If those recommendations are good, if they can be implemented without creating a great deal of effort and added expense, then IBM will have something.
Beyond BCU, Viper packs loads of features that are warehouse-friendly, Picciano said. The list includes improved large database management and table partitioning. Table partitioning is a data organization scheme in which table data is divided across multiple storage objects called table partitions or ranges according to values in one or more table columns. These storage objects can be in different table spaces, in the same table space or a combination of both.
The benefits of table partitioning include the ability to create very large tables. A partitioned table can contain vastly more data than an ordinary table. By dividing table data across multiple storage objects, users can significantly increase the size of a table.
Other warehouse-friendly capabilities include more-flexible administration capabilities. Users can now perform administrative tasks on individual data partitions, breaking down time-consuming maintenance operations into a series of smaller operations.
Viper also comes with more granular control of index placement. Indexes can be placed in different table spaces and managed individually.
In addition, Viper brings fast, easy roll-in or roll-out of data. This ability can be particularly useful in a data warehouse environment where you often move data in and out to run decision-support queries, Picciano said.
Meanwhile, Viper comes with improved query performance. Separating data with table partitioning allows users to improve query processing performance by avoiding scans of irrelevant data.
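The partitioning benefits described above (scanning only relevant data, and rolling whole ranges in or out) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not DB2 behavior or syntax: the `PartitionedTable` class, its method names and the monthly partitioning key are all hypothetical.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy range-partitioned table with one partition per (year, month)."""

    def __init__(self):
        self.partitions = defaultdict(list)  # (year, month) -> list of rows
        self.rows_scanned = 0                # counts rows touched by queries

    def insert(self, row):
        """Route the row to its partition based on the sale date."""
        year, month, _day = row["sale_date"]
        self.partitions[(year, month)].append(row)

    def roll_out(self, year, month):
        """Detach a whole partition in one step instead of deleting row by row."""
        return self.partitions.pop((year, month), [])

    def query_month(self, year, month):
        """Scan only the matching partition and skip all the others."""
        rows = self.partitions.get((year, month), [])
        self.rows_scanned += len(rows)
        return rows


# Load six months of hypothetical sales data, three rows per month.
table = PartitionedTable()
for month in range(1, 7):
    for day in (1, 10, 20):
        table.insert({"sale_date": (2006, month, day), "amount": 100})

june = table.query_month(2006, 6)    # touches 3 of the 18 rows
archived = table.roll_out(2006, 1)   # January leaves the table as a unit
```

The query for June reads three rows rather than all 18, which is the point of avoiding scans of irrelevant data. Rolling out January is a single operation on one storage object, which is why the approach suits warehouses that regularly age data out.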
As far as Venom compression technology goes, Bloor Research’s Howard described IBM’s approach as tokenization. The software looks for patterns that occur in the data. So if you’re looking at a customer record and you see Michigan occur, you store a token that indicates the string “Michigan.” The token is stored in a lookup table in the data dictionary, thus saving storage space.
IBM claims between 30 and 70 percent storage savings. That depends on the application, of course, Howard pointed out, and on how much repetitive data you’re talking about.
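The tokenization idea Howard describes can be illustrated with a small dictionary encoder in Python. This is a sketch of the general technique only, assuming a simple per-value dictionary; it is not IBM’s Venom implementation, and the function names and sample customer rows are hypothetical.

```python
def compress(rows):
    """Replace each distinct string value with a small integer token."""
    dictionary = []   # token -> value (the lookup table)
    token_of = {}     # value -> token
    compressed = []
    for row in rows:
        tokens = []
        for value in row:
            if value not in token_of:
                token_of[value] = len(dictionary)
                dictionary.append(value)
            tokens.append(token_of[value])
        compressed.append(tuple(tokens))
    return dictionary, compressed


def decompress(dictionary, compressed):
    """Look each token up in the dictionary to rebuild the original rows."""
    return [tuple(dictionary[t] for t in row) for row in compressed]


# Hypothetical customer records with repeated state names.
rows = [
    ("Smith", "Michigan"),
    ("Jones", "Michigan"),
    ("Lee", "Ohio"),
    ("Smith", "Ohio"),
]
dictionary, compressed = compress(rows)
restored = decompress(dictionary, compressed)
```

Each repeated value such as “Michigan” is stored once in the dictionary and referenced by token everywhere else, so the savings grow with how repetitive the data is. That is consistent with Howard’s point that the result depends on the application.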
Venom also raises one immediate issue, Howard said: Namely, if you have to compress and then decompress data to access it, there will be an overhead involved. Will that then lead to a performance hit?
As it turns out, it doesn’t, given Venom’s reliance on use of in-buffer data storage, as opposed to disk storage, along with its compressed run-time. There’s less back-and-forth to the disk, which actually can result in slight performance improvements, Picciano said.
“We have seen some modest performance gains, in the transactional and analytical spaces,” he said. “Mostly [performance is] the same, with maybe a bit of advantage.”
As far as storage savings go, Picciano said IBM is seeing “tremendous results from customers and analysts,” on the order of 55 percent direct disk savings.
That’s all good for the bottom line; Picciano pointed to industry estimates that disk and storage investments currently eat up some 70 percent of the capital costs and depreciation spent on systems.
There’s a lot of innovation in Viper. Viper spans more than 68 patents. That means that this one database server technology entails more patents than Oracle had for all its technologies in one recent year. IBM is throwing around numbers: More than 750 developers worked out of eight countries to create the database.
Innovation is a fine thing, but IBM is actually playing catch-up in a few areas with Viper. Range partitioning comes to mind. IBM is “years late” with it, Howard said. “Others have had it for a decade,” he said. “It’s quite widely used. But what IBM has got in addition is it works with multidimensional clustering. So it’s much better than it was. It’s now at least as good as what others are offering, and possibly better. It brings them up to speed in that area.”
Picciano agrees. IBM was late to the game with multidimensional clustering, but only because the company had alternative technologies that gave customers what they needed. Only when customers began asking for it did IBM deliver it, and the company delivered it in a rendition that’s better than what’s out there, he said.
IBM’s also late to the table in bringing the ability to alter a table online, Howard said. “Previously, if you wanted to change a column name, you had to take the table offline, redefine it, delete the existing table and re-create the existing table. It was a real pain for developers. They were really behind on the count on that.”
Catching up aside, Viper is a threat to Oracle, Howard said, in many ways. Beyond its data warehousing goodies, another of the fangs Viper is baring at Oracle: SAP announced in May 2006 that Viper would be the preferred database for midmarket SAP applications. The move was made to close ranks ever tighter against Oracle, the two companies’ mutual enemy.
Another thing about Viper IBM is eager to get across, Picciano said, is that through these years of development, IBM hasn’t lost focus or momentum when it comes to application developer communities. Picciano pointed to new features such as connectivity for Ruby on Rails applications, for example, as well as best-of-breed PHP support for DB2. New XML features also complement the PHP environment “very, very nicely,” Picciano said, giving people “all the flexibility they enjoy in the PHP environment.”
DB2 9 will begin shipping on July 28. Prices start at $4,874 per processor or $165 per user with a minimum of 25 users for DB2 9 Express.
Editor’s Note: This story was updated to include input from Charlie Garry.