By Lisa Vaas  |  Posted 2006-06-08

Indeed, Picciano said, large ISVs (independent software vendors) such as Nextance and Justsystems have shifted to using XML as an internal representation format over the past year or two. "As we've introduced Viper to them, they've said, 'This is exactly what we were hoping somebody would step up and do,'" he said.
If people's interest in XML has flagged, it's not because XML isn't out there; rather, it's the fault of inelegant XML-handling databases, Picciano said. "Because today's generation of XML-handling databases has been woefully inefficient in handling XML data, many customers have kept it separate," he said. "It's been spread out across disk systems, not centrally managed as a data asset. If you talk to some people in IT, they're unaware of the amount of XML their organizations are using. … But I really have yet to go to a client and talk to them about XML as a data technology and have them say, 'We don't have any need for that.'"
Viper's XML power has gotten all the press. But two other biggies catch the eye of Bloor Research's Howard: new compression technologies, brought with Viper's "Venom" technology, and the fact that Viper is IBM's stake in the ground in the data warehousing space.
He pointed to IBM's Data Warehousing BCU (Balanced Configuration Unit) as being the first positive move to be taken by a data warehousing company in the face of appliance vendors starting to muscle in. "Appliance vendors are starting to get a lot of traction," he said. "It's hurting Teradata, [and] it will hurt Oracle. IBM is the only mainstream vendor taking steps to compete with the likes of DATAllegro … and Netezza." The appliance companies offer bundled data warehouse appliances that practically eschew administrative costs, Howard said, delivered as preinstalled software on hardware platforms. To deliver the BCU, IBM took its experience with data warehousing and created a set of best practices, multidimensional clustering, summary tables and more to preinstall DB2 in a data warehousing environment.
It's a turnkey solution, Howard said, and it could help IBM grab market share from its arch competitor, Oracle. "[IBM is trying] to minimize the management overhead," he said. "I don't think that goes the whole way to answer the threat of Netezza, but Oracle hasn't moved at all to compete with [that threat]. IBM is bundling BCU with a hardware platform, [whereas] Oracle doesn't have a hardware platform. This is potentially a threat to Oracle."
Independent analyst and eWEEK contributing columnist Charlie Garry disagreed with Howard, however, saying that the BCU is, in fact, another instance of IBM playing catch-up. "In the past, IBM simply had not standardized on a subset of hardware and storage for their warehouse implementations with DB2 on Unix/Win/Linux," Garry wrote in an e-mail exchange. "This meant a great deal of configuration on-site and delayed successful implementations of DB2 as a warehouse database. The BCU is simply a way for IBM to sell a bundled set of hardware and software that they have great experience with and can more accurately predict performance for across a range of workloads. This helps to speed up implementations."
IBM is not, in fact, the first vendor to come up with such a solution, Garry said: Teradata got there first, and IBM is, wisely enough, following suit. "Teradata has always operated in this fashion," Garry wrote. "It should be pointed out that Teradata competitors spread FUD about this approach, claiming the proprietary nature of Teradata systems. No one could argue with the fast time to value, however, and the share Teradata has taken over the past five years. Now IBM is doing it and it is a good approach, but it is not in any way an answer to the warehouse appliance vendors. We are not talking about a single piece of hardware containing server, storage, and software that you create tables on and load data."
To put it more succinctly, a warehouse appliance could be up and running in a matter of hours after delivery while an IBM BCU or a Teradata system would take much longer. The point is that the BCU is not an answer to the appliance vendors. "[It] is certainly not a turn-key solution," Garry wrote. "But then again, no data warehouse is, appliance or not."
The separate XML storage engine is an interesting approach, similar to the modular approach that MySQL has taken with its storage engines. While IBM will attempt to convince customers they need this, it is more likely that this technology is in DB2 to support IBM's own content management and data integration strategy. I see DB2 becoming an increasingly embedded solution for IBM in the future versus a stand-alone database offering. Many of the features are old relative to most other databases, as pointed out in the article. Multidimensional clustering was known as a clustering index back in the day when I supported DB2 on the mainframe. Range partitioning has been in the mainframe version for perhaps 15 years. Now IBM trumpets the combination of those technologies with the hash partitioning they already sold as the data partitioning offering for DB2 on Unix/Win/Linux. These things have improved performance, and the reason we know this is that other databases (even DB2 on zSeries) have used them before. The key to success will be in the Design Advisor, which helps administrators make physical design decisions after the fact. If those recommendations are good, and if they can be implemented without creating a great deal of effort and added expense, then IBM will have something.
Beyond BCU, Viper packs loads of features that are warehouse-friendly, Picciano said. The list includes improved large database management and table partitioning.
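As a rough illustration of the table partitioning named above, the following Python sketch routes rows into partitions by ranges of a date column, skips partitions that cannot match a query, and detaches a whole partition in one step. It is a toy model only, not DB2's implementation; the class name `RangePartitionedTable` and its methods are hypothetical names chosen for this example.

```python
from bisect import bisect_right
from datetime import date


class RangePartitionedTable:
    """Toy range-partitioned table (hypothetical, not DB2 code)."""

    def __init__(self, boundaries):
        # Each boundary is the inclusive lower bound of one partition.
        self.boundaries = sorted(boundaries)
        self.partitions = {b: [] for b in self.boundaries}

    def _partition_for(self, key):
        # Find the last boundary that is <= key.
        idx = bisect_right(self.boundaries, key) - 1
        if idx < 0:
            raise ValueError(f"{key} is below the lowest partition boundary")
        return self.boundaries[idx]

    def insert(self, key, row):
        self.partitions[self._partition_for(key)].append((key, row))

    def scan(self, low, high):
        """Return rows with low <= key < high and the number of partitions read."""
        hits, scanned = [], 0
        for i, lower in enumerate(self.boundaries):
            upper = self.boundaries[i + 1] if i + 1 < len(self.boundaries) else None
            # Skip partitions whose range cannot overlap [low, high).
            if lower >= high or (upper is not None and upper <= low):
                continue
            scanned += 1
            hits.extend(row for key, row in self.partitions[lower] if low <= key < high)
        return hits, scanned

    def roll_out(self, boundary):
        """Detach one whole partition and return its rows."""
        self.boundaries.remove(boundary)
        return self.partitions.pop(boundary)


table = RangePartitionedTable([date(2006, 1, 1), date(2006, 4, 1), date(2006, 7, 1)])
table.insert(date(2006, 2, 14), "order-1")
table.insert(date(2006, 5, 3), "order-2")
table.insert(date(2006, 8, 20), "order-3")

# A second-quarter query reads only one of the three partitions.
rows, scanned = table.scan(date(2006, 4, 1), date(2006, 7, 1))

# Rolling out the first quarter removes its rows without touching the others.
detached = table.roll_out(date(2006, 1, 1))
```

In this sketch the query cost depends on how many partitions overlap the requested range rather than on the total table size, which is the intuition behind avoiding scans of irrelevant data.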
Table partitioning is a data organization scheme in which table data is divided across multiple storage objects called table partitions or ranges according to values in one or more table columns. These storage objects can be in different table spaces, in the same table space or a combination of both. The benefits of table partitioning include the ability to create very large tables: By dividing table data across multiple storage objects, a partitioned table can contain vastly more data than an ordinary table.
Other warehouse-friendly capabilities include more-flexible administration. Users can now perform administrative tasks on individual data partitions, breaking down time-consuming maintenance operations into a series of smaller operations. Viper also comes with more granular control of index placement. Indexes can be placed in different table spaces and managed individually. In addition, Viper brings fast, easy roll-in or roll-out of data. This ability can be particularly useful in a data warehouse environment where you often move data in and out to run decision-support queries, Picciano said. Meanwhile, Viper comes with improved query performance. Separating data with table partitioning allows users to improve query processing performance by avoiding scans of irrelevant data.
As far as Venom compression technology goes, Bloor Research's Howard described IBM's approach as tokenization. The software looks for patterns that occur in the data. So if you're looking at a customer record and you see "Michigan" occur, you store a token that indicates the string "Michigan." The token is stored in a lookup table in the data dictionary, thus saving storage space. IBM claims between 30 and 70 percent storage savings. That depends on the application, of course, Howard pointed out, and on how much repetitive data you're talking about.
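The tokenization Howard describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not IBM's Venom algorithm: it tokenizes every whole column value, whereas a real engine would choose which recurring patterns are worth a dictionary entry. The function names `build_dictionary`, `compress` and `decompress` are hypothetical.

```python
def build_dictionary(rows):
    """Assign a small integer token to each distinct value seen in the rows."""
    dictionary = {}
    for row in rows:
        for value in row:
            if value not in dictionary:
                dictionary[value] = len(dictionary)
    return dictionary


def compress(rows, dictionary):
    """Replace every value with its token from the lookup table."""
    return [tuple(dictionary[value] for value in row) for row in rows]


def decompress(token_rows, dictionary):
    """Map tokens back to the original values."""
    reverse = {token: value for value, token in dictionary.items()}
    return [tuple(reverse[token] for token in row) for row in token_rows]


customers = [
    ("Alice", "Detroit", "Michigan"),
    ("Bob", "Lansing", "Michigan"),
    ("Carol", "Ann Arbor", "Michigan"),
]

dictionary = build_dictionary(customers)
tokens = compress(customers, dictionary)
restored = decompress(tokens, dictionary)
```

Here the string "Michigan" is stored once in the dictionary and every row carries only its token, so the savings grow with how often values repeat, which matches Howard's point that results depend on how repetitive the data is.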
Venom also raises one immediate issue, Howard said: Namely, if you have to compress and then decompress data to access it, there will be an overhead involved. Will that then lead to a performance hit? As it turns out, it doesn't, given Venom's reliance on in-buffer data storage, as opposed to disk storage, along with its compressed run-time. There's less back-and-forth to the disk, which actually can result in slight performance improvements, Picciano said. "We have seen some modest performance gains, in the transactional and analytical spaces," he said. "Mostly [performance is] the same, with maybe a bit of advantage." As far as storage savings go, Picciano said IBM is seeing "tremendous results from customers and analysts," on the order of 55 percent direct disk savings.

Lisa Vaas is News Editor/Operations for and also serves as editor of the Database topic center. Since 1995, she has also been a Webcast news show anchorperson and a reporter covering the IT industry. She has focused on customer relationship management technology, IT salaries and careers, effects of the H1-B visa on the technology workforce, wireless technology, security, and, most recently, databases and the technologies that touch upon them. Her articles have appeared in eWEEK's print edition, on, and in the startup IT magazine PC Connection. Prior to becoming a journalist, Vaas experienced an array of eye-opening careers, including driving a cab in Boston, photographing cranky babies in shopping malls, selling cameras, typography and computer training. She stopped a hair short of finishing an M.A. in English at the University of Massachusetts in Boston. She earned a B.S. in Communications from Emerson College. She runs two open-mic reading series in Boston and currently keeps bees in her home in Mashpee, Mass.
