As IBM’s Dr. Patricia Selinger puts it, if the supply of database administrators were to increase in lockstep with the data that’s flowing into databases from the Internet, the world would now be populated entirely with DBAs.
Whether that’s a desirable development is debatable, but for their part, relational DBMS (RDBMS) vendors are tackling the burgeoning size and quantity of databases by making new releases as self-managing, self-tuning and self-healing as possible.
IBM this week stepped up to the plate with “Stinger,” the code name for the next major release of its DB2 database. The beta release comes just in time for next week’s International DB2 Users Group (IDUG) conference in Orlando, Fla., where Selinger will keynote.
Selinger is an IBM fellow and vice president of data management architecture and technology for the IBM Software Group at the IBM Silicon Valley Lab, in San Jose, Calif. eWEEK.com Database Center Editor Lisa Vaas caught up with Selinger to talk about which aspects of the new release will help businesses deal with the ever-increasing data flow in terms of business intelligence, content management and information integration.
She also discussed what we can expect in future releases vis-à-vis XML capabilities and the new version of DB2 Information Integrator, code-named Masala, which will allow users to simultaneously retrieve information from databases, applications and the Web.
I hear that you’re going to talk about business intelligence in your keynote. What’s up in that arena, and how is Stinger going to help businesses cope?
Volumes of data are increasing dramatically. People who thought that keeping three months of data online was good enough are now thinking they need 15 months. And the amount of transaction data is growing larger and larger.
People are … personalizing Web pages. They can no longer record a Web page, because it was composed on the fly. You have to capture whatever 15 pieces composed the Web page.
This dramatically increases the amount of data you’re capturing and the amount you have to analyze. You could have had a warehouse with 10 processors, but now you’re looking at needing 20 or 30 processors. So, scaling up is one direction things are inflating.
The second is the number of users. It used to be there were gates on the warehouse, and only certain people could look at the data. But with the ability to do almost real-time replication with Queue Replication in Stinger, we can now deliver data almost in published form. You can publish this data out to smaller data marts or individual people’s workstations.
Customers Indifferent to BI
There’s more data to analyze, but Stinger beta testers like Tim Kuchlein [director of information systems at Clarity Payment Solutions Inc.] say it’s too much of a hassle to deal with keeping database servers replicated and in sync with reporting servers. Is Stinger going to help with that?
Yes, I was just getting to that. The third piece is that for many reasons (complexity is one, also privacy) you may not want to move the data. You might have to go access it where it’s in place, at its source. Information Integrator, built on Stinger, provides the ability to reach out and access that data in place.
It adds to and extends your business intelligence. People will have a choice. I don’t want to imply you don’t have to put things together in a warehouse. You can continue to do that. But secondly, you have another choice, which is to access the data in place.
For business-intelligence users who want real-time, executive dashboard kinds of data, this is essential. For example: I’d like to compare how many red shoes are selling right now, as opposed to at Thanksgiving.
What we’re doing with the technologies in Stinger and Information Integrator, which exploits Stinger technology, is [giving customers] the ability to access more data for more users and to extend that with real-time access to source data.
Speaking of source data, DB2’s XML-handling capabilities have steadily been getting better, but you’re still ranked slightly below Oracle Corp. databases in that regard. What can we look forward to vis-à-vis native XML capabilities in Stinger?
That’s the final piece: the ability to access unstructured data. Because business intelligence needs to work at not only warehouse, inventory, transactions, sales and whatnot; it also needs a complete view of the customer.
You need to look at not only what I bought but what customer complaints and e-mails I’ve sent. Those would be in files and e-mails and so forth. You want to look at that and include that. XML is part of that, and so is Information Integrator’s ability to access unstructured data through Domino’s Extended Search. That has access to Notes data [among other things].
Are we going to see native XML storage and indexing, or will we see more shredding—i.e., parsing of XML into relational tables, which slows performance?
The extenders we have out there today are the first wave of our XML functionality. What you’ll see next is a deep implementation of XML that goes far beyond what we have today and will outclass anybody in the field. We’re in the process of building an in-depth implementation of native XML storage, indexing and so on. We’re showing it now in a closed alpha form to customers. Technology previews will come this year.
What some other implementers that have relational systems have been doing, they’ve been taking XML apart and putting it not in [the tree form in which native XML is stored, i.e., shredding]. What we have invented is a way to store things in a more native form in the tree structure itself, which gives a significant performance advantage. You have to have specialized indexes that index the branches of the tree, in essence. Data storage and indexing technologies have to go hand-in-hand.
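[Editor’s illustration: the contrast Selinger draws between shredding and tree-form storage can be sketched roughly as follows. This is a toy example in Python using only the standard library, not a description of how DB2 or any other product works. The sample document and the function names shred and build_path_index are hypothetical, chosen only for illustration.]

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, invented for this illustration.
DOC = """<order id="42">
  <customer>Ann</customer>
  <item sku="A1"><name>red shoes</name><qty>2</qty></item>
  <item sku="B7"><name>laces</name><qty>1</qty></item>
</order>"""


def shred(root):
    """Shredding: flatten the document into relational-style rows, one per <item>."""
    return [
        (root.get("id"), item.get("sku"), item.findtext("name"), int(item.findtext("qty")))
        for item in root.findall("item")
    ]


def build_path_index(root):
    """Tree-form storage: keep the tree and index its branches by root-to-node path."""
    index = defaultdict(list)

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        index[path].append(node)  # the index points at nodes still inside the tree
        for child in node:
            walk(child, path)

    walk(root, "")
    return index


root = ET.fromstring(DOC)
rows = shred(root)               # the tree shape is gone; only columns remain
index = build_path_index(root)   # the tree is intact; lookups go by branch
item_names = [node.text for node in index["/order/item/name"]]
```

[In the shredded form, a question about the document’s structure has to be reassembled from rows. In the tree form, a path lookup such as /order/item/name goes straight to the matching branches, which is the role the specialized indexes play in the description above.]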
Masala to Revolutionize Search
You mentioned that XML handling was just one method of handling unstructured data that IBM’s now working on. What other methods are you looking at?
Text-searching technology is one we’ll be introducing with the Masala release of Information Integrator, which comes later this year. It will search any character data within the database and combine that with other data sources, such as e-mail, the Web, intranet, Internet and so forth.
You’d like to target a number of data sources and then pull together the indexing that will allow you to provide one search answer across all those different data sources. And, in the fashion we’re used to with search engines on the Web, to provide relevance rankings.
Also in Information Integrator, one thing we’re spending a lot of time on is the tool set, to make this obviously complex universe of systems that are accessible through Information Integrator, to make programming through that very easy.
Automated discovery of servers with data on them, and automatic injection of metadata that describes data that resides there, so you can much more easily look at your world and program to it as if it were in a single place.
That brings us to autonomic computing, which is big in Stinger.
Right. One very major step forward that is unique to IBM is DB2 Design Advisor. You take a number of design problems, like choosing indexes, materialized query tables, choosing multidimensional clustering and choosing partitions. Each of those is a mathematically hard problem to search the capabilities. Designer takes those and combines them together.
There’s a number of factors [causing database vendors to automate their products]. One is if you look at how much data we’re accumulating, you can’t keep up. Whatever the ratio is today of gigabytes to DBAs, if you try to keep that ratio constant and look over time at factors of 1,000 times more data, you’d have to have the entire population of the earth be DBAs.
I have to find ways for a single DBA to manage more and more data and still get the same performance and optimization and ease of access that these automatic structures provide today.
Plus, many small businesses are moving away from file systems to a database. We’re building a set of tools that can save DBAs in large enterprises a huge amount of time and also provide a very good managed solution that’s entirely automatic for a small- or medium-sized business.
As we work with a number of software vendors to embed DB2 in their solutions, this is a natural outgrowth of that. In 1995, between zero percent and 3 percent of all the databases that we sold that year were sold by ISVs as part of their product sell. We’re at the 40 to 50 percent mark now. Other people are leading and doing the sale, and DB2 is a part of that. This is a part of the evolution of our business model and the way our industry works.
People want to buy solutions, not a database engine. I have to make the database easier, or I can’t make money in that marketplace.
Stinger Support for Linux
Another big thing in Stinger is the support for the Linux 2.6 kernel. Why is Linux so essential to IBM’s future?
We’ve been quite dedicated to making Linux successful and to making DB2 on Linux successful. From application development to working on open-source Linux itself, Linux runs on every one of our hardware platforms. This is a major direction for IBM and has been a major direction for DB2 since we produced the first clustered database for Linux.
We continue to support more processors and the newest versions of Linux as they come out. Linux is important for us because it’s important for our customer set. We see a lot of customers moving from Linux as their Web servers to Linux as their mission-critical systems.
I see a number of customers looking at Linux for their warehouses. That means, for us, Linux clusters. We have the ability to be very strong there. Our ICE [Integrated Cluster Environment] offering for Linux is one [for which] we’re seeing a lot of positive momentum. This builds on DB2 strengths of being able to scale to large numbers of nodes, between two and 1,000.
What’s going on with content management?
The natural direction for customers to go is not only to store XML documents in a DB2 database but to look for content solutions built on this technology. IBM’s Content Manager is built around DB2 as the card catalog, so to speak.
What we see are directions where customers are expecting more and more content management out of database engines. The XML content we’re building in will give them that ability, and we’re also working very actively to form the Java Standard JSR 170.
What I see here is that content management systems have been thought of as a different idea than database systems. They’ve been focused on classic bank-account, inventory types of ideas. With Content Manager, those two worlds will bridge. Users will be able to use XQuery as a query language to query about documents and use XML as a richer data type.
An easy way to think about it is to think of an application programmer who has to write a program to look at data. I’d like them to use one interface for DB2 and get all that data, whether it’s in a content management system or in DB2 directly, and use one interface to do that, so they can build applications 50 percent faster than they have to today.
People complained about DB2’s GUI back in Version 6. They’re still grumbling a bit about Stinger’s command-line interface. Any plans to tweak that?
Command-line interface is very valuable to our customer sets who have built up a set of their own management tools or bought from a third party, where they run things in batches or scripts. If they have thousands of servers to manage, this is a very good way for them to do that. This isn’t going away.
We also want to offer point-and-click levels of management as well, so you can, from a Web browser, do a lot of tasks that administrators have to do. I don’t see that changing.