Big Blue is on a content management roll, announcing Friday that no less a customer than the U.S. Army has plugged into its DB2 Content Management software.
The customer win is a big one for IBM (as in, a multimillion-dollar deal), and it highlights much of the company’s state-of-the-art search and content management technology: DB2 Content Manager, WebSphere, DB2 Information Integrator and Tivoli Monitoring. The technologies will be used to automate the Army’s management of what has been a system of paper forms for the past 200 years.
But the best is yet to come, as IBM quietly plugs away at what Janet Perna told eWEEK.com is the first-ever hybrid database—one that will store XML natively and that gives users the ability to search both structured and unstructured data tucked away in forms ranging from e-mail to images.
Of course, Oracle Corp. begs to differ when it comes to the “first-ever” bit. Bob Shimp, vice president of technology marketing at the company, noted that Oracle databases have supported a native XML data type for several years.
Document storage company Iron Mountain Inc., for one, is running a 15-terabyte XML database based on Oracle databases, Shimp said. In addition, Release 2 of Database 10g will include XQuery, the XML query language, as a standard feature.
“There are a lot of arcane architectural subtleties when you get into database format,” Shimp said when attempting to explain the two database giants’ conflicting statements over who was first to feature native XML storage.
“In the end, those don’t matter. The key test is, Can you put [data] in an XML format? Can you read it out in an XML format? That’s all the customers care about, and we can do that.”
In the meantime, IBM’s currently unnamed software is now in alpha mode with several customers, according to Perna, who is general manager of IBM’s Information Management group. IBM will take the software into beta testing in the second half of next year.
The native XML support will allow DB2 to store XML documents in their native XML form, as opposed to taking that data and trying to make it look like relational data in columns and rows. “What we’ve been building into DB2 is more and more analytical capabilities, like data mining algorithms” to enable the new capabilities, Perna said.
Not only will that strategy increase the speed of querying, but IBM’s emphasis on search ultimately will enable an enterprise search capability that approximates and expands on the power of a search engine such as Google.
With the future database technology, Perna said, users will be able to search within a document that contains, for example, a picture of a widget, a technical specification for the widget, the number of widgets in inventory, their cost and other information.
“That’s what native XML enables,” Perna said. “Being able to search within the body, through XQuery or SQL. Users won’t have to select [whether they search via XQuery or SQL], they’ll be able to intermix those with either all XML or all XQuery. It will be invisible.”
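The difference between reshaping XML into rows and columns and keeping the document whole can be sketched in a few lines of Python. This is an illustration only, not IBM's or Oracle's implementation: SQLite has no native XML type, so it simply stands in for a relational store, and the widget document, table names and the `spec_for` helper are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical widget document; element and attribute names are illustrative.
DOC = """<widget id="w-100">
  <name>Left-handed widget</name>
  <spec>Steel, 40 mm</spec>
  <inventory count="250"/>
  <cost currency="USD">3.75</cost>
</widget>"""

root = ET.fromstring(DOC)
conn = sqlite3.connect(":memory:")

# Approach 1: "shred" the XML into relational columns chosen up front.
# Anything the schema did not anticipate (here, <spec>) is lost.
conn.execute(
    "CREATE TABLE widget_rows (id TEXT, name TEXT, in_stock INTEGER, cost REAL)"
)
conn.execute(
    "INSERT INTO widget_rows VALUES (?, ?, ?, ?)",
    (
        root.get("id"),
        root.findtext("name"),
        int(root.find("inventory").get("count")),
        float(root.findtext("cost")),
    ),
)
shredded = conn.execute(
    "SELECT name FROM widget_rows WHERE in_stock > 100"
).fetchall()

# Approach 2: keep the document intact and search inside it with a path
# expression. A database with native XML storage would do this lookup in
# the engine; here the application parses the stored text instead.
conn.execute("CREATE TABLE widget_docs (id TEXT, doc TEXT)")
conn.execute("INSERT INTO widget_docs VALUES (?, ?)", (root.get("id"), DOC))


def spec_for(conn, widget_id):
    """Return the <spec> text of the stored document with the given id."""
    row = conn.execute(
        "SELECT doc FROM widget_docs WHERE id = ?", (widget_id,)
    ).fetchone()
    return ET.fromstring(row[0]).findtext("spec")
```

The first approach answers ordinary SQL questions quickly but only about the fields someone decided to extract. The second preserves the whole document, which is the property the vendors are arguing about; what a hybrid database adds is the ability to run both kinds of query in one engine.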
IBM has been putting its money where its mouth is. The company has more than 300 developers working on search and content management research and development.
It’s also been on a search/content management buying spree: In August, the company’s Information Management Division announced plans to buy its sixth company, unstructured data integration company Venetica Corp.
Before that, IBM purchased BI tools vendor Alphablox Corp. in July 2004, document management software vendor Green Pasture Software Inc. in December 2003, information integration company CrossAccess Corp. in October 2003, Tarian Software’s records management software in November 2002, and Informix Software’s database business in 2001.
IBM’s research arms also have been hard at work to lay the groundwork for the upcoming revolution in search. Its research projects in this area have included Clio, wherein IBM’s Almaden Research Center has been working to develop a tool that will enable its Content Manager software to more easily index and search XML data.
Some 18 months ago, IBM disclosed that Clio would result in a tool called Cinnamon. Cinnamon’s raison d’être has to do with current problems in placing queries to XML-tagged data. Querying has required proprietary code that either doesn’t take full advantage of the XML format or cannot be used consistently, IBM executives said at the time.
In IBM’s T.J. Watson Research Center, research has long been under way on Web Fountain, the Web services version of IBM’s UIMA (Unstructured Information Management Architecture), a technology based on artificial intelligence techniques that goes out on the Internet, crawling around and reading text and then interpreting it.
IBM is not the only one who’s hot on the search trail, either. Oracle unveiled the results of years of research into these same technologies at its OpenWorld conference earlier this month, including its Oracle Files 10g enterprise content management technology.
Indeed, anybody who’s listened to sales spiels of the two database giants knows that they’re going after the same targets: customers who need to get a handle on data that doesn’t neatly sit in the rows and columns of structured databases.
Some studies estimate that about 81 percent of what enterprise users do as part of core business processes still requires them to deal with physical or digital documents—documents that have been hard, if not impossible, to access, search through or store in traditional relational databases. Think of the paperwork that an insurance or financial services company still handles: claims processing, loan origination or new account on-boarding.
Andy Warzecha, an analyst at The META Group, based in Stamford, Conn., said there are two trends now driving enterprises to get a grip on that data. First, there’s just far more stuff to deal with than ever, and its production isn’t slowing down.
“While we have organizations struggling to decrease the processing times they go through in core business processes, the counterpoint is there’s more stuff they have to look at in their core business processes,” Warzecha said.
But while it’s nice to work efficiently with less paperwork, the real driver behind getting a handle on search is the sharp increase in regulatory demands over the past 18 months, Warzecha said. Between Sarbanes-Oxley, HIPAA (Health Insurance Portability and Accountability Act), Basel II, OSHA (Occupational Safety and Health Administration) and other post-Enron inspired regulations, all of that paper, all of those images and all of that unstructured data now has risk and penalty tied to it.
“The first stuff [typically] subpoenaed is from the e-mail environment,” he said. “There’s stuff there that shouldn’t be. Organizations that are supposed to be destroying things aren’t doing a good job at that.”
Niche companies have provided technology to search unstructured data, including Verity Inc., which recently partnered with Yahoo Inc. to deliver Web search results within its enterprise search platforms.
Autonomy Corp. and Mamma.com are two other players among many that recently announced new enterprise search moves.
And then there are technologies such as Software AG Inc.’s Tamino XML Server that natively store XML. The problem there, Perna said, is that such products lack relational database support and therefore lack an RDBMS’s scalability, security and performance.
Beyond the appeal of a hybrid relational/XML database lies the appeal of going with a player who’s been around for a long time, Warzecha said. “Yes, this has become a huge issue for organizations in terms of looking at how technology can help suspend some of the problems they’re facing,” he said.
“Translate that in terms of who is best-positioned to benefit from this … since this is fundamentally about risk over a long-term time, and you get into the issue of trust. Which vendors do you trust who have the capability of solving these problems with you? You find large-scale vendors around a long time are better-positioned versus small, privately held companies in this arena.”