IBM is working on new search technology that may eventually give Google Inc. a run for its money, at least in the corporate space.
Officials at IBM's T.J. Watson Research Center discussed with eWEEK this month how the company is tackling the problem of understanding unstructured data. Built on a combination of artificial intelligence techniques, IBM's UIMA (Unstructured Information Management Architecture) is the foundation for what Paul Horn, IBM senior vice president and director of research, calls “Google on steroids.”
UIMA uses what officials call a “combination hypothesis” to bring knowledge and understanding to large volumes of unstructured data. Rather than relying on a single technique, it applies several technologies to a search problem in unison, in a kind of brute-force attack.
Salim Roukos, manager of multilingual natural language processing research at IBM Research, in Hawthorne, N.Y., and an expert in machine translation, said, “Through UIMA, the components [to approach the problem] were more readily available. And what we're working on now is extending UIMA to support an effective way of combining these components.”
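The idea of combining independent analysis components over the same text can be illustrated with a minimal sketch. This is not UIMA's API or IBM's implementation; the names `Annotation`, `capitalized_names`, `opinion_words` and `run_pipeline` are hypothetical, and the two toy components stand in for far more capable analyzers.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """A labeled span of text produced by one component (hypothetical type)."""
    start: int
    end: int
    label: str
    source: str


# An annotator is any function that maps raw text to a list of annotations.
Annotator = Callable[[str], List[Annotation]]


def capitalized_names(text: str) -> List[Annotation]:
    """Toy component: treat runs of two or more capitalized words as names."""
    pattern = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"
    return [Annotation(m.start(), m.end(), "name", "capitalized_names")
            for m in re.finditer(pattern, text)]


def opinion_words(text: str) -> List[Annotation]:
    """Toy component: flag a few opinion-bearing words."""
    pattern = r"\b(good|bad|great|poor)\b"
    return [Annotation(m.start(), m.end(), "opinion", "opinion_words")
            for m in re.finditer(pattern, text, re.IGNORECASE)]


def run_pipeline(text: str, annotators: List[Annotator]) -> List[Annotation]:
    """Run every component over the same text and merge the results by position."""
    merged: List[Annotation] = []
    for annotate in annotators:
        merged.extend(annotate(text))
    return sorted(merged, key=lambda a: (a.start, a.end))


if __name__ == "__main__":
    sample = "People say British Petroleum has good service."
    for ann in run_pipeline(sample, [capitalized_names, opinion_words]):
        print(ann.label, repr(sample[ann.start:ann.end]), "from", ann.source)
```

Each component works independently and knows nothing about the others; the value comes from reading their merged output together, for example an opinion word appearing near a company name.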
IBM has developed three systems based on UIMA. The first, internally called Jedi, is a pure Java version of the framework; another is a C++ version. A third, which is the most likely to go into broader use and into products in some form, is called Web Fountain and uses a Web services approach.
Horn said Web Fountain “goes out on the Net, crawls around and reads text. It reads the text, understands the text and will tell you what's in the text.”
Web Fountain also features natural language functionality, which allows it to find correlated subjects. “We've done this for a number of big companies; one is British Petroleum [plc.],” Horn said. “We went out and found out what people were saying about them.”
UIMA is an example of how IBM's research arm works with its product and services divisions to turn out new offerings in each, officials said.
“We're using this technology to differentiate our consulting practices,” said Horn. “But the core search technology is going into our software products, such as the portal,” and other technologies.
UIMA and Customers
IBM first took UIMA technology out on a customer engagement and has since been refining it, both to give IBM's services consultants a competitive advantage and to bring portions of the technology into IBM's products. The technology will most likely first find a home in products from IBM's Lotus Software division, officials said.
Nancy Staisey, a partner in IBM's Business Consulting Services group who is based in Washington, said she is pitching Web Fountain to a client to help pull in more business and to better approach problems for current customers.
“It is interesting because we have an asset that can have a lot of different applications, like it can be used for tracking a firm's reputation in the market to allowing a customer to have a pulse on what their customers are thinking and doing,” Staisey said.
IBM's Life Sciences practice, along with IBM Research, has also applied Web Fountain to the complex problems of genetics. “IBM has brought their global expertise to bear on trying to develop data integration strategies that allow us to assess complex and diverse data from environmental, genetic and genomic sources in a fashion we have not been able to accomplish,” said Bruce McManus, co-director of the iCAPTUR4E (Imaging, Cell Analysis and Phenotyping Toward Understanding Responsive, Reparative, Remodeling, and Recombinant Events) Center at the University of British Columbia, in Vancouver.