IBM is working on new search technology that may eventually give Google Inc. a run for its money, at least in the corporate space.
Officials at IBM's T.J. Watson Research Center here discussed with eWEEK this month how the company is tackling the problem of understanding unstructured data. Using a combination of artificial intelligence techniques, IBM's UIMA (Unstructured Information Management Architecture) is the foundation for what Paul Horn, IBM senior vice president and director of research, calls "Google on steroids."
UIMA uses what officials call a "combination hypothesis" to bring knowledge and understanding to large volumes of unstructured data. The approach attacks a search problem by applying several technologies in unison, in a kind of brute-force attack.
Salim Roukos, manager of multilingual natural language processing research at IBM Research, in Hawthorne, N.Y., and an expert in machine translation, said, "Through UIMA, the components [to approach the problem] were more readily available. And what we're working on now is extending UIMA to support an effective way of combining these components."
IBM has developed three systems based on UIMA. The first, internally called Jedi, is a pure Java version of the framework; another is a C++ version. A third, which is the most likely to go into broader use and into products in some form, is called Web Fountain and uses a Web services approach.
Horn said Web Fountain "goes out on the Net, crawls around and reads text. It reads the text, understands the text and will tell you what's in the text."
Web Fountain also features natural language functionality, which allows it to find correlated subjects. "We've done this for a number of big companies; one is British Petroleum [plc.]," Horn said. "We went out and found out what people were saying about them."
UIMA is an example of how IBM's research arm works with its product and services divisions to turn out new offerings in each, officials said.
"We're using this technology to differentiate our consulting practices," said Horn. "But the core search technology is going into our software products, such as the portal," and other technologies.