IBM Open-Sources New Search Technology

Written By
John Pallatto
Aug 8, 2005

IBM plans to release as open-source a sophisticated new search and text analysis technology that is able to find relationships, trends and facts buried in a wide range of unstructured data, including e-mails, Web pages, text documents, images, audio and video.

Called UIMA (Unstructured Information Management Architecture), the technology is able to go beyond the keyword analysis typically used by most search engines to discern the semantic meanings within text and other unstructured data, said Nelson Mattos, vice president of information integration with IBM in San Jose, Calif.

IBM implemented UIMA in its WebSphere Information Integrator OmniFind Edition as part of its enterprise search platform, which Mattos said was the first commercially available application for this technology. IBM announced UIMA at the start of the LinuxWorld Conference & Expo in San Francisco this week.

UIMA was the result of four years of development by IBM Research supported by the Defense Advanced Research Projects Agency (DARPA), which is the central research and development arm of the U.S. Defense Department.

Major universities and private research organizations, including Carnegie Mellon University, Columbia University and the University of Massachusetts, participated in the development of the technology and are now using UIMA in course work and research projects, according to IBM officials.

BBN Technologies Inc., Science Applications International Corp., the Mayo Clinic and MITRE Corp. also contributed to the research.

“We are announcing that we are going to be open-sourcing that architecture to allow for a broad adoption in the marketplace,” Mattos said.

Releasing the UIMA technology as open-source code will make it easier for commercial, government, corporate and academic software developers to produce extensions and applications for the search technology, Mattos said. IBM will benefit from this when it gets opportunities to provide the computing and networking infrastructure to support these applications, he said.

UIMA will be presented to the Open Source Technology Group and be made available through the SourceForge online developer community by the end of 2005. Developers can also download the UIMA framework for free from IBM's alphaWorks division.

The search technology is particularly valuable for business intelligence applications that sift through e-mails or electronic documents to reveal trends that would otherwise be hidden from basic keyword searches, Mattos said.

For example, UIMA can be used to search through call center reports on problems with a particular product, such as a car, to reveal mechanical or maintenance problems, Mattos said.

Such searches may reveal a product quality problem earlier in the production cycle so changes can be made before it damages the product's reputation or sales, he said.

It also allows companies to analyze “sales versus maintenance cost of a product and realize that while you are doing very well selling certain products, the maintenance cost of those is very high” because there are so many complaints and service calls about them, said Mattos.

Offering UIMA as an open-source technology is a good move because it increases the chances that it can be accepted as an industry standard for searching and analyzing all types of unstructured data, said Dana Gardner, principal analyst with industry researcher Interarbor Solutions.

“There has been a mishmash approach to text analytics, and I think there is a real value to having an interoperable methodology” in the market that brings together many of the best ideas about analyzing unstructured data, Gardner said.

The search engines available today are able to find huge numbers of documents with keyword searches, but they are poor at providing an overview of the information contained in those documents, Gardner said. “We've had a bunch of trees, but no way of viewing the forest when it comes to text analytics.”

If UIMA is widely accepted as an industry standard, “it could allow for real-time analysis of an entire corporate intranet, which could be extremely powerful and allows for knowledge to be much more attainable, recoverable and actionable,” said Gardner.

It's also true that the technology could be used as a powerful intelligence-gathering tool by the National Security Agency or the Central Intelligence Agency to sift through e-mail messages, phone conversations or many other kinds of data, Gardner observed.

However, “I think that the spooks at the NSA and that ilk probably have these kinds of capabilities already,” Gardner said.

“UIMA will be much more valuable by taking it out of the cloistered domain of intelligence and making it available to the much larger domains of business,” he said.
