IBM Open-Sources UIMA for Unstructured Text Analysis

By Lisa Vaas | Posted 2006-01-23

IBM is open-sourcing UIMA, its search and text analysis technology that mines unstructured data to uncover hidden relationships and trends.

On Jan. 23, IBM plans to follow through on its promise to open-source its search and text analysis technology, which mines unstructured data (documents, images, comment and note fields, e-mail, and rich media such as video and audio) to uncover hidden relationships, trends and facts. IBM is handing the code for the technology, called UIMA (Unstructured Information Management Architecture), over to the world's largest open-source development site.
The company plans to move the project to a full open-source community development model later in the year.
Nelson Mattos, IBM distinguished engineer and vice president of information and interaction, predicted that the release of UIMA will have an impact similar in magnitude to IBM's release of SQL as a standard for relational databases 30 years ago.

"The moment SQL became a standard and became highly adopted in industry, it opened the door for development of huge numbers of applications," Mattos said. "We're seeing exactly the same pattern here," he continued. "Today 80 to 85 percent of data is unstructured data. There is no standard to deal with unstructured data, to build applications, to leverage that. UIMA has the potential to be that standard.

"IBM was doing similar moves in the 1970s, when we gave SQL to the standards bodies. We're giving UIMA to open source hoping we can create a standard for a whole new generation of applications."

UIMA already has solid traction, Mattos said. Unveiled by IBM in December 2004, it is already in use in industry and in academia. For example, the Mayo Clinic has adopted the framework as part of its collaboration with IBM on processing unstructured text, in particular a collection of 20 million clinical notes.

UIMA serves as the thread that stitches together the series of tools required to search and mine disparate unstructured data sources. The Mayo Clinic, for instance, has used UIMA as a framework to combine its own annotators with IBM's and open-source ones in plug-and-play fashion.

DARPA (the Defense Advanced Research Projects Agency) is also making use of UIMA. The agency is using it as part of a human language technology research and development program called GALE (Global Autonomous Language Exploitation), whose goal is to analyze and interpret large volumes of speech and text in multiple languages.
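The plug-and-play arrangement described above, in which independently written annotators are chained over the same data, can be illustrated with a minimal sketch. This is not UIMA's actual API: the classes `Annotation` and `AnalysisDocument`, the functions `token_annotator`, `drug_annotator` and `run_pipeline`, and the two-word drug lexicon are all hypothetical. The sketch assumes, loosely in the spirit of UIMA's design, that every annotator reads one shared document and records typed spans as begin/end character offsets, so that later annotators can build on the output of earlier ones.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: a typed span of the document text."""
    kind: str
    begin: int
    end: int


@dataclass
class AnalysisDocument:
    """Hypothetical shared container: raw text plus all annotations added so far."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def select(self, kind: str) -> List[Annotation]:
        # Return every annotation of the given type, in insertion order.
        return [a for a in self.annotations if a.kind == kind]


# An annotator is any callable that reads the document and appends annotations.
Annotator = Callable[[AnalysisDocument], None]


def token_annotator(doc: AnalysisDocument) -> None:
    # Mark every run of word characters as a "Token".
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))


def drug_annotator(doc: AnalysisDocument) -> None:
    # Hypothetical dictionary lookup that depends on "Token" annotations
    # produced by an earlier annotator in the pipeline.
    lexicon = {"aspirin", "ibuprofen"}
    for tok in doc.select("Token"):
        if doc.text[tok.begin:tok.end].lower() in lexicon:
            doc.annotations.append(Annotation("Drug", tok.begin, tok.end))


def run_pipeline(text: str, annotators: List[Annotator]) -> AnalysisDocument:
    # Run each annotator in order over the same shared document.
    doc = AnalysisDocument(text)
    for annotate in annotators:
        annotate(doc)
    return doc


if __name__ == "__main__":
    doc = run_pipeline("Patient given Aspirin daily.", [token_annotator, drug_annotator])
    print([doc.text[a.begin:a.end] for a in doc.select("Drug")])  # ['Aspirin']
```

Because each annotator only agrees on the shared document and the annotation types it reads and writes, one component can be swapped for another (an in-house tokenizer for an open-source one, say) without touching the rest of the pipeline, which is the plug-and-play property the Mayo Clinic example relies on.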
UIMA is also increasingly being used in commercial software, with UIMA-compliant products now available from companies including ClearForest, Cognos, Factiva and Nstein. Source code for the IBM reference implementation of UIMA is available, and IBM's UIMA SDK can be downloaded for free.
Lisa Vaas is News Editor/Operations and also serves as editor of the Database topic center. Since 1995, she has also been a Webcast news show anchorperson and a reporter covering the IT industry. She has focused on customer relationship management technology, IT salaries and careers, effects of the H-1B visa on the technology workforce, wireless technology, security, and, most recently, databases and the technologies that touch upon them. Her articles have appeared in eWEEK's print edition, online, and in the startup IT magazine PC Connection. Prior to becoming a journalist, Vaas experienced an array of eye-opening careers, including driving a cab in Boston, photographing cranky babies in shopping malls, selling cameras, typography and computer training. She stopped a hair short of finishing an M.A. in English at the University of Massachusetts in Boston. She earned a B.S. in Communications from Emerson College. She runs two open-mic reading series in Boston and currently keeps bees at her home in Mashpee, Mass.
