Web 2.0, SOA, and Web Services - eWeek



IBM, EU IMPACT Project Looks to Preserve Ancient Texts




IBM and the EU are collaborating on a massive effort to digitize rare historical texts using new text-recognition and crowd-computing technologies.

IBM and the European Union are teaming up to digitize a massive number of rare and culturally significant historical texts. The initiative, called IMPACT (IMProving ACcess to Text), will provide new technologies to institutions across Europe that will enable them to efficiently digitize their historical collections, which will become available, editable and searchable online.

At the core of IMPACT’s research is new Web-enabled adaptive optical character recognition (OCR) software. The software is equipped with crowd-computing technology, a model in which individuals enhance a product by providing their unique knowledge and expertise. The popular Website Wikipedia is an example of crowd-computing. Combined, crowd-computing and OCR will allow institutions to digitize idiosyncratic fonts, irregularities and vocabularies while reducing error rates by 35 percent and substitution rates by 75 percent.

"IMPACT is remarkable in that it not only allows these prominent centers of culture to ultimately bring people closer to perhaps-never-before-seen historically significant texts of heritage, but because it actually allows these people to become part of the preservation process," Tal Drory of IBM Research in Haifa, Israel, wrote in a statement.

"IMPACT offers the first digitization system that combines the power of crowd computing with an adaptive optical character recognition (OCR) correction solution that can achieve excellent recognition rates across all kinds of documents, from the 15th century right up through the 19th century," Drory added.

Today’s OCR engines work well with modern printed texts, but faded ink, historical typefaces and damage can lower recognition rates by 50 percent. Manual post-production review becomes necessary for historical texts, but it is time-consuming and inefficient. The IMPACT project aims to lower the need for manual review.

"The only way to make a large-scale digitization project work is to dramatically improve the quality of the initial OCR, and cut down post-processing tasks as much as possible," said Hildelies Balk, Head of European Projects at Koninklijke Bibliotheek and leader for the IMPACT consortium, in a statement. "With IMPACT, we're expecting to see remarkable increases in productivity in the digitization process."

A new collaborative correction system, designed by IBM, will allow volunteers across Europe to correct mistakes online. The technology simplifies and speeds up the correction process by allowing users to key in corrections. The system will also compile lists of questionable words, which volunteers will be able to accept or reject with just one keystroke.

A small book that would take four hours to input manually would take just 15 minutes with the adaptive OCR and collaborative technology.

With IMPACT, IBM and the EU are further expanding their research partnership, which already includes more than two dozen national libraries, research institutes, universities and companies across Europe. Other companies are competing to create IT solutions for research institutions. Microsoft, for example, has built a searchable electronic archive for the state of Washington, and HP has created a digital library for the Massachusetts Institute of Technology.







 
 
By Rebecca Kutzer-Rice
 
