IBM, EU IMPACT Project Looks to Preserve Ancient Texts

Aug 26, 2010

IBM and the European Union are teaming up to digitize a massive number of rare and culturally significant historical texts. The initiative, called IMPACT (IMProving ACcess to Text), will provide new technologies that enable institutions across Europe to digitize their historical collections efficiently and make them available, editable and searchable online.

At the core of IMPACT’s research is new Web-enabled adaptive optical character recognition (OCR) software. The software is equipped with crowd-computing technology, a model in which individuals enhance a product by providing their unique knowledge and expertise. The popular Website Wikipedia is an example of crowd-computing. Combined, crowd-computing and OCR will allow institutions to digitize idiosyncratic fonts, irregularities and vocabularies while reducing error rates by 35 percent and substitution rates by 75 percent.

“IMPACT is remarkable in that it not only allows these prominent centers of culture to ultimately bring people closer to perhaps-never-before-seen historically significant texts of heritage — but because it actually allows these people to become part of the preservation process,” Tal Drory of IBM Research in Haifa, Israel, wrote in a statement.

“IMPACT offers the first digitization system that combines the power of crowd computing with an adaptive optical character recognition (OCR) correction solution that can achieve excellent recognition rates across all kinds of documents – from the 15th century right up through the 19th century,” Drory added.

Today’s OCR engines work well with modern printed texts, but faded ink, historical typefaces and damage can lower recognition rates by 50 percent. Manual post-production review becomes necessary for historical texts, but it is time-consuming and inefficient. The IMPACT project aims to lower the need for manual review.

“The only way to make a large-scale digitization project work is to dramatically improve the quality of the initial OCR, and cut down post-processing tasks as much as possible,” said Hildelies Balk, Head of European Projects at Koninklijke Bibliotheek and leader for the IMPACT consortium, in a statement. “With IMPACT, we’re expecting to see remarkable increases in productivity in the digitization process.”

A new collaborative correction system, designed by IBM, will allow volunteers across Europe to correct mistakes online. The technology simplifies and speeds up the correction process by allowing users to key in corrections. The system will also compile lists of questionable words, which volunteers will be able to accept or reject with just one keystroke.
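The accept-or-reject workflow described above can be illustrated with a minimal sketch. This is not IBM's implementation: it assumes the OCR engine reports a per-word confidence score and a suggested replacement, and the names `OcrWord`, `questionable` and `apply_decisions`, along with the 0.8 threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OcrWord:
    """One recognized word (hypothetical structure, not IMPACT's data model)."""
    text: str                          # what the OCR engine read
    confidence: float                  # assumed engine score in [0, 1]
    suggestion: Optional[str] = None   # assumed proposed correction, if any


def questionable(words: List[OcrWord], threshold: float = 0.8) -> List[int]:
    """Return the positions of words whose confidence falls below the threshold."""
    return [i for i, w in enumerate(words) if w.confidence < threshold]


def apply_decisions(words: List[OcrWord], decisions: Dict[int, bool]) -> List[str]:
    """Apply volunteer decisions: True accepts the suggestion, False keeps the OCR text."""
    corrected = []
    for i, w in enumerate(words):
        if decisions.get(i, False) and w.suggestion is not None:
            corrected.append(w.suggestion)
        else:
            corrected.append(w.text)
    return corrected


# Usage: the long s in an old typeface is misread as "f" in two words.
page = [
    OcrWord("The", 0.99),
    OcrWord("houfe", 0.42, "house"),
    OcrWord("ftands", 0.55, "stands"),
]
flagged = questionable(page)                          # positions 1 and 2
result = apply_decisions(page, {1: True, 2: False})   # accept one, reject one
```

In this sketch a single True or False per flagged word stands in for the one-keystroke decision, so a volunteer only reviews the words the engine is unsure about rather than the whole page.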

A small book that would take four hours to input manually would take just 15 minutes with the adaptive OCR and collaborative technology, a 16-fold reduction.

With IMPACT, IBM and the EU are further expanding their research partnership, which already includes more than two dozen national libraries, research institutes, universities and companies across Europe. Other companies are competing to create IT solutions for research institutions. Microsoft, for example, has built a searchable electronic archive for the state of Washington, and HP has created a digital library for the Massachusetts Institute of Technology.
