Google has acquired ReCAPTCHA, an open-source CAPTCHA service that the search engine giant will use to bolster security and its efforts to digitize books and newspapers.
CAPTCHA technology is widely used to fight spammers by preventing them from using computers to automatically sign up for Webmail accounts or other online services. ReCAPTCHA builds its CAPTCHAs from words in scanned archival newspapers and old books, an approach the company says works well because machines have a difficult time recognizing words when the paper and ink have degraded over time.
Each word that cannot be read correctly by OCR (optical character recognition) is placed on an image and used as a CAPTCHA.
“Computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text,” wrote ReCAPTCHA co-founder Luis von Ahn and Google Product Manager Will Cathcart in a Google blog post.
Von Ahn and Cathcart continued, “This technology also powers large-scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we’ll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.”
“Improving the availability and accessibility of all the information on the Internet is really important to us, so we’re looking forward to advancing this technology with the ReCAPTCHA team,” the Google post concluded.
Financial terms of the deal were not disclosed.