Google Working with Wikipedia to Translate 'Smaller Languages'

 
 
By Clint Boulton  |  Posted 2010-07-15

Google said July 14 that it is working with Wikipedia contributors, translators and "Wikipedians" (I assume these are users) across India, the Middle East and Africa to translate more than 16 million words for Wikipedia into Arabic, Gujarati, Hindi, Kannada, Swahili, Tamil and Telugu.

While Wikipedia has so-called common languages such as English, German and French covered with millions of articles, there is a paucity of pieces translated into the aforementioned smaller languages.

The motive is obvious: Google and Wikipedia have a shared interest in organizing information and content and making it easily consumable for Web users all over the world.

The method is less obvious. Google has made great progress with Hindi, arguably one of the larger "smaller" languages. It apparently uses Google search data and Google Trends to pinpoint content, then Translator Toolkit to translate it for Wikipedia.

Google Product Manager Michael Galvez explained the company's method for picking which Hindi Wikipedia articles get translated:

"First, we used Google search data to determine the most popular English Wikipedia articles read in India. Using Google Trends, we found the articles that were consistently read over time--and not just temporarily popular. Finally we used Translator Toolkit to translate articles that either did not exist or were placeholder articles or 'stubs' in Hindi Wikipedia. In three months, we used a combination of human and machine translation tools to translate 600,000 words from more than 100 articles in English Wikipedia, growing Hindi Wikipedia by almost 20 percent."
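For the curious, the three filters Galvez describes (popular in India, consistently read, missing or a stub in Hindi) can be sketched in a few lines of Python. This is only an illustration, not Google's code: the `Article` record, the `hindi_status` labels, the view counts and both thresholds (`min_avg_views`, `max_variation`) are made up for the example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Article:
    title: str
    monthly_views: list[int]  # hypothetical monthly reads from India
    hindi_status: str         # assumed labels: "missing", "stub" or "full"


def is_consistently_read(views, max_variation=0.5):
    """True if readership is steady rather than a one-off spike.

    Uses the coefficient of variation (std dev / mean) as a stand-in
    for "consistently read over time"; the cutoff is an assumption.
    """
    avg = mean(views)
    if avg == 0:
        return False
    return pstdev(views) / avg <= max_variation


def pick_for_translation(articles, min_avg_views=1000, limit=100):
    """Return titles that are popular, steady, and missing or stubs in Hindi."""
    candidates = [
        a for a in articles
        if mean(a.monthly_views) >= min_avg_views
        and is_consistently_read(a.monthly_views)
        and a.hindi_status in ("missing", "stub")
    ]
    # Most-read articles first, capped at the requested number.
    candidates.sort(key=lambda a: mean(a.monthly_views), reverse=True)
    return [a.title for a in candidates[:limit]]


# Invented sample data, one article per case.
articles = [
    Article("Steady topic", [5000, 5200, 4900, 5100], "stub"),
    Article("One-week fad", [100, 100, 30000, 100], "missing"),
    Article("Already covered", [8000, 8100, 7900, 8000], "full"),
    Article("Rarely read", [10, 12, 9, 11], "missing"),
]

print(pick_for_translation(articles))  # ['Steady topic']
```

Only the steady, popular stub survives: the fad fails the consistency check, the well-covered article needs no translation, and the rarely read one is not popular enough.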

Google then washed, rinsed and repeated for other smaller languages to bring its total number of words translated to 16 million.

Pretty impressive, right? Check out the graph for the number of non-stub Wikipedia articles by Internet users:

[Graph: Translate graph.png]

Galvez presented more details at Wikimania in Poland last week.

There was a time when information barriers caused by language gaps were simply assumed. That time is gradually coming to a close, thanks to Google's translation efforts.

There's something exciting and a little scary about the universalization of Google and Wikipedia.

Of course, Google has a long way to go because machine translation is an imprecise practice and a tough nut to crack.

 
 
 
 