Google Sets Sights on Clustering, Translation

 
 
By Matthew Hicks  |  Posted 2004-10-07

The search company's leading researcher previews its work on clustering entities and words to better glean users' intentions, and on using statistical machine translation to show Web pages in other languages.

SAN FRANCISCO - Google Inc. on Thursday gave a preview of its next steps to improve Web search, and clustering technology played a leading role. During a panel discussion of research lab leaders at the Web 2.0 conference here, one of Google's top researchers previewed the search company's work in clustering both entities and words as a way to better glean users' intentions and distill information on the Web. Another area in Google's research net is statistical machine translation for turning Web pages into other languages, said Peter Norvig, director of search quality at Google.
"[We're] trying to go just beyond keywords and the linking structure of the Web, the innovation that we brought to search, and get behind the deeper meaning," Norvig said during his presentation.
In clustering, Norvig demonstrated a six-month-old project called "named entities abstraction," where Google's researchers are analyzing the company's large Web index to extract entities, such as the name of a company, from the structure of content and then decipher their relationship to one another.
For example, Norvig said, researchers are looking for ways to break down sentences by looking for a phrase like "such as" and grabbing the names that follow it. The goal is to pull out not only the name but also its clusters, so that a name such as "Java" can be associated both with the computer language and with language in general, Norvig said.
"We want to be able to search and find these [entities] and the relationships between them, rather than you typing in the words specifically," Norvig said.
With word clustering, the focus is on making the search engine better at understanding the multiple meanings of a word, Norvig said. Google started working on word clustering about three years ago.
Apropos of the heated U.S. presidential election, Norvig demonstrated a prototype of word clustering with results both for President Bush and for his Democratic contender, Sen. John Kerry. Bush appeared in clusters for words around "president" and "White House," to name some examples, but the results drew laughter when he also appeared in descriptive categories such as "idiot" and "chimp."
"This is what the Web says, not my opinion," Norvig said following the laughter.
Kerry appeared within groups for "senator" and for his wife, "Teresa Heinz Kerry," as well as for "Bob Kerry," a former senator with whom some people may confuse him.
None of the clustering approaches is publicly available, though Norvig said in an interview following the panel that they may become Google Labs betas in the future. Google Labs often publicly prototypes features and services that sometimes become new offerings. News alerts and Google's local search are among the lab's graduates.
"Certainly one application for clusters is in results pages, and it may be something we do at some time," Norvig said in the interview.
A growing number of search startups have targeted the automatic clustering of search results.
Vivisimo Inc., one of the best-known startups, which recently launched the Clusty search site, groups results gathered from other search engines into clusters, or categories, as a way of drilling down into results.
While it might make sense for startups to deploy clustering technology today, Norvig said, Google still views the technology as too immature. It is useful for only a small percentage of search results, he said, so Google is focusing on improving the technology and increasing its usefulness.
"Our take is that the state of the art is not there yet," Norvig said.
With machine translation, Google is bringing to bear its formidable Web index (which at last count included 6 billion documents, images and items) as well as its computing resources. Google is well-known for having one of the largest clusters of Linux-based servers, which number in the thousands.
Google already provides a Web-page translation feature, but Norvig said it is based on technology from a third party. Its research project is based on homegrown technology that eventually could translate Web pages and links more automatically, he said.
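As a rough illustration of the "such as" approach Norvig described for named entities, the Python sketch below pulls the capitalized names that follow the phrase and records which category word each name was seen with. It is not Google's implementation: the regular expressions, the function names and the sample sentences are all assumptions made for this example, and treating any run of capitalized words as a name is a deliberate simplification.

```python
import re
from collections import defaultdict

# Assumption: a "name" is a run of capitalized words (e.g. "White House").
NAME = r"[A-Z]\w*(?:\s+[A-Z]\w*)*"

# Hypothetical pattern: "<category> such as <Name>, <Name> and <Name>".
PATTERN = re.compile(
    rf"\b(?P<category>\w+)\s+such\s+as\s+"
    rf"(?P<names>{NAME}(?:\s*,\s*{NAME})*(?:\s*,?\s*(?:and|or)\s+{NAME})?)"
)

# Splits a matched list such as "Java, Python and Perl" into single names.
SPLIT = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def extract_pairs(text):
    """Return (category, [names]) for every "such as" phrase in text."""
    pairs = []
    for match in PATTERN.finditer(text):
        names = [n for n in SPLIT.split(match.group("names")) if n]
        pairs.append((match.group("category").lower(), names))
    return pairs


def categories_by_entity(text):
    """Map each extracted name to the set of categories it appeared under."""
    index = defaultdict(set)
    for category, names in extract_pairs(text):
        for name in names:
            index[name].add(category)
    return dict(index)


if __name__ == "__main__":
    sample = (
        "Developers use languages such as Java, Python and Perl. "
        "Vendors ship platforms such as Java and Linux."
    )
    for name, cats in sorted(categories_by_entity(sample).items()):
        print(name, sorted(cats))
```

In this toy sample, "Java" ends up linked to more than one category, which is a small-scale version of the idea of associating one name with several clusters. A production system would need far more robust name detection and many more patterns than the single one shown here.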


 
 
 
 
Matthew Hicks
As an online reporter for eWEEK.com, Matt Hicks covers the fast-changing developments in Internet technologies. His coverage includes the growing field of Web conferencing software and services. With eight years as a business and technology journalist, Matt has gained insight into the market strategies of IT vendors as well as the needs of enterprise IT managers. He joined Ziff Davis in 1999 as a staff writer for the former Strategies section of eWEEK, where he wrote in-depth features about corporate strategies for e-business and enterprise software. In 2002, he moved to the News department at the magazine as a senior writer specializing in coverage of database software and enterprise networking. Later that year Matt started a yearlong fellowship in Washington, DC, after being awarded an American Political Science Association Congressional Fellowship for Journalists. As a fellow, he spent nine months working on policy issues, including technology policy, for a Member of the U.S. House of Representatives. He rejoined Ziff Davis in August 2003 as a reporter dedicated to online coverage for eWEEK.com. Along with Web conferencing, he follows search engines, Web browsers, speech technology and the Internet domain-naming system.
 
 
 
 
 
 
 
