How Google Tackles Synonyms in the Search for AI

By Clint Boulton  |  Posted 2010-01-20

Google Jan. 19 said it has improved the way its search engine understands synonyms, a big step in the company's effort to make its search services think more like humans, a pursuit the computing industry calls artificial intelligence.

Parsing synonyms is something that search engine startups such as Hakia, Yebol and Microsoft's Powerset (now powering Bing) also work on, under the banner of semantic search. The idea is to fine-tune search engines to distinguish among words with similar meanings.

Google search quality engineers have racked up more than five years of research leading to the company's "synonyms system" by which it "analyzes synonyms' impact and quality," wrote Google Software Engineer Steven Baker in a blog post Jan. 19. "Our systems analyze petabytes of Web documents and historical search data" to understand "what words can mean in different contexts."

The company has found that "synonyms affect 70 percent of user searches across the more than 100 languages Google supports," Baker said.

"Enabling computers to understand language remains one of the hardest problems in artificial intelligence," he said. "The goal of a search engine is to return the best results for your search, and understanding language is crucial to returning the best results. A key part of this is our system for understanding synonyms."

Baker said a good example of this AI challenge would be helping Google's search engine distinguish between the words "pictures" and "photos," which often mean the same thing.

If a user searches for "'pictures developed with coffee' to see how to develop photographs using coffee grinds as a developing agent, Google must understand that even if a page says 'photos' and not 'pictures,' it's still relevant to the search," Baker said. See the example here.  

Google is also now putting search synonyms in bold lettering in its search results snippets to help search users understand why that result is shown, even if it doesn't contain the original search term. For example, for the "pictures developed with coffee" search, the title of the first result has the word "photos" in bold.

That's an easy example. Google also pointed to queries involving terms with potentially more complex synonyms, such as "GM." See Google's parsing of the term here. As Baker explained:

"Most people know the most prominent meaning: General Motors. For the search [gm cars], you can see that Google bolds the phrase "General Motors" in the search results. This is an indication that for that search we thought "General Motors" meant the same thing as "GM." ... GM can mean George Mason in [gm university], gamemaster in [gm screen star wars], Gangadhar Meher in [gm college], general manager in [nba gm] and even gunners mate in [navy gm]."

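The context-dependent behavior Baker describes for "GM" can be illustrated with a small Python sketch. This is not Google's implementation: the lookup table is hand-written from the examples in the quote, and the names CONTEXT_SYNONYMS, expand_query and bold_matches are hypothetical. Google says its system derives synonyms from petabytes of documents and search data, which this toy version does not attempt. The second function mimics the bolding of matched synonyms in result snippets.

```python
import re

# Hypothetical, hand-written table: (term, context word) -> expansion.
# The entries come from Baker's examples; a real system would learn them.
CONTEXT_SYNONYMS = {
    ("gm", "cars"): "general motors",
    ("gm", "university"): "george mason",
    ("gm", "screen"): "gamemaster",
    ("gm", "college"): "gangadhar meher",
    ("gm", "nba"): "general manager",
    ("gm", "navy"): "gunners mate",
}


def expand_query(query):
    """Return {term: expansion} for terms whose synonym fits the other query words."""
    words = query.lower().split()
    expansions = {}
    for word in words:
        for other in words:
            synonym = CONTEXT_SYNONYMS.get((word, other))
            if synonym:
                expansions[word] = synonym
    return expansions


def bold_matches(snippet, query):
    """Wrap query terms and their chosen synonyms in <b> tags, ignoring case."""
    phrases = set(query.lower().split()) | set(expand_query(query).values())
    if not phrases:
        return snippet
    # Try longer phrases first so multiword synonyms are matched whole.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b"
    return re.sub(pattern, r"<b>\1</b>", snippet, flags=re.IGNORECASE)


print(expand_query("gm cars"))
print(bold_matches("General Motors unveils new cars", "gm cars"))
```

In this sketch the same term, "gm," expands differently depending on the other words in the query, and a query with no matching context word gets no expansion at all.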
How accurate is Google's treatment of synonyms? Baker said, "For every 50 queries where synonyms significantly improved the search results, [Google] had only one truly bad synonym."

Meanwhile, users who stumble across poor synonyms should know a couple of things. One, the AI behind synonyms isn't perfect, and two, Google will not manually fix bad synonyms because it prefers to make iterative improvements to its search algorithms.

Baker invited users to post questions at the Web search help center forum or to send them via Twitter with the hash tag #googlesyns. Users may also turn off a synonym for a specific term by adding a "+" before it or by putting the words in quotation marks.
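
As a rough illustration of those two opt-outs, the Python sketch below splits a query into terms and flags which ones may still be matched through synonyms. The function name parse_query and the (text, allow_synonyms) output shape are assumptions made for this example. It shows only how a query parser might read the "+" prefix and quotation marks, not how Google actually handles them.

```python
import re

# Match either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    """Split a query into (text, allow_synonyms) pairs.

    Quoted phrases and terms prefixed with "+" are marked as exact,
    meaning synonyms should not be substituted for them.
    """
    terms = []
    for quoted, bare in TOKEN.findall(query):
        if quoted:
            terms.append((quoted, False))
        elif bare.startswith("+") and len(bare) > 1:
            terms.append((bare[1:], False))
        else:
            terms.append((bare, True))
    return terms


for text, allow in parse_query('+pictures developed with "coffee grinds"'):
    print(text, "synonyms allowed" if allow else "exact match only")
```

Here "pictures" and "coffee grinds" would be matched exactly, while "developed" and "with" remain open to synonym matching.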

Matt Cutts, one of Google's search quality engineers, cheered Baker's post and called for Google to provide more transparency into its search quality efforts. He also threw down the gauntlet to challenge search rivals such as Bing, noting:

"The truth is that Google does a lot more sophisticated stuff than most people realize. I'd say that Google does more with "semantics" and both document and query understanding than almost any other search engine."
