Google said Jan. 19 that it has improved the way its search engine understands
synonyms, a big step in the company's effort to make its search services think
more like humans, an approach known in the computing industry as artificial
intelligence.
Parsing synonyms is something that search engine startups such as Hakia and
Yebol, along with Microsoft's Powerset (now powering Bing), also work on under
the banner of semantic search. The idea is to fine-tune search engines to
recognize when different words carry similar meanings.
Google search quality engineers have racked up more than five years of
research leading to the company's "synonyms system" by which it
"analyzes synonyms' impact and quality," wrote Google Software Engineer
Steven Baker in a blog post Jan. 19. "Our systems analyze petabytes of Web
documents and historical search data" to understand "what words can
mean in different contexts."
The company has found that "synonyms affect 70 percent of user
searches across the more than 100 languages Google supports," Baker said.
"Enabling computers to understand language remains one of the hardest
problems in artificial intelligence," he said. "The goal of a search
engine is to return the best results for your search, and understanding
language is crucial to returning the best results. A key part of this is our
system for understanding synonyms."
Baker said a good example of this AI challenge would be helping Google's
search engine recognize that the words "pictures" and
"photos" often mean the same thing.
If a user searches for "'pictures developed with coffee' to see how to
develop photographs using coffee grinds as a developing agent, Google must
understand that even if a page says 'photos' and not 'pictures,' it's still
relevant to the search," Baker said.
Google is also now putting search synonyms in bold lettering in its search
results snippets to help search users understand why that result is shown, even
if it doesn't contain the original search term. For example, for the
"pictures developed with coffee" search, the title of the first
result has the word "photos" in bold.
That's an easy example. Google also pointed to queries involving terms with
potentially more complex synonyms, such as "GM." As Baker explained:
"Most people know the most prominent meaning: General Motors. For the search
[gm cars], you can see that Google bolds the phrase 'General Motors' in the
search results. This is an indication that for that search we thought
'General Motors' meant the same thing as 'GM.' ... GM can mean George Mason in
[gm university], gamemaster in [gm screen star wars], Gangadhar Meher in [gm
college], general manager in [nba gm] and even gunners mate in [navy gm]."
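Baker's examples suggest that the expansion chosen for a term depends on the other words in the query. The Python sketch below illustrates that idea only and does not describe Google's actual system. The lookup table, the function names `expand` and `bold_matches`, and the choice of a single context word per entry are all assumptions made for this example. The table entries themselves come from the queries Baker listed.

```python
import re

# Hypothetical lookup table keyed by (term, context word). The pairs mirror
# the example queries in Baker's post; a real system would learn them from data.
CONTEXT_SYNONYMS = {
    ("gm", "cars"): "general motors",
    ("gm", "university"): "george mason",
    ("gm", "screen"): "gamemaster",
    ("gm", "college"): "gangadhar meher",
    ("gm", "nba"): "general manager",
    ("gm", "navy"): "gunners mate",
}


def expand(query):
    """Return {term: synonym} for terms whose synonym is supported by
    another word in the same query."""
    words = query.lower().split()
    found = {}
    for i, word in enumerate(words):
        for j, other in enumerate(words):
            if i != j and (word, other) in CONTEXT_SYNONYMS:
                found[word] = CONTEXT_SYNONYMS[(word, other)]
    return found


def bold_matches(snippet, query):
    """Wrap query words and their context-dependent synonyms in <b> tags."""
    phrases = set(query.lower().split()) | set(expand(query).values())
    if not phrases:
        return snippet
    # Try longer phrases first so multiword synonyms are matched whole.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = "|".join(re.escape(p) for p in ordered)
    return re.sub(rf"\b({pattern})\b", r"<b>\1</b>", snippet, flags=re.IGNORECASE)


print(expand("gm cars"))
print(expand("nba gm"))
print(bold_matches("General Motors sells cars worldwide", "gm cars"))
```

With this toy table, the same term "gm" expands to "general motors" next to "cars" and to "general manager" next to "nba", and the expansion is bolded in the snippet even though the literal query term never appears there.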
How accurate is Google's treatment of synonyms? Baker said, "For every
50 queries where synonyms significantly improved the search results, [Google]
had only one truly bad synonym."
Meanwhile, users who stumble across poor synonyms should know a couple of
things. First, the AI behind synonyms isn't perfect. Second, Google will not
manually fix bad synonyms because it prefers to make iterative improvements to
its search algorithms.
Baker invited users to post questions at the Web search help center forum or to send them via Twitter with
the hashtag #googlesyns. Users may also turn off a synonym for a specific term
by adding a "+" before it or by putting the words in quotation marks.
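The two opt-out operators can be pictured as a small parsing step that marks which query terms may be expanded. The Python sketch below is a rough illustration under stated assumptions, not Google's parser. The function name `parse_query`, the tokenizing regular expression and the (term, allow_synonyms) output format are hypothetical.

```python
import re

# Matches either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    """Return a list of (term, allow_synonyms) pairs.

    Terms prefixed with "+" or enclosed in quotation marks are flagged so
    that no synonym expansion is applied to them.
    """
    terms = []
    for quoted, bare in TOKEN.findall(query):
        if quoted:
            # Words inside quotation marks are kept literal.
            terms.extend((word, False) for word in quoted.split())
        elif bare.startswith("+") and len(bare) > 1:
            # A leading "+" turns off synonyms for this one term.
            terms.append((bare[1:], False))
        else:
            terms.append((bare, True))
    return terms


print(parse_query("+pictures developed with coffee"))
print(parse_query('"pictures developed" with coffee'))
```

In the first query only "pictures" is protected from expansion, so a page that says "photos" would no longer match on that term. In the second, both quoted words are kept literal.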
Matt Cutts, one of Google's search quality engineers, cheered Baker's post
and called for Google to provide more transparency into its search quality
efforts. He also threw down the gauntlet to challenge search rivals
such as Bing, noting:
"The truth is that Google does a lot more sophisticated stuff than most people
realize. I'd say that Google does more with 'semantics' and both document and
query understanding than almost any other search engine."