Meet Google Goggles, Augmented Reality Vector

By Clint Boulton  |  Posted 2011-03-20

It's early days for computer vision software such as Google Goggles, which some analysts and even Google itself feel hasn't tapped its true potential.

One such role for Goggles could be as a vector for AR (augmented reality), which comprises the overlay of information on real-world views seen through a mobile phone's camera viewfinder.

Goggles is a visual search application that uses smartphone cameras to send image information to Google's computing clouds, then back to the users' phones to complete an action.

Users of Android and Apple iPhone smartphones can use the app to snap pictures of landmarks, books, CDs, wine bottles and art. Google has also taught the app to recognize print ads, QR codes and barcodes, solve Sudoku puzzles and translate menus from one language to another.

But what if Google tweaked Goggles in such a way as to retrieve not just historic info from its search engine, but to overlay real-time information about things or even places when a user points the camera at an object?

Google Goggles Product Manager Shailesh Nalawadi said Google is considering different applications for AR.  

"When you do it well in current paradigm, it feels more real-time," Nalawadi told eWEEK. "AR is a user interface, user experience innovation. It's something we are looking to do as well, but at the right time."

It's one thing to whip up another newfangled piece of software, and quite another to find a practical use for it.

Nalawadi provided a hypothetical scenario where Google might use AR. For example, a mobile phone user could point his Android phone at a restaurant or bar across the street to learn its menu, hours of operation, ratings, deals and other info.

Some AR browser makers, such as Wikitude and Layar, operate in this fashion. Google's core goal is rooted in search, so Nalawadi said Goggles needs to answer another question: what is the specific piece of information a user is looking for when they search with their mobile phone? 

Google Working on 3D Classifiers to Solve AR Challenge

"The challenge is to figure what is the most relevant thing for the user," Nalawadi said. "You could throw a lot of info on there, but it would confuse the user. You need to make sure you are sending right users the set of things with AR. These are the user experiences challenges that we haven't cracked."

Even if Google can solve the challenge of figuring out user intent via the AR lens, Goggles needs a lot of work. Superfish CTO Joe Dew, whose company makes computer vision software that competes with Goggles, told eWEEK that Goggles has yet to solve the problem of recognizing most 3D objects.

For example, he said if a user "takes a pair of scissors, put them on a white piece of paper and Goggles probably won't find it." This is because Goggles becomes confused by the two objects, the scissors and the paper.

Superfish is working on this problem, which Google has addressed for recognizing landmarks by applying a classic, if crude, two-dimensional approach.

Specifically, Google accounts for all of the fixed, finite camera angles picture takers employ when they snap pictures of, for example, the Eiffel Tower. Still, solving the 3D challenge with a 2D-based methodology is an approach Nalawadi acknowledged is hardly ideal.

Google is working on hierarchical classifiers -- essentially programming tools that help computer vision software distinguish between objects -- that can determine, for example, that what a user is looking at is a car, and that can also handle product verticals such as shoes, handbags and jewelry.
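
One way to picture a hierarchical classifier is as a two-step lookup: decide the broad class first, then pick a specific item within that class. The Python sketch below is only an illustration of that general idea, not Google's method. The feature vectors, the category names and the nearest-centroid matching are all assumptions made up for the example; a real system would work from learned image features.

```python
from math import dist

# Hypothetical toy data: each item has a few 2-D "feature vectors".
# These numbers are invented for illustration only.
TRAINING = {
    "footwear": {
        "boot": [(0.9, 0.1), (0.8, 0.2)],
        "sneaker": [(0.6, 0.1), (0.7, 0.0)],
    },
    "jewelry": {
        "earring": [(0.1, 0.9), (0.2, 0.8)],
        "necklace": [(0.1, 0.6), (0.0, 0.7)],
    },
}


def centroid(points):
    """Return the coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def train(training):
    """Build one centroid per class and one per item inside each class."""
    class_centroids = {}
    item_centroids = {}
    for cls, items in training.items():
        all_points = [p for pts in items.values() for p in pts]
        class_centroids[cls] = centroid(all_points)
        item_centroids[cls] = {name: centroid(pts) for name, pts in items.items()}
    return class_centroids, item_centroids


def nearest(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


def classify(model, x):
    """Step 1: choose the broad class. Step 2: choose the item within it."""
    class_centroids, item_centroids = model
    cls = nearest(class_centroids, x)
    return cls, nearest(item_centroids[cls], x)


model = train(TRAINING)
print(classify(model, (0.85, 0.15)))
```

The second step only compares the query against items in the class chosen by the first step, which is what keeps a boot from ever being matched against earrings. The trade-off is that a wrong choice at the first step cannot be corrected at the second.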

In time, a user will be able to snap a picture of a handbag on a rack of handbags with a mobile phone, using Goggles. Goggles will recognize that the user wants to learn more about the bag in the foreground, ignore the other bags and peripheral distractions in the image, and return a match.

"We have a lot of Ph.D.s looking at it to solve the problem in generic way so that we can train engines to recognize a large class of objects, and then train instances within the classes," Nalawadi said.

"Can we train a trainer with a set of images to understand what is a boot, a car, or earrings? It's not easy, but we feel that is the more generic approach."

Above and beyond that audacious goal, Google needs to fill out the long tail of search. For example, Google needs to be able to recognize an entire vineyard's product lineup instead of just 100 of the most popular wine bottles. 

