Google Working on 3D Classifiers to Solve AR Challenge
"The challenge is to figure what is the most
relevant thing for the user," Nalawadi said. "You could throw a lot
of info on there, but it would confuse the user. You need to make sure you are
sending right users the set of things with AR. These are the user experiences challenges
that we haven't cracked."
Even if Google can solve the challenge of figuring out user
intent via the AR lens, Goggles needs a lot of work. Superfish CTO Joe Dew, whose company makes computer
vision software that competes with Goggles, told eWEEK that Goggles has yet to
solve the problem of recognizing most 3D objects.
For example, he said, if a user "takes a pair of
scissors, put them on a white piece of paper and Goggles probably won't find
it." This is because Goggles becomes confused by the two objects, the
scissors and the paper.
Superfish is working on this problem, which Google has addressed for
landmark recognition by applying a classic, if not crude,
two-dimensional approach.
Specifically, Google accounts for all of the fixed,
finite camera angles picture takers employ when they snap pictures of, for example, the
Eiffel Tower. Still, solving the 3D challenge with a 2D-based methodology
is an approach Nalawadi acknowledged is hardly ideal.
Google is working on
hierarchical classifiers -- essentially programming tools that help computer
vision software distinguish between objects -- to determine that what a user is looking at
is a car, for example, and to recognize product verticals such as shoes, handbags and jewelry.
In time, a user will be able to use Goggles to snap a picture of a handbag on a rack
of handbags with a mobile phone. Goggles will recognize that the user
wants to learn more about the bag in the foreground, ignore all of the other
bags and peripheral distractions in the image, and give the user a
match.
"We have a lot of Ph.D.s looking at it to solve the
problem in generic way so that we can train engines to recognize a large class
of objects, and then train instances within the classes," Nalawadi said.
"Can we train a trainer with a set of images to understand
what is a boot, a car, or earrings? It's not easy, but we feel that is the more
generic approach."
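
The two-stage idea Nalawadi describes, recognizing a broad class first and then a specific instance within it, can be illustrated with a toy sketch. This is not Google's implementation: the class names, the made-up two-number "feature vectors" and the nearest-centroid matching are all assumptions chosen only to keep the example short.

```python
from math import dist
from statistics import fmean


def centroid(vectors):
    """Average a list of equal-length feature vectors, axis by axis."""
    return tuple(fmean(axis) for axis in zip(*vectors))


class NearestCentroid:
    """Toy classifier: predicts the label whose average vector is closest."""

    def __init__(self):
        self.centroids = {}

    def fit(self, samples):
        # samples is a list of (vector, label) pairs.
        grouped = {}
        for vector, label in samples:
            grouped.setdefault(label, []).append(vector)
        self.centroids = {lbl: centroid(vecs) for lbl, vecs in grouped.items()}
        return self

    def predict(self, vector):
        return min(self.centroids, key=lambda lbl: dist(vector, self.centroids[lbl]))


class HierarchicalClassifier:
    """Stage 1 picks a broad class; stage 2 picks an instance within that class."""

    def __init__(self):
        self.coarse = NearestCentroid()
        self.fine = {}

    def fit(self, samples):
        # samples is a list of (vector, broad_class, instance) triples.
        self.coarse.fit([(vec, cls) for vec, cls, _ in samples])
        by_class = {}
        for vec, cls, instance in samples:
            by_class.setdefault(cls, []).append((vec, instance))
        self.fine = {cls: NearestCentroid().fit(s) for cls, s in by_class.items()}
        return self

    def predict(self, vector):
        broad = self.coarse.predict(vector)
        return broad, self.fine[broad].predict(vector)


# Hypothetical training data: invented 2-D features, not real image descriptors.
TRAINING = [
    ((0.9, 0.1), "boot", "hiking boot"),
    ((0.8, 0.2), "boot", "rain boot"),
    ((0.1, 0.9), "car", "sedan"),
    ((0.2, 0.8), "car", "hatchback"),
]

clf = HierarchicalClassifier().fit(TRAINING)
print(clf.predict((0.88, 0.12)))  # ('boot', 'hiking boot')
```

In a real system the feature vectors would come from image analysis, and each stage would be a far more capable model. The sketch only shows why splitting the problem helps: the second stage has to choose among boots, not among every object the system knows.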
Above and beyond that audacious goal, Google needs to fill out the
long tail of search. For example, Google needs to be able to recognize
an entire vineyard's product lineup instead of just 100 of the most
popular wine bottles.








