Gesture Recognition Not Ready for Prime Time

By Peter Coffee  |  Posted 2006-02-27

Opinion: Don't make complexity elegant; make it disappear.

As this month began, it became public knowledge that Apple had filed two patent applications under the title "Gestures for touch sensitive input devices." That title, of course, is wrong: The "invention" is not the act or manner of gesturing but, rather (as the applications later make clear), "methods and systems for processing touch inputs," specifically including recognition of "multipoint gestures."

I get a horrible sinking feeling when I try to wade through the prose that describes this putative innovation. Current recognition methods for "advanced gestures," the inventors assert, have "several drawbacks. ... Simply put, the user cannot change gesture states midstream ... multiple gestures cannot be performed simultaneously."

I have news for the inventors: Gesture recognition is the least of the problems that have to be solved before it's safe to build machines that respond to simultaneous gestures or to other complex input. Anyone designing an interface, gestural or otherwise, should think several times before deciding that the solution is to make complex acts less difficult to perform. Machines do what users tell them to do, not what users really want, and most people aren't good at composing complex requests.

I'd rather see interface efforts based on watching what users do, understanding common needs and designing systems in which those actions are simple. Making complexity elegant is an achievement, but I'd rather just make that complexity invisible.

Imagine, for example, two ways of saving a user's work. This requires selecting one of several available storage devices, locating an available space of sufficient size on that device, writing data to that location and posting a record of that action so that the user can later retrieve the data just stored.
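
Spelled out as code, that series of actions is mundane plumbing. Here is a toy, self-contained sketch in Python; the device table and every name in it are invented for illustration, not drawn from any real storage API:

    # Toy illustration of the save sequence: the "devices" here are
    # plain dictionaries, not a real storage API.
    devices = {
        "disk0": {"free": 10_000_000, "files": {}},
        "disk1": {"free": 500_000, "files": {}},
    }

    def save(data, filename):
        # 1. Select an available storage device with sufficient free space.
        name, device = next((n, d) for n, d in devices.items()
                            if d["free"] >= len(data))
        # 2. Allocate space of the required size on that device.
        device["free"] -= len(data)
        # 3. Write the data, and 4. post a record so the user can
        #    retrieve it later.
        device["files"][filename] = data
        return name

    print(save(b"quarterly figures", "q4.xls"))   # -> disk0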

I might find someone who could devise a gestural language to perform this series of actions in a few fluid motions of the hand. I can imagine, for example, raising my hand and fanning out the fingers to say "show me my device environment," pointing with one finger to say "select that device," making a spreading gesture with thumb and forefinger to indicate "I need this much space," clenching my fist to say "allocate that space," and making a movement as if writing with a pen to say "record that file description in the directory."
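
The machine's side of such a language would be the easy part. A toy dispatch table in Python, with every gesture and action name invented for illustration, covers it:

    # Toy dispatch table for the imagined gestural language above;
    # gesture names and action strings are invented for illustration.
    GESTURES = {
        "fan_fingers":  "show my device environment",
        "point":        "select that device",
        "thumb_spread": "I need this much space",
        "clench_fist":  "allocate that space",
        "pen_motion":   "record that file description in the directory",
    }

    def interpret(sequence):
        return [GESTURES[g] for g in sequence]

    print(interpret(["fan_fingers", "point", "thumb_spread",
                     "clench_fist", "pen_motion"]))

The code is trivial; the choreography demanded of the human is not.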

Every office would soon look like a mass Macarena. Or maybe the better image is a casting call for would-be sock-puppet performers.

Meanwhile, back in the real world, a user with real work to do has typed "Ctrl-S" to indicate "save this." The environment may detect that the work in progress was not yet named and pop up a window to let the user choose a location and type a document name—while the operating system allocates and registers the physical storage assignment. Alternatively, the environment detects that the user was working with an already-existing file and simply updates the contents of that file. That's what I mean by making a common action simple instead of making a complex action "easier."
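
The branch the environment takes on the user's behalf fits in a few lines. A minimal sketch, with the Document class and the prompt function as hypothetical stand-ins rather than any real API:

    # Minimal sketch of the Ctrl-S flow; Document and
    # prompt_for_name_and_location are hypothetical stand-ins.
    class Document:
        def __init__(self, text=""):
            self.path = None    # None means the work has never been named
            self.text = text

    def prompt_for_name_and_location():
        # Stand-in for the pop-up where the user picks a location and name.
        return "untitled.txt"

    def on_ctrl_s(doc):
        if doc.path is None:
            # New work: ask the user once for a name and location.
            doc.path = prompt_for_name_and_location()
        # Either way, allocation and registration stay out of sight;
        # the user only ever said "save this."
        with open(doc.path, "w") as f:
            f.write(doc.text)

    on_ctrl_s(Document("Hello, world"))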

Gestural languages work for concrete metaphors such as "select this" and "move this." These are actions guided by Wetware 1000.0—that is, by nearly automatic behaviors shaped by thousands of generations of evolution. When things get a little more abstract, though, a user who can form a complex intention and signal it with multiple simultaneous gestures should apply for a job as an orchestra conductor—a task that's much more difficult than it looks.

As is too often the case, people may think that this capability is ready for prime time because they've already seen it on the big screen. In the 2002 movie "Minority Report," a police detective played by Tom Cruise uses a gesture language based on work by MIT Media Lab researcher John Underkoffler. That work has led to a real-world company, G-Speak, which has clients such as Raytheon that are building systems designed to aid users dealing with information overload.

I wish them and their users every success. But I also hope that enterprise developers will strive to make things genuinely simpler.

Technology Editor Peter Coffee can be reached at peter_coffee@ziffdavis.com.

Peter Coffee is Director of Platform Research at salesforce.com, where he serves as a liaison with the developer community to define the opportunity and clarify developers' technical requirements on the company's evolving Apex Platform. Peter previously spent 18 years with eWEEK (formerly PC Week), the national news magazine of enterprise technology practice, where he reviewed software development tools and methods and wrote regular columns on emerging technologies and professional community issues.

Before he began writing full-time in 1989, Peter spent eleven years in technical and management positions at Exxon and The Aerospace Corporation, including management of the latter company's first desktop computing planning team and applied research in applications of artificial intelligence techniques. He holds an engineering degree from MIT and an MBA from Pepperdine University, and he has held teaching appointments in computer science, business analytics and information systems management at Pepperdine, UCLA, and Chapman College.