Gesture Recognition Not Ready for Prime Time

Opinion: Don't make complexity elegant; make it disappear.

As this month began, it became public knowledge that Apple had filed two patent applications under the title "Gestures for touch sensitive input devices." That title, of course, is wrong: The "invention" is not the act or manner of gesturing but, rather (as the applications later make clear), "methods and systems for processing touch inputs," specifically including recognition of "multipoint gestures."

I get a horrible sinking feeling when I try to wade through the prose that describes this putative innovation. Current recognition methods for "advanced gestures," the inventors assert, have "several drawbacks. ... Simply put, the user cannot change gesture states midstream ... multiple gestures cannot be performed simultaneously."


I have news for the inventors: Gesture recognition is the least of the problems that have to be solved before it's safe to build machines that respond to simultaneous gestures or to other complex input.

Anyone designing an interface, gestural or otherwise, should think several times before deciding that the solution is to make complex acts less difficult to perform. Machines do what users tell them to do, not what users really want, and most people aren't good at composing complex requests.

I'd rather see interface efforts based on watching what users do, understanding common needs and designing systems in which those actions are simple. Making complexity elegant is an achievement, but I'd rather just make that complexity invisible.

Imagine, for example, two ways of saving a user's work. This requires selecting one of several available storage devices, locating an available space of sufficient size on that device, writing data to that location and posting a record of that action so that the user can later retrieve the data just stored.

I might find someone who could devise a gestural language to perform this series of actions in a few fluid motions of the hand. I can imagine, for example, raising my hand and fanning out the fingers to say "show me my device environment," pointing with one finger to say "select that device," making a spreading gesture with thumb and forefinger to indicate "I need this much space," clenching my fist to say "allocate that space," and making a movement as if writing with a pen to say "record that file description in the directory."

Every office would soon look like a mass Macarena. Or maybe the better image is a casting call for would-be sock-puppet performers.

Meanwhile, back in the real world, a user with real work to do has typed "Ctrl-S" to indicate "save this." The environment may detect that the work in progress was not yet named and pop up a window to let the user choose a location and type a document name—while the operating system allocates and registers the physical storage assignment.

Alternatively, the environment detects that the user was working with an already-existing file and simply updates the contents of that file. That's what I mean by making a common action simple instead of making a complex action "easier."

Gestural languages work for concrete metaphors such as "select this" and "move this." These are actions guided by Wetware 1000.0—that is, by nearly automatic behaviors shaped by thousands of generations of evolution.

When things get a little more abstract, though, a user who can form a complex intention and signal it with multiple simultaneous gestures should apply for a job as an orchestra conductor, a task that's much more difficult than it looks.

As is too often the case, people may think that this capability is ready for prime time because they've already seen it on the big screen. In the 2002 movie "Minority Report," a police detective played by Tom Cruise uses a gesture language based on work by MIT Media Lab researcher John Underkoffler. That work has led to a real-world company, Oblong Industries, whose g-speak platform has clients such as Raytheon that are building systems designed to aid users dealing with information overload.

I wish them and their users every success. But I also hope that enterprise developers will strive to make things genuinely simpler.


Technology Editor Peter Coffee can be reached at
