If Microsoft Research has its way, inexpensive 3D sensing and motion control may be right around the corner.
During SIGGRAPH 2014, the computer graphics conference that wrapped up Aug. 14, researchers at the Redmond, Wash.-based software maker and the Italian Institute of Technology showed off a novel, inexpensive way to turn a single consumer-grade camera into an instrument that can capture 3D data. The technology could make 3D-sensing, gesture-controlled PCs, tablets and smartphones commonplace.
Microsoft already had a hit on its hands with its hacker-friendly, relatively low-cost Kinect motion sensor, originally a peripheral for the Xbox 360 that can translate a user’s motions in 3D space into on-screen activity. Despite the Kinect’s popularity among tech enthusiasts and a growing assortment of gesture controllers from other companies, 3D-sensing technology has yet to trickle down to mainstream users.
In their technical paper, titled “Learning to Be a Depth Camera for Close-Range Human Capture and Interaction,” the researchers argued that while “depth cameras are becoming more of a commodity, they have yet to (and arguably will never) surpass the ubiquity of regular 2D cameras, which are now used in the majority of our mobile devices and desktop computers.” The group’s technology could potentially overcome this stumbling block.
“We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications,” they wrote.
In a YouTube video, the group demonstrated how a modified smartphone camera can be used to track the skeletal movement of a hand in real time. Likewise, a modified version of an off-the-shelf webcam was used to track faces and generate 3D reconstructions.
On the hardware front, only minor modifications are necessary. The team removed the near-infrared (NIR) cut filter found on typical RGB camera sensors, added an infrared (IR) bandpass filter to block unwanted wavelengths, and fitted a ring of IR-emitting LEDs around the lens.
In terms of software, the researchers harnessed machine learning, a discipline of growing importance to Microsoft. They explained that depth calculations are performed "by a machine learning algorithm, and can learn to map a pixel and its context to an absolute, metric depth value."
The approach enables more flexible 3D data capture than conventional, explicitly modeled methods. "As this is a data driven, discriminative machine learning method, it learns to capture any variation that exists in the data set, such as changes in shape, geometry, skin color, ambient illumination, complex inter-object reflections and even vignetting effects, without the need to explicitly formulate them," added the researchers.
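To make the idea concrete, here is a minimal, illustrative sketch (not the paper's actual model, which uses far richer per-pixel context features and a trained forest-style regressor): active IR illumination falls off roughly with the square of distance, so a pixel's IR intensity carries depth cues that a data-driven regressor can learn from calibration examples, without any explicit physical formula. All names and parameters below are invented for the demonstration.

```python
# Hypothetical sketch of learning a per-pixel intensity -> metric depth
# mapping from training pairs, in the spirit of the paper's approach.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ir(depths, albedo=0.8, power=100.0, noise=0.01):
    """Fake IR readings: inverse-square falloff from the LED ring, plus sensor noise."""
    return albedo * power / depths**2 + rng.normal(0.0, noise, depths.shape)

# Training set: depths known from a calibration rig, paired with observed IR.
train_d = rng.uniform(0.2, 1.0, 5000)   # metres; close range only, as the paper cautions
train_i = simulate_ir(train_d)

def features(intensity):
    """Simple feature map; least squares discovers the weights rather than
    us hard-coding the inverse-square physics."""
    x = 1.0 / np.sqrt(np.clip(intensity, 1e-6, None))
    return np.stack([np.ones_like(x), x, x**2], axis=1)

# Fit the regressor: learn weights mapping features to absolute depth.
w, *_ = np.linalg.lstsq(features(train_i), train_d, rcond=None)

def predict_depth(intensity):
    """Per-pixel metric depth estimate from a raw IR intensity."""
    return features(intensity) @ w

# Evaluate on unseen pixels at known distances.
test_d = np.array([0.3, 0.5, 0.9])
est = predict_depth(simulate_ir(test_d))
print(np.round(est, 2))
```

The real system replaces this toy intensity feature with a pixel's surrounding image context and a discriminative model trained on large capture datasets, which is what lets it absorb variations in skin color, illumination, and vignetting automatically.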
Essentially, the technology "turns any 2D camera into a cheap depth sensor for close-range human capture and 3D interaction scenarios," stated the team. They cautioned that its uses are limited to close-range subjects: although the technology "cannot replace commodity depth sensors for general use, our hope is that it will enable 3D face and hand sensing and interactive systems in novel contexts."