The rate at which science and technology are improving the ability to extract data from photos and videos is truly astounding.
A few years ago, search engines like Google Image Search gained the ability to match color sets. You can upload or link to a picture and have Google show you a bazillion other images with the same colors in them.
More recently, Google demonstrated the ability to recognize the content of some photos. If you upload a picture into Google+, Google will automatically add a hashtag to it based on the content. It works much of the time, but not always.
Upload people pictures, and Google can tell if the people in the images are adults or children and if people are smiling or not smiling. If you upload two group shots of the same group and don’t have everyone smiling in each, Google will combine the images into a near-perfect group shot where everyone is smiling. It’s pretty amazing.
Facebook is no slouch, either, when it comes to processing photos—especially recognizing faces. Sometime earlier this year, Facebook’s DeepFace facial recognition research project crossed a threshold where it can now recognize human faces in photos as well as people can, more or less. (As of March, Facebook was 97.25 percent accurate while people are generally 97.53 percent accurate.)
Note that DeepFace is still in the lab, specifically the Facebook AI research group in Menlo Park, Calif. Facebook hasn't publicly deployed it yet, instead using a lesser facial recognition technology for everyday use.
Such ability isn’t exclusive to Facebook—numerous companies and multiple university research labs are developing similar capabilities, as are major law enforcement agencies like the FBI. Many of these technologies take different approaches.
Facebook’s is especially interesting and usable because it mimics the human brain, using some 120 million different parameters. When shown two photos of the same person, the software constructs a 3D model of the face, which lets it recognize faces from any angle.
This technology is not widely applied, but it could be and almost certainly will be. Any image—a publicly posted smartphone picture, a bank camera image, a security camera feed, a store camera shot and so on—could be fed into the algorithm to discover your identity.
It’s safe to assume that this is pretty much going to happen; any business or law enforcement agency or schmuck with a smartphone or Google Glass-like smart glasses will be able to instantly know who you are wherever you go.
Now, this week we’ve learned about some really incredible technology developed independently at Google and at Stanford (a university less than six miles away; must be something in the water), as well as at Baidu (China’s Google); the University of California, Los Angeles; the University of Toronto; and the University of California, Berkeley.
All research groups are using neural net artificial intelligence to enable computers to understand what’s happening in a picture, although their methods vary.
Image Search, Analysis Emerge as Powerful Tools, Privacy Threat
In a nutshell, these systems identify objects in a photograph—say, a boy, a dog, a ball, a tree, a park, a bird, some clouds and so on—then use sophisticated artificial intelligence to understand that the boy is throwing the ball for the dog to chase in a park and that the bird isn’t involved in the main action of the photo.
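That two-stage idea, first detect the objects, then reason about how they relate, can be illustrated with a toy sketch. Everything here is a hand-made stand-in: real systems learn both stages with neural networks, and the detection triples below are invented for illustration.

```python
# Toy illustration of the two-stage idea: first detect objects,
# then reason about how they relate. Real systems learn this with
# trained neural nets; these detections and rules are stand-ins.

def describe_scene(detections):
    """Compose a one-line description from detector output."""
    main_action = [d for d in detections if d.get("action")]
    bystanders = [d["label"] for d in detections if not d.get("action")]
    parts = [f'the {d["label"]} is {d["action"]} the {d["target"]}'
             for d in main_action]
    sentence = "; ".join(parts)
    if bystanders:
        sentence += f' (uninvolved: {", ".join(bystanders)})'
    return sentence

# Hypothetical output from an object detector plus a relation model:
detections = [
    {"label": "boy", "action": "throwing", "target": "ball"},
    {"label": "dog", "action": "chasing", "target": "ball"},
    {"label": "bird"},
]
print(describe_scene(detections))
# the boy is throwing the ball; the dog is chasing the ball (uninvolved: bird)
```

The point of the sketch is the separation of concerns: object recognition answers "what is in the frame," while a second reasoning step decides which objects participate in the main action and which, like the bird, do not.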
Combine this technology with face recognition and anyone with access (which will be everyone) will be able to search the Web for people doing things or involved with or associated with some activity.
In addition to photographs, video is a rich source of actionable data. A company called Placemeter pays New York City users up to $50 per month to install its app on an old smartphone and point that smartphone at a city block.
The software identifies places and people (that is, it can tell that people are human, but it can’t tell who they are), then counts how many people walk by, how many enter and exit stores, and other data of value to retailers, city planners and others. After Placemeter extracts the numbers it’s looking for, the video is deleted.
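The privacy-preserving shape of that pipeline, extract aggregate counts and then discard the footage, can be sketched in a few lines. Placemeter's actual software is proprietary; the frame reader and person detector below are stand-ins so the example stays self-contained.

```python
import os
import tempfile

def read_frames(path):
    # Stand-in for a real video decoder: each line of the file
    # plays the role of one "frame" in this self-contained sketch.
    with open(path) as f:
        for line in f:
            yield line.strip()

def count_and_discard(video_path, detect_people):
    """Tally people per frame, then delete the raw footage."""
    counts = [detect_people(frame) for frame in read_frames(video_path)]
    os.remove(video_path)  # only the aggregate numbers survive
    return {"frames": len(counts), "people_seen": sum(counts)}

# Fake "video": three frames containing 2, 0 and 3 people.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("2\n0\n3\n")

stats = count_and_discard(path, detect_people=lambda frame: int(frame))
print(stats)                 # {'frames': 3, 'people_seen': 5}
print(os.path.exists(path))  # False: the footage is gone
```

The design choice worth noticing is that the returned dictionary contains only counts; once `os.remove` runs, there is nothing left that could identify an individual, which is what makes this kind of crowdsourced surveillance palatable to its participants.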
Placemeter is just one example of a future industry in which mass video and photo surveillance will be crowdsourced.
It’s easy to see where all this is going. Cameras are terrible for harvesting data when humans have to pore over the footage to interpret every picture and every scene. But cameras are the ultimate sensor when artificial intelligence can be applied to the task of looking at photos and videos to extract data from them.
Once that data is indexed, it becomes super easy to use it and zero in on the photos and videos—or just use the related data without ever referring to the images themselves.
As with all such technology, the outcome will be a mixed bag of good and bad. Obviously, this is powerful stuff (in a dystopian science fiction sense, minus the fiction part) for the computerized surveillance state and poses a threat to the privacy of every human being on the planet.
But it’s also powerful stuff that ordinary individuals will have some access to. You’ll be able to set up rules for your home security system that say: “Alert me when the UPS driver throws a package” or “Send me video every time someone other than a family member enters the house.” You’ll be able to set up alerts that will send you a link when someone posts a very specific picture or video on the Web.
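Rules like those reduce to simple pattern matching once the camera software hands you structured events. Here is a minimal sketch under that assumption; the event fields (`who`, `action`) and the names in `FAMILY` are invented for illustration, not any vendor's actual API.

```python
# Minimal rules engine over hypothetical events from a smart camera.
# Field names ("who", "action") are invented for illustration.

FAMILY = {"mom", "dad", "kid"}

RULES = [
    {"name": "rough delivery",
     "match": lambda e: e.get("who") == "UPS driver"
                        and e.get("action") == "throws package"},
    {"name": "stranger inside",
     "match": lambda e: e.get("action") == "enters house"
                        and e.get("who") not in FAMILY},
]

def alerts_for(event):
    """Return the names of every rule the event trips."""
    return [rule["name"] for rule in RULES if rule["match"](event)]

print(alerts_for({"who": "UPS driver", "action": "throws package"}))  # ['rough delivery']
print(alerts_for({"who": "dad", "action": "enters house"}))           # []
print(alerts_for({"who": "unknown", "action": "enters house"}))       # ['stranger inside']
```

The hard part, of course, is not the rule matching but the recognition feeding it: turning raw video into trustworthy "who" and "action" fields is exactly what the systems described above are racing to solve.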
And robots will have access, too. The ability to recognize photos and videos combined with other artificial intelligence will enable robots to respond in a more human-like way.
Love it or fear it, this technology is happening—right now.