Research projects dating back to the 1950s have attempted to apply artificial intelligence to create machines that think—or at least behave as if they can.
The quest to build computers that think like humans has necessarily focused on words. The famous Turing Test, for example, is designed to prove a machine’s ability to act intelligently by responding to written questions like a person would.
In reality, the human mind is optimized for visual processing. So much of what makes us both intelligent and human is our ability to recognize patterns, objects and context from what we see.
Until very recently, computers couldn’t do much with pictures. Photos, for example, were nothing more than inert, useless files. Unless they were laboriously tagged or otherwise given manually entered notations of some kind, it was impossible for a machine to reveal anything about the content of a picture.
But that’s changing rapidly. Just this week, a wide range of major announcements reveal a bold new world of applications that show what kind of magic can happen when you apply artificial intelligence to the job of understanding photographs.
Suddenly, artificial intelligence engines can do all kinds of incredible things with photos.
Here’s what’s happening.
When Google rolled out its Google Photos in May, the press focused on the power of Google’s combination of AI with photo search. Google demonstrated (and users quickly confirmed) that searching for specific people shows photos all the way back to infancy. Dog breeds could be found by searching for the breed name. Types of food could be combined with names, such as “pizza with max” to locate specific pictures.
At the time of the Google Photos launch, the media broadly failed to appreciate how long Google had been working at this. Some of the search features had been available on Google+ for more than a year.
What’s new this week is that Google is open-sourcing the main part of its AI capability in the form of a platform Google calls TensorFlow.
Although TensorFlow isn’t the first open-source AI platform, it is the one most closely associated with Google’s impressive photo search AI.
The open-sourcing of TensorFlow means other companies, including startups, can creatively combine AI with photos in ways Google itself may not have tried. Google isn’t sharing key aspects of its many AI technologies, including the ability to run across a large number of servers. Nor is the company sharing the troves of user data that help make it so powerful. But it is giving small startups access to AI power that was previously out of their reach.
Expect mind-blowing new applications based on TensorFlow to reach the market next year.
Facebook Photo Magic
Facebook started testing a new feature this week for its Messenger mobile app called Facebook Photo Magic. The opt-in feature scans new pictures in your smartphone’s camera roll and processes them through the company’s facial recognition technology. Photo Magic identifies people in the photos who are also Facebook friends, and suggests that you share the photos with them.
The feature no doubt does double duty for Facebook. First, it encourages more sharing on Messenger. Second, it improves recognition. By merely using this feature, which is presented as a convenience, the user is actually confirming or rejecting Facebook’s AI matching of face to name under arbitrary lighting conditions, angles and other variables. The more pictures Facebook has of each person available to its AI, the better the recognition.
Surprisingly, Facebook’s so-called “face recognition” can recognize your face even if your face is hidden. The system also looks at hairstyle, posture, clothing and the shape of your body. (Note that it’s not clear that Facebook has already implemented this advanced system, but it is clear it’s collecting data for it from user photos.)
Facebook Photo Magic extends the company’s photo data collection beyond Facebook, the social network, to Messenger, the messaging app, which increases the quantity of data. And Photo Magic encourages users to confirm or reject the matches, which improves the quality of that data.
It’s clear that Facebook’s ultimate goal is to be able to recognize anyone in any situation, even in bad lighting where faces aren’t visible. From there, future Facebook AI will no doubt scan and analyze the environment for marketable clues—for example, if certain people often appear in photos at baseball games, advertisers could use that information to target baseball fans even if the words in that person’s posts don’t reveal that special interest.
Facebook also intends, no doubt, to further build out its social graph by seeing who shows up in pictures together.
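The idea of building a social graph from who shows up in pictures together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a list of names per photo) and the function name `co_appearance_counts` are hypothetical, and nothing here reflects how Facebook actually stores or processes its data.

```python
from collections import Counter
from itertools import combinations


def co_appearance_counts(photos):
    """Count how often each pair of people appears in the same photo.

    `photos` is a list of lists of names, a hypothetical stand-in for
    the output of a face-recognition step (not Facebook's real format).
    """
    pair_counts = Counter()
    for people in photos:
        # Sort and deduplicate so ("ann", "bob") and ("bob", "ann")
        # are counted as the same pair.
        for pair in combinations(sorted(set(people)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Made-up example data: each inner list is the people recognized in one photo.
photos = [
    ["ann", "bob"],
    ["ann", "bob", "cat"],
    ["cat", "dan"],
]
counts = co_appearance_counts(photos)
strongest = counts.most_common(1)[0]  # the pair seen together most often
```

Each pair count acts as an edge weight: the more often two people are photographed together, the stronger the inferred connection between them.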
Microsoft Project Oxford
Microsoft announced updates this week to its Project Oxford, which is a collection of tools that enable developers to make use of Microsoft’s artificial intelligence systems via the company’s cloud platform, Azure.
The tools let developers apply AI to spoken language, video and other media types. But the most amazing and powerful addition is that Project Oxford now enables developers to detect human emotion in pictures of people through the Project Oxford Face API.
So a photo of, say, five people processed through Project Oxford recognizes the faces in the photo and identifies the emotional expression of every single person in the picture—emotions like happiness, anger or disgust.
This capability brings the quality of human-like “understanding” of photos to a new level. When people look at a picture of other people, the most important attribute that viewers note is the emotional state of the person or group.
Pinterest Visual Search
This week Pinterest unveiled a brilliant new photo search feature that helps users find more information about, and even buy, the products they see in pinned photos.
To use it, you select (by drawing a box around) any object in a photograph that’s been posted on Pinterest. The search tool then finds objects with similar patterns and colors, ideally including one that links to a buyable pin, which is a post where you can buy the product.
The feature is based on deep learning AI from the Berkeley Vision and Learning Center.
This application of photo AI is the beginning of what you might call a worldwide web of photos, where each object in every picture is linked to identical, similar or related objects.
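The select-a-box-and-find-similar flow described for Pinterest’s visual search can be illustrated with a toy Python sketch. Pinterest’s feature is reported to rely on deep learning; the sketch below swaps in a much simpler technique, comparing coarse color histograms with cosine similarity, purely to show the shape of the workflow. All names (`crop`, `color_histogram`, `visual_search`) and the tiny pixel data are assumptions made for illustration.

```python
import math


def crop(image, box):
    """Return pixels inside box = (left, top, right, bottom); right/bottom exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def color_histogram(pixels, bins=4):
    """Normalized histogram over coarse RGB buckets (bins**3 buckets in total)."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    total = 0
    for row in pixels:
        for r, g, b in row:
            idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
            hist[idx] += 1
            total += 1
    return [v / total for v in hist] if total else hist


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def visual_search(image, box, catalog):
    """Rank catalog item names by color similarity to the selected region."""
    query = color_histogram(crop(image, box))
    scored = [
        (cosine(query, color_histogram(pixels)), name)
        for name, pixels in catalog.items()
    ]
    return [name for _score, name in sorted(scored, reverse=True)]


# Made-up data: images are rows of (R, G, B) tuples.
RED, BLUE, WHITE = (220, 20, 20), (20, 20, 220), (250, 250, 250)
image = [[WHITE, WHITE, RED, RED] for _ in range(4)]
catalog = {
    "red scarf": [[RED, RED], [RED, (200, 30, 30)]],
    "blue jeans": [[BLUE, BLUE]],
    "white shirt": [[WHITE]],
}
# The user "draws a box" around the red region on the right half of the image.
results = visual_search(image, (2, 0, 4, 4), catalog)
```

A real system would replace `color_histogram` with features from a trained neural network, but the surrounding steps (crop the selection, describe it as a vector, rank candidates by similarity) stay the same.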
An image recognition and visual search company called CamFind this year launched a public API called CloudSight.
The API enables developers to leverage CamFind’s artificial intelligence to analyze the content of photographs. And many such scans are highly specific, identifying the make and model of cars, for example, or dog breeds and specific types of foods. Once the objects in a photo are analyzed, a developer can use that information to harvest text-based information from the Internet.
Deepomatic developed a software-as-a-service-based smart search engine that the company says can identify all kinds of data from a photograph. Deepomatic specializes in fashion. It not only matches colors, patterns and other data, but also identifies the objects in a photo and matches them against a comprehensive database of fashion products.
Deepomatic’s website claims that its technology imitates how the human brain takes visual information and uses that to understand concepts.
The Big Picture
When I consider this astonishing new world of ubiquitous, extensible, available artificial intelligence that can “understand” what’s happening in photographs, I’m struck by the incredible variety of what’s possible.
And this is only the beginning. Because most of this technology is being made available through an API, as open source or as a service, we’re on the brink of a world where photo AI is as common a feature as Web search. In order to truly mimic human intelligence, computers must get visual. And now they are.