Google has made more of its extensive research around machine learning and computer vision available to the open source community.
The company this week publicly released an API that developers and researchers can use to explore a Google computer vision system for automatically detecting and correctly identifying multiple objects in a single image.
Google has been developing the object-detection system in-house for some time and has created increasingly sophisticated machine-learning models for detecting objects in images.
The company currently uses the system in products such as its Nest Cam, for detecting street numbers and names in Street View imagery, and for the ‘similar items and style ideas’ feature in Google Image Search.
In making the system available to the broader research community via the TensorFlow Object Detection API, Google wants to spur research and exploration around computer vision technologies, Jonathan Huang, a Google research scientist, and Vivek Rathod, a software engineer at the company, stated in a blog post.
“Creating accurate [Machine Learning] models capable of localizing and identifying multiple objects in a single image remains a core challenge in the field,” the two researchers wrote. “We invest a significant amount of time training and experimenting with these systems.”
That effort has yielded significant improvements in the system’s object detection capabilities, which others can now access via the API. “We’ve certainly found this code to be useful for our computer vision needs, and we hope that you will as well,” the two researchers said.
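Models of this kind return, for each image, a set of bounding boxes with class labels and confidence scores, and a typical first post-processing step is to discard low-confidence detections. The sketch below illustrates that step with plain Python; the function and variable names are illustrative assumptions, not part of the released API.

```python
def filter_detections(boxes, classes, scores, threshold=0.5):
    """Keep only detections whose confidence score meets the threshold.

    boxes:   list of [ymin, xmin, ymax, xmax] in normalized coordinates
    classes: list of integer class IDs (indices into a label map)
    scores:  list of confidence scores in [0, 1]
    """
    return [(b, c, s)
            for b, c, s in zip(boxes, classes, scores)
            if s >= threshold]

# Illustrative data: two detections, one below the threshold
boxes = [[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.9, 0.9]]
classes = [1, 18]            # hypothetical IDs in a COCO-style label map
scores = [0.92, 0.31]
kept = filter_detections(boxes, classes, scores)
print(kept)                  # only the high-confidence detection remains
```

In practice the threshold trades recall for precision: a lower value surfaces more of the "multiple objects in a single image" the researchers describe, at the cost of more false positives.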
The TensorFlow Object Detection API was one of two computer vision-related technologies that Google released to the open source community this week. The other was MobileNets, a collection of mobile-oriented computer vision models for TensorFlow.
TensorFlow is a machine learning technology that Google open sourced in 2015 to spur development activity around deep learning and machine learning applications.
MobileNets models are designed to deliver enhanced visual recognition capabilities on mobile devices, said Andrew Howard, a senior software engineer, and Menglong Zhu, a software engineer at Google, in a separate announcement.
Google’s existing Cloud Vision API already gives developers a way to integrate powerful image analysis capabilities into their applications, for uses like detecting individual faces in a photo, classifying images by category, and reading printed words within an image.
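The Cloud Vision API is consumed as a REST service: a client sends an annotate request containing a base64-encoded image and the list of analysis features it wants back. The sketch below builds such a request body in Python; the helper function is an assumption for illustration, though the feature type names (`LABEL_DETECTION`, `FACE_DETECTION`, `TEXT_DETECTION`) match the public Cloud Vision API, and endpoint selection and authentication are omitted.

```python
import base64
import json

def build_annotate_request(image_bytes, features, max_results=10):
    """Build the JSON body for a Cloud Vision batch-annotate call.
    The structure follows the public REST API; sending the request
    (endpoint URL, API key / OAuth) is left out of this sketch."""
    return {
        "requests": [{
            "image": {
                # Images are sent inline as base64 text
                "content": base64.b64encode(image_bytes).decode("ascii"),
            },
            "features": [
                {"type": f, "maxResults": max_results} for f in features
            ],
        }]
    }

# Illustrative payload requesting the three analyses named above
body = build_annotate_request(
    b"<raw image bytes here>",
    ["LABEL_DETECTION", "FACE_DETECTION", "TEXT_DETECTION"],
)
print(json.dumps(body, indent=2))
```

Batching several feature types into one request, as shown, lets a single round trip return labels, faces, and extracted text together.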
MobileNets optimizes the delivery of such capabilities on mobile devices, which have relatively limited power and computational resources. The models are designed to work within those constraints while still improving on-device computer vision, the two Google engineers said.
“MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases,” Howard and Zhu said. Researchers and developers can use the technology to build sophisticated image classification, detection and segmentation capabilities for mobile environments.
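The "parameterized" part refers to knobs that shrink a model to fit a device's budget: the MobileNets announcement describes a width multiplier that thins each layer's channels and a resolution multiplier that shrinks the input. A rough sketch of how those knobs cut the multiply-add cost of one depthwise separable convolution layer (the building block MobileNets uses) follows; the function name and the specific layer sizes are assumptions for illustration.

```python
def separable_conv_cost(kernel, in_ch, out_ch, feat, alpha=1.0, rho=1.0):
    """Approximate multiply-add count for one depthwise separable
    convolution layer.
    alpha: width multiplier, thins input/output channels
    rho:   resolution multiplier, shrinks the feature map"""
    m = int(alpha * in_ch)      # thinned input channels
    n = int(alpha * out_ch)     # thinned output channels
    f = int(rho * feat)         # reduced feature-map side length
    depthwise = kernel * kernel * m * f * f   # per-channel spatial filtering
    pointwise = m * n * f * f                 # 1x1 cross-channel mixing
    return depthwise + pointwise

# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map
full = separable_conv_cost(3, 512, 512, 14)
half = separable_conv_cost(3, 512, 512, 14, alpha=0.5)
print(f"full-width cost: {full:,}  half-width cost: {half:,}")
```

Because the dominant pointwise term scales with the product of input and output channels, halving the width multiplier cuts that term roughly fourfold, which is how one model family spans "a variety of use cases" from high accuracy to tight latency budgets.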
Some example use cases for the technology include object detection in images, landmark recognition, classification of images by category and facial attribute recognition.