Researchers in the machine learning community typically use massive datasets distributed across multiple servers in the cloud when training machines to interact with users in a more intuitive and independent manner. Google is testing a new collaborative machine learning approach in which the training data is spread across millions of individual Android mobile devices instead.
The Federated Learning approach, according to the company, allows machine learning models to be trained on users' actual interactions with their Android devices.
The approach enables machine learning systems to be trained more quickly, and with less power consumption, from data on individual smartphones and tablets than from training data stored in the cloud. Importantly, it also allows users to benefit immediately from improvements made to on-device machine learning models, Google researchers Daniel Ramage and Brendan McMahan said on the company's Research Blog.
The company is currently testing the approach with the query suggestion feature in Gboard, Google's keyboard for Android devices. When Gboard shows a user a suggested query, the phone stores information on the device about the context in which the query was suggested and whether the user clicked on the suggestion or ignored it. Federated Learning then processes that on-device history to improve how Gboard makes query suggestions the next time the user interacts with it, the two Google researchers said.
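Google has not published Gboard's internal data schema, but the kind of on-device history the researchers describe can be sketched as a simple local event log. All names here (`SuggestionEvent`, `OnDeviceHistory`, the field names) are hypothetical, chosen only to illustrate the idea that context, suggestion, and the user's click decision stay on the device as training examples.

```python
from dataclasses import dataclass, field

@dataclass
class SuggestionEvent:
    """One on-device record: the context in which a query was suggested
    and whether the user accepted it. Hypothetical schema for illustration."""
    context: str      # e.g. the text typed before the suggestion appeared
    suggestion: str   # the query Gboard proposed
    clicked: bool     # True if the user tapped the suggestion

@dataclass
class OnDeviceHistory:
    """Local store of interaction events; the raw data never leaves the device."""
    events: list = field(default_factory=list)

    def record(self, context: str, suggestion: str, clicked: bool) -> None:
        self.events.append(SuggestionEvent(context, suggestion, clicked))

    def training_examples(self):
        # Each event becomes a (features, label) pair for on-device training:
        # the label is 1.0 if the suggestion was clicked, 0.0 if ignored.
        return [((e.context, e.suggestion), 1.0 if e.clicked else 0.0)
                for e in self.events]
```

Only the model update derived from these examples, not the examples themselves, would ever be uploaded.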
All changes that are made to the machine learning model on the device are then summarized and sent as an encrypted update back to Google’s machine learning servers in the cloud.
To protect user privacy, the servers are programmed to wait until they receive hundreds, and sometimes thousands, of similar updates from other devices. The servers then decrypt and aggregate the updates and determine whether the data can be used to improve the shared machine learning model in the cloud.
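The wait-then-aggregate behavior can be sketched as a server that buffers incoming updates and only combines them once a minimum count is reached, so no individual device's contribution is examined on its own. This is a toy illustration, not Google's implementation: the encryption step is reduced to a `decrypt` callback, and the threshold value is an arbitrary placeholder.

```python
class AggregationServer:
    """Toy sketch: buffer updates from devices and aggregate only once
    enough devices have reported, averaging the model deltas element-wise."""

    def __init__(self, min_updates: int = 100):  # hypothetical threshold
        self.min_updates = min_updates
        self.buffer = []

    def receive(self, encrypted_update) -> None:
        # Real systems would receive ciphertext; here updates are buffered as-is.
        self.buffer.append(encrypted_update)

    def ready(self) -> bool:
        return len(self.buffer) >= self.min_updates

    def aggregate(self, decrypt):
        """Decrypt and average buffered updates; returns None until enough
        devices have reported."""
        if not self.ready():
            return None
        updates = [decrypt(u) for u in self.buffer]
        n = len(updates)
        avg = [sum(vals) / n for vals in zip(*updates)]  # element-wise mean
        self.buffer.clear()
        return avg
```

The averaged result is what gets applied to the shared model; the individual updates are discarded after aggregation.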
“Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device,” McMahan and Ramage said. It decouples “the ability to do machine learning from the need to store the data in the cloud,” they said.
Implementing the approach has not been easy. For instance, machine learning algorithms are typically designed to run on datasets partitioned homogeneously across multiple cloud servers, and they are optimized for high-throughput, low-latency network connections.
With Federated Learning, the training data is spread unevenly across millions of devices with high-latency, relatively low-bandwidth connections. And unlike cloud servers, mobile devices are not always available for training, the Google researchers said.
One approach that Google developed to address some of these challenges is the Federated Averaging algorithm, which trains machine learning systems with far less communication than typical systems require. Google also had to develop a new way to compress updates from individual user devices to reduce upload communication costs.
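The core idea of Federated Averaging is that each device performs several local training steps before communicating, and the server then averages the resulting models, weighted by how much data each device holds. This cuts the number of communication rounds compared with synchronizing after every gradient step. The sketch below shows that idea on a toy linear model with plain SGD; all function names, the model, and the hyperparameters are illustrative, not Google's production code.

```python
def local_sgd(weights, data, lr=0.1, epochs=5):
    """Client side: several epochs of SGD on local data before communicating.
    Toy linear model trained with the gradient of squared error."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            grad = [2 * (pred - y) * xi for xi in x]
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def federated_averaging(global_weights, client_datasets, rounds=10):
    """Server side: each round, clients train locally and the server averages
    their weights, weighted by local dataset size. Many local steps per round
    means far fewer communication rounds than step-by-step synchronized SGD."""
    w = list(global_weights)
    for _ in range(rounds):
        client_weights, sizes = [], []
        for data in client_datasets:
            client_weights.append(local_sgd(w, data))
            sizes.append(len(data))
        total = sum(sizes)
        w = [sum(cw[i] * n for cw, n in zip(client_weights, sizes)) / total
             for i in range(len(w))]
    return w
```

In practice the weight vectors exchanged each round can be large, which is why the update-compression work mentioned above matters for upload costs.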
Google’s on-device training uses a scaled-down version of its TensorFlow machine learning technology. To minimize user disruption, Google has had to develop a way to ensure that on-device training only happens when the device is idle and on a free wireless connection, according to McMahan and Ramage.
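The scheduling condition the researchers describe, training only when the device is idle and on a free wireless connection, amounts to a simple gate that the training loop checks before doing any work. The sketch below is an assumed interface for illustration; the field and function names are hypothetical, not Android or TensorFlow APIs.

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    """Snapshot of the conditions the scheduler checks (names are illustrative)."""
    is_idle: bool            # no active user interaction right now
    on_unmetered_wifi: bool  # a "free" wireless connection, no data charges

def should_train(state: DeviceState) -> bool:
    """Gate on-device training so it never disrupts the user or costs data:
    train only when the device is idle AND on an unmetered connection."""
    return state.is_idle and state.on_unmetered_wifi
```

A training service would poll this gate and pause immediately when either condition stops holding.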
“Applying Federated Learning requires machine learning practitioners to adopt new tools and a new way of thinking,” the two researchers said. But the potential benefits make the effort worthwhile, they noted.
Google is currently exploring the same approach for other applications, including photo ranking and language models.