Fujitsu Looks to Accelerate Deep Learning Workloads

The company has developed software that will speed up machine learning tasks that are spread out over multiple GPU-powered systems.


Engineers at Fujitsu Laboratories have developed new software that can speed up deep learning projects run over multiple GPUs.

According to Fujitsu Labs officials, tests have found that the software running on 16 and 64 GPUs is 14.7 and 27 times faster, respectively, than a single GPU at running deep learning workloads, with increases in learning speeds of 46 percent (on 16 GPUs) and 71 percent (on 64 GPUs). This is important given the increasing popularity of deep learning, a subset of machine learning, which is foundational to the development of artificial intelligence (AI).
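One way to read those figures is as parallel efficiency: the measured speedup divided by the number of GPUs. The short Python sketch below works that out from the reported numbers, assuming the 14.7x figure corresponds to 16 GPUs and the 27x figure to 64 GPUs. The function name is illustrative and is not part of Fujitsu's software.

```python
def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Speedups over a single GPU as reported by Fujitsu Labs
# (assumed mapping: 14.7x on 16 GPUs, 27x on 64 GPUs).
reported = {16: 14.7, 64: 27.0}

for gpus, speedup in reported.items():
    efficiency = parallel_efficiency(speedup, gpus)
    print(f"{gpus} GPUs: {speedup}x speedup, {efficiency:.0%} of linear scaling")
```

By this measure, the 16-GPU run achieves roughly 92 percent of ideal linear scaling and the 64-GPU run roughly 42 percent.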

Machine learning essentially comprises two parts: training (where neural networks are taught object identification and other tasks) and inference (where they use this training to recognize and process unknown inputs). The use of deep learning techniques to train neural networks has grown over the past several years, helping to drive significant advances in such work as image and speech recognition and delivering greater accuracy than other technologies, according to Fujitsu Labs officials. Deep learning also requires massive amounts of data for machine training, and GPUs, with their ability to process huge amounts of data in parallel, are better suited to the job than CPUs.

A challenge has been finding efficient ways to run deep learning workloads across multiple GPUs in parallel, the officials said. Right now, the primary approach is to use multiple GPU-powered computers that are networked together and run in parallel. However, such arrangements are difficult to scale: the benefits of parallelization become increasingly difficult to realize as the time it takes to share data between the computers grows, particularly when more than 10 systems are used in the network at the same time.
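A toy model can illustrate why scaling flattens out. The Python sketch below assumes that each training step has a fixed amount of compute that divides evenly across machines, plus a data-sharing cost that grows linearly with the number of machines. Both the model and its constants are hypothetical, chosen only for illustration, and are not measurements from Fujitsu Labs.

```python
def modeled_speedup(num_nodes: int, compute_time: float, comm_per_node: float) -> float:
    """Speedup over one node under a simple compute-plus-communication model.

    Per-step time on n nodes = compute_time / n + comm_per_node * (n - 1).
    A single node pays no communication cost.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    parallel_time = compute_time / num_nodes + comm_per_node * (num_nodes - 1)
    return compute_time / parallel_time


# Hypothetical constants: 1.0 s of compute per step,
# 1 ms of data-sharing cost for each additional node.
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:>2} nodes: {modeled_speedup(n, 1.0, 0.001):.1f}x")
```

With these made-up constants, the modeled speedup stops growing and is lower at 64 nodes than at 32, because the sharing cost eventually outweighs the compute saved by adding machines.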

The software developed by Fujitsu is designed to overcome those limitations, the researchers said. They took the new parallelization technologies and applied them to the open-source Caffe framework for deep learning. The software enables users to reduce the time needed for R&D, which in turn will lead to improved learning models, they said.

Fujitsu Labs tested the software on the AlexNet neural network for image recognition, which produced the results showing the improved learning speeds. Machine learning jobs that would take about a month on a single GPU-powered computer can now be processed in about a day by running them on 64 GPUs in parallel.
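The month-to-a-day estimate follows from the reported 27-fold speedup on 64 GPUs. A minimal Python check, assuming a 30-day month and a hypothetical helper name:

```python
def parallel_days(single_gpu_days: float, speedup: float) -> float:
    """Estimated wall-clock days once a job is sped up by the given factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return single_gpu_days / speedup


# Assumes "about a month" means 30 days; 27x is the reported 64-GPU speedup.
print(f"{parallel_days(30, 27):.1f} days")
```

That works out to a little over one day.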

Fujitsu Labs has developed two new technologies. One is supercomputer software that executes communications and operations at the same time and in parallel; the other optimizes the processing methods based on the size of the shared data and the sequence of the deep learning processing. Combined, the technologies reduce the waiting time between processing batches, the researchers said.
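The general idea of running communications and operations at the same time can be sketched with a simple timeline model. The Python example below is a generic illustration, not Fujitsu's implementation. It assumes a network is processed layer by layer, that each layer's results must be shared over a single link, and that the per-layer times are made-up values.

```python
from typing import Sequence


def serial_time(compute: Sequence[float], comm: Sequence[float]) -> float:
    """All computation finishes before any communication starts."""
    return sum(compute) + sum(comm)


def overlapped_time(compute: Sequence[float], comm: Sequence[float]) -> float:
    """Each layer's transfer starts once its computation is done and the
    link is free, while computation of the following layers continues."""
    if len(compute) != len(comm):
        raise ValueError("compute and comm must have the same length")
    compute_done = 0.0  # time at which the current layer's computation ends
    link_free = 0.0     # time at which the link finishes its latest transfer
    for compute_cost, comm_cost in zip(compute, comm):
        compute_done += compute_cost
        link_free = max(link_free, compute_done) + comm_cost
    return link_free


# Hypothetical per-layer times in milliseconds, in processing order.
compute_ms = [4.0, 3.0, 2.0, 1.0]
comm_ms = [2.0, 2.0, 3.0, 1.0]

print(f"serial:     {serial_time(compute_ms, comm_ms)} ms")
print(f"overlapped: {overlapped_time(compute_ms, comm_ms)} ms")
```

With these illustrative numbers, the step takes 18 ms when communication waits for all computation to finish and 13 ms when the two overlap. That difference is the kind of waiting time this class of technique aims to remove.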

Fujitsu officials plan to commercialize the new technologies as part of the company's Human Centric AI Zinrai portfolio sometime during the current fiscal year. Researchers expect to improve the software in hopes of further increasing the speed of training workloads.

Machine learning and AI are key technologies in the increasingly digitized and automated way of life that comes with such emerging trends as the internet of things (IoT), cloud computing, data analytics and mobility. They're already being used in such capabilities as photo tagging and fraud detection and will play a central role in such areas as autonomous cars and robotics.

The goal is to create machines that can learn and base their actions on their experiences, similar to humans. Established tech vendors and a growing array of startups are creating products and technologies that will help drive the development of AI. Patrick Moorhead, principal analyst with Moor Insights and Strategy, has called AI an inflection point in the industry that companies need to grab on to or risk getting left at a disadvantage against competitors.

Hyperscale players like Google and Facebook are making significant strides in the creation of such products, while system and component makers are building out their capabilities in the field. GPU maker Nvidia has made machine learning and AI key parts of its strategy for the future. Intel this week announced its intention to buy AI startup Nervana Systems to grow its capabilities in the space.