Cray Leverages Intel Hadoop For Big Data in HPC

Cray’s new offerings will combine its CS300 supercomputer clusters with Intel’s Hadoop distribution.

Cray officials are adding Intel’s Hadoop distribution to their growing list of supercomputing solutions for the burgeoning big data market.

Cray later this month will launch cluster supercomputers for Hadoop applications that will combine the vendor’s CS300 supercomputers with Intel’s Hadoop distribution, a Linux operating system and Cray’s Advanced Cluster Engine (ACE) management software, according to company officials.

The result will be a turnkey computing infrastructure that will enable organizations to better leverage Hadoop, according to Bill Blake, senior vice president and CTO at Cray.

“More and more organizations are expanding their usage of Hadoop software beyond just basic storage and reporting,” Blake said in a statement. “But while they’re developing increasingly complex algorithms and becoming more dependent on getting value out of Hadoop systems, they are also pushing the limits of their architectures.”

The Cray Hadoop supercomputer clusters, which will be integrated, optimized, validated and supported by the systems maker, will enable organizations to scale their Hadoop software, he said.

“Organizations can now focus on scaling their use of platform-independent Hadoop software, while gaining the benefits of important underlying architectural advantages from Cray and Intel,” Blake said.

Big data is a growing trend in the business world, with massive amounts of data being created from the wide range of connected devices, machines and sensors. Intel officials have said that every 11 seconds, a petabyte of data is created around the world.

Hadoop, which includes about a dozen open-source projects, is designed to enable businesses to more easily store huge amounts of data, analyze it and leverage it in ways that benefit both the organizations and their users. For example, businesses can use it to gain a better understanding of what their customers want, while medical researchers can more quickly discover life-saving drugs and communities can improve their environments by better managing traffic patterns.

Intel in February unveiled the Intel Distribution for Apache Hadoop, its own distribution of the open-source technology. The giant chip maker had been working with Hadoop since 2009, but officials said it was important to offer a Hadoop distribution optimized to work with features on its processors, such as incorporating Advanced Encryption Standard New Instructions (AES-NI) for accelerating encryption into the Hadoop Distributed File System.

It’s also part of a larger effort by Intel to grow its role in the data center beyond server chips. Intel has been building up its software capabilities via in-house development and acquisitions. While it is keeping parts of its Hadoop distribution open, making them interoperable with other Hadoop distributions, the company will keep some features, including management and monitoring capabilities, to itself. Intel will not open source such software as Intel Manager for Apache Hadoop, which handles configuration and deployment, or Active Tuner for Apache Hadoop, a tool for improving the performance of compute clusters running the distribution.

Cray officials, in announcing their new Hadoop clusters, noted the strengths in Intel’s distribution, including greater security, improved real-time handling of data, and enhanced performance throughout the storage architecture. Cray is including support for InfiniBand and improved resource management, officials said.

The CS300 series of supercomputers—which Cray inherited when it bought rival Appro for $25 million in November 2012—comes with an integrated high-performance computing (HPC) software stack and software tools that are compatible with most open-source and commercial compilers. That will enable organizations to leverage Intel’s Hadoop distribution, according to Girish Juneja, CTO and general manager of Intel's Big Data Software unit.

"Combining these features with the highly innovative HPC technologies in Cray systems will create a compelling solution for organizations with the most demanding Hadoop requirements," Juneja said in a statement.

Cray’s Hadoop supercomputer clusters, which offer energy-efficient air- or liquid-cooled architectures, are the latest move by the systems vendor to build out its portfolio of products for big data. The company also offers Cray Sonexion storage systems and YarcData’s Urika appliance for graph analytics.