Cray officials want to make it easier for organizations to run the Apache Hadoop platform on their XC30 supercomputers.
On Nov. 18, the company announced a new big data framework, the Cray Framework for Hadoop package, aimed at improving the efficiency and performance of XC30 supercomputers used by organizations deploying Hadoop for scientific big data workloads.
The framework package includes everything from best practices to performance enhancements intended to optimize Hadoop for the Cray supercomputers. It also offers upgraded support for data sets commonly found in scientific and engineering applications. In addition, there is more support for multi-purpose environments, meaning that organizations can use the same XC30 systems for compute- and data-intensive workloads as well as scientific big data tasks. Customers can use the Java-based Hadoop MapReduce programming model on the same supercomputers that run the other high-performance computing languages and tools found in the Cray Programming Environment.
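For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal sketch of its three phases (map, shuffle, reduce) applied to a word count. It is written in plain Java and does not use the Hadoop API or anything from the Cray framework; the class name `MapReduceSketch` and its methods are hypothetical, chosen only for illustration. A real Hadoop job would distribute these phases across many nodes rather than run them in one process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java, no Hadoop classes. All names are hypothetical.
public class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values for one key into a single sum.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, shuffle the pairs, then reduce each key.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("big data on big iron", "data at scale")));
    }
}
```

Because each line is mapped independently and each key is reduced independently, both phases can in principle run in parallel, which is the property that lets the model scale out across a cluster or a supercomputer's nodes.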
Cray launched the Intel-powered XC30 supercomputer a year ago as the first of the XC supercomputers, which officials said at the time will eventually replace the current XE and XK brands. The XC30 systems are the foundation for the new Piz Daint system in Switzerland, which is the sixth-fastest supercomputer in the world, according to the Top500 list of the fastest systems.
Demand is growing among organizations that want to deploy Hadoop for their scientific analytics jobs, but they are finding that Hadoop currently can’t meet all of their needs, according to Bill Blake, senior vice president and chief technology officer at Cray.
“They find Hadoop isn’t optimized for the large hierarchical file formats and I/O libraries needed by scientific applications that run close to the parallel file systems, or leverage the types of fast interconnects and tightly integrated systems deployed on supercomputers for performance and scalability,” Blake said in a statement. “And they find it difficult to share infrastructure or manage complex workflows [that] span both scientific compute and analytics workloads, and [to be] able to integrate math models with data models in a single high-performance environment.”
Cray’s Hadoop framework is designed to address those issues, he said. The initial release of the framework and an optimized Performance Pack for Hadoop will be offered as free downloads. The Cray Framework for Hadoop broadens the company’s big data offerings, which include its Sonexion storage systems, Tiered Adaptive Storage and cluster supercomputers for Hadoop.