Yahoo and Computational Research Laboratories, a subsidiary of India-based Tata Sons, are jointly supporting cloud computing research around the Apache Hadoop open-source distributed computing project.
As part of the agreement, announced March 24, CRL will make available to researchers one of the world’s top five supercomputers, which has substantially more processors than any supercomputer currently available for cloud computing research.
Company officials said the deal is a first in terms of the size and scale of the machine, and the first to make a supercomputer available to academic institutions in India. The Yahoo-CRL deal is aimed at combining CRL’s expertise in high-performance computing with Yahoo’s technical leadership in Apache Hadoop to enable scientists to perform data-intensive computing research on the 14,400-processor machine.
CRL’s supercomputer, known as the EKA, has 14,400 processors, 28 terabytes of memory, 140TB of disk space, a peak performance of 180 teraflops (or 180 trillion calculations per second) and sustained computation capacity of 120 teraflops for the LINPACK benchmark. EKA is expected to run the latest version of Hadoop and other Yahoo open-source distributed computing software, such as the Pig parallel programming language developed by Yahoo Research.
The Yahoo-CRL announcement came on the eve of the first-ever Hadoop Summit, scheduled for March 25 at Yahoo’s Santa Clara, Calif., facility.
“We have made our leadership in supporting academic, cloud computing research very concrete by sharing a 4,000-processor supercomputer with computer scientists at Carnegie Mellon University for the last three months,” said Ron Brachman, vice president and head of academic relations for Yahoo. “With this supercomputing cluster, researchers were able to analyze hundreds of millions of Web documents and handle two orders of magnitude more data than they previously could.”
In November, Yahoo announced that it would be the first vendor to launch an open-source program aimed at advancing the research and development of systems software for distributed computing. Yahoo said the program was intended to leverage its leadership in Hadoop to enable researchers to modify and evaluate the systems software running on a 4,000-processor supercomputer provided by Yahoo.
Yahoo officials said that, at the time, their company was the primary contributor to Hadoop, an open-source distributed file system and parallel execution environment that enables its users to process massive amounts of data.
As a key part of the November program, Yahoo said it wanted to make Hadoop available to the academic community in a supercomputing-class data center. Yahoo’s supercomputing cluster, called M45 after one of the best-known open star clusters, has about 4,000 processors, 3TB of memory, 1.5 petabytes of disk space and a peak performance of more than 27 teraflops, placing it among the 50 fastest supercomputers in the world.
M45 was set to run the latest version of Hadoop and other state-of-the-art, Yahoo-supported open-source distributed computing software. Yahoo officials said the company built the M45 from commodity hardware, but would not disclose the specific hardware vendor.
Doug Cutting started the Hadoop project. He is currently an employee of Yahoo, where he leads the Hadoop project full-time. The Pig language was created by a group of scientists in Yahoo Research: Ravi Kumar, Christopher Olston, Ben Reed, Utkarsh Srivastava and Andrew Tomkins, Brachman said.
In December, Yahoo announced that it had become a platinum sponsor of the Apache Software Foundation. Yahoo’s support of the ASF stems from its work with the Apache HTTP Server and Lucene projects. Several members of Yahoo’s development teams are active, long-term code contributors to Apache Hadoop, the open-source platform that makes it possible to efficiently process vast amounts of data on a cluster of commodity hardware, the company said.