Yahoo, Tata Deal Empowers Hadoop Developers

The search giant teams with the Indian company to collaborate on cloud computing research.

Yahoo and Computational Research Laboratories (CRL), a subsidiary of India-based Tata Sons, are jointly supporting cloud computing research around the Apache Hadoop open-source distributed computing project.

As part of the agreement, announced March 24, CRL will make available to researchers one of the world's top five supercomputers, which has substantially more processors than any supercomputer currently available for cloud computing research.

Company officials said the deal is a first in terms of the size and scale of the machine, and the first time a supercomputer has been made available to academic institutions in India. The Yahoo-CRL deal aims to combine CRL's expertise in high-performance computing with Yahoo's technical leadership in Apache Hadoop so that scientists can perform data-intensive computing research on a 14,400-processor supercomputer.

CRL's supercomputer, known as the EKA, has 14,400 processors, 28TB of memory, 140TB of disk space, a peak performance of 180 teraflops (180 trillion floating-point operations per second) and a sustained performance of 120 teraflops on the LINPACK benchmark. EKA is expected to run the latest version of Hadoop and other Yahoo open-source distributed computing software, such as the Pig parallel programming language developed by Yahoo Research.

The Yahoo-CRL announcement came on the eve of the first-ever Hadoop Summit, scheduled for March 25 at Yahoo's Santa Clara, Calif., facility.

"We have made our leadership in supporting academic, cloud computing research very concrete by sharing a 4,000-processor supercomputer with computer scientists at Carnegie Mellon University for the last three months," said Ron Brachman, vice president and head of academic relations for Yahoo. "With this supercomputing cluster, researchers were able to analyze hundreds of millions of Web documents and handle two orders of magnitude more data than they previously could."