Intel Releases Hadoop Distribution for Big Data

By Jeffrey Burt  |  Posted 2013-02-26

Intel is releasing its own distribution of Apache Hadoop, a move that not only advances its software ambitions but also will make it a significant player in the booming area of big data and help fuel sales of its Xeon server processors.

During a Webcast press conference Feb. 26, Boyd Davis, vice president and general manager of Intel's Data Center Software division, said the giant chip maker has been working with Hadoop since 2009—this is actually Intel's third release of its Hadoop software—and has been an open-source advocate for much longer than that. However, for Intel to be more than an outside influence in the big data space, it had to become a player with products to offer, Davis said. The Intel Distribution for Apache Hadoop was a way to do that.

Intel's legacy in high-end data center hardware, including its Xeon server chips and recent solid-state drive (SSD) offerings, along with its newer efforts in software, gives the company a strong silicon-based foundation for launching a Hadoop distribution, he said. Intel is optimizing Hadoop to take advantage of features on its chips, such as using Advanced Encryption Standard New Instructions (AES-NI) to accelerate encryption in the Hadoop Distributed File System (HDFS).

Intel, which has been building up its software capabilities through in-house development and acquisitions, will keep parts of its Hadoop distribution open, making them interoperable with other Hadoop distributions, but will keep some features, including management and monitoring capabilities, to itself. Intel will not open-source software such as Intel Manager for Apache Hadoop, used for configuration and deployment, or Active Tuner for Apache Hadoop, a tool for improving the performance of compute clusters running the distribution.

Intel's Hadoop distribution will give organizations the confidence that comes when a major tech player supports an open-source technology, providing a "consistent, stable foundation" for the software, Davis said, adding that Intel wants "to make sure Hadoop stays on the leading edge." A growing number of vendors, from established players like EMC to smaller companies like Cloudera, are coming out with their own Hadoop offerings.

Big data is a growing trend in the business world, driven by the staggering amount of data being created by the wide range of devices and machines people use. Davis pointed to figures indicating that a petabyte of data is created around the world every 11 seconds.

"We're in an era of generating huge amounts of data," he said, noting that "the key is how to get value out of the data."

Hadoop, which includes about a dozen open-source projects, is designed to enable businesses to more easily do just that: store huge amounts of data, analyze it and leverage it in ways that benefit both the organizations and their end users. For example, businesses can use it to gain a better understanding of what their customers want, while medical researchers can more quickly discover life-saving drugs and communities can improve their environments by better managing traffic patterns.

"Big data has the potential to not only transform business models ... but has the ability to transform society," Davis said.

Intel's move comes the same week that other players made significant advances in big data and Hadoop. Hewlett-Packard announced a Hadoop plug-in for its ArcSight security software that will make it easier and faster for organizations to sift through huge amounts of security data. Hortonworks' new beta of the Hortonworks Data Platform will run on Microsoft's Windows Server, and on Feb. 25 EMC announced a new Hadoop distribution, Pivotal HD, that works closely with the storage vendor's Greenplum massively parallel processing (MPP) database.

Davis said Intel will leave much of the application work to its partners, but that the chip maker will create a foundation for Hadoop that enables organizations to take advantage of the capabilities in its data center hardware. Intel's AES-NI technology will deliver up to 20 times the encryption speed of other technologies, while Intel's SSD and cache acceleration will make queries in Hive, the data warehouse system for Hadoop, 8.5 times faster. The combination of Intel's silicon and its Hadoop distribution means that analyzing a terabyte of data, which normally could take as long as 4 hours, can now be done in 7 minutes, according to Intel.

During the Webcast, Intel offered a long list of partners to help integrate its software into various platforms, including Cisco Systems, Cray, Dell, Infosys, NextBio, Red Hat, SAP, SAS, Savvis and Teradata. SuperMicro announced Feb. 26 that it was adding Intel's Hadoop distribution to some of its servers and storage systems aimed at big data environments.

Intel's investment arm, Intel Capital, also is investing in smaller big data companies, such as 10gen and Guavus Analytics.

The push into big data also will help fuel sales of Intel's Xeon chips by driving organizations to run their big data workloads on Intel-based servers from vendors such as HP and Dell. Davis said that "one of [Intel's] biggest motivators is to drive faster growth of the data center."