IBM, Hortonworks Team Up to Make Data Science More Mainstream

Declaring data science the key to unlocking the value of big data, Hortonworks and IBM announced a partnership designed to appeal to data scientists as well as business professionals.

Big Data Science

SAN JOSE, CALIF – Hortonworks and IBM have formed a new partnership promising to extend access to data science and machine learning to more developers across the Apache Hadoop ecosystem.

The companies plan to combine the Hortonworks Data Platform (HDP) with the IBM Data Science Experience and IBM Big SQL into new integrated solutions created to help enterprises and other organizations better analyze and manage the growing volume of data they’ve accumulated.

The announcement was made here June 13 at the DataWorks Summit (previously known as the Hadoop Summit).

Hortonworks CEO Rob Bearden said the deal builds on a long collaboration his company has had with IBM, including a common distribution of Hadoop, the open-source, Java-based programming framework that supports extremely large data sets in a distributed computing environment.

As part of the deal, Bearden said Hortonworks will adopt IBM’s Data Science Experience platform and the two companies will have “a very deep co-engineering effort” going forward to “make sure all our releases are in lockstep.” Hortonworks will also adopt IBM’s Big SQL for complex queries.

For its part, IBM has standardized on the Hortonworks Data Platform (HDP) for big data solutions. “What this means to the industry is a highly concentrated focus to bring HDP forward to the broader community,” said Bearden.

The two companies are also teaming up to advance the development of Unified Governance (IBM BigIntegrate, IBM BigQuality and IBM Information Governance Catalog) on the Apache Atlas open platform. Atlas provides a governance platform for Enterprise Hadoop, which is designed to make it easier for developers to model new business processes and data assets.

Rob Thomas, vice president of analytics and data platform at IBM, followed Bearden on stage and noted the deal promises to expand the use of data science. For the past twenty years, he said, data science has mainly been the province of experts and used primarily behind the firewall.

“Now data is everywhere, so that’s not enough. Data science is a team sport. You need an environment where everyone can use it,” said Thomas, adding that open platforms will address the need that data scientists have to work in different environments, including the public cloud, a desktop PC or a private cloud.

“The world doesn’t have enough data scientists, so we’ve built education into the platform in a bar on the side (of the screen) so you can get training right there any time,” he added.

Companies have not necessarily been able to leverage the explosion of big data, because the analytical tools are too hard to use, expensive or slow to deliver meaningful results. These are issues that Hortonworks and others have strived to address by leveraging open source software like Hadoop.

Thomas said access to big data insights can help companies across a range of industries with tasks such as identifying financial fraud, network intrusions and manufacturing anomalies, supporting patient diagnosis, forecasting energy demand and helping online retailers anticipate which products shoppers are most interested in.

“Open source allows us to move more quickly in an industry that doesn’t usually move that quickly,” said Dawn Douglass, managing director for Hortonworks customer Black Knight Financial Services, a part of Fidelity. “Being able to add analytics to our legacy platform has been key to improving the customer experience so when the client calls we know what’s going on in their lives.”

In the future, she expects big data systems will greatly streamline complex financial transactions to the point where consumers might be able to complete a mortgage transaction in as little as a minute.

Another member of the Hortonworks customer panel, Keith Renouard, chief enterprise architect at Health Care Service Corp., said his company is moving much faster with open source to leverage the “treasure trove” of health care data it has accumulated than it could with earlier legacy systems.

He said HCSC has been analyzing big data to drive better health care outcomes for its 15 million members across five states, for example by identifying communities that have a higher potential for diabetes and other diseases.

“We are a data-driven company,” he said.

John Pressley, director of IT at Duke Energy, said his company’s move to Hadoop has helped it transform from an energy company to a digital company that sells energy and other products.

Open source tools are now in standard use at Duke Energy for data collection and analysis. “Why? Because it’s not proprietary and it’s what everyone in college today has access to,” he said. “So when I hire them, I’m not starting from zero.”

Bearden said the name of the conference was changed from Hadoop Summit to reflect the importance of all data sources including IoT, streaming media and all kinds of business data.

Editor's Note: This article was updated to correct the spelling of Hortonworks CEO Rob Bearden.

David Needle

Based in Silicon Valley, veteran technology reporter David Needle covers mobile, big data, and social media among other topics. He was formerly News Editor at InfoWorld, Editor of Computer Currents...