IBM Launches Apache Spark-Based Data Science Experience

“With Apache Spark, we see an opportunity to drive significant innovation into the community to benefit data engineers, data scientists and application developers,” Bob Picciano, senior vice president of IBM Analytics, said in a statement. “Our IBM Analytics platform is designed for blending those new technologies and solutions into existing architectures. It’s ready-made to take advantage of whatever innovations lie ahead as more and more data scientists around the globe create solutions based on Spark.”

Indeed, one such upcoming innovation from IBM, or another “drumbeat,” will come later this year, when IBM delivers a new platform that lets enterprises consume the insights generated with the Data Science Experience. It is one of a series of announcements IBM will make as it builds out a broader vision to help clients realize more value from data, Gunnar said.

"The Data Science Experience qualifies as the next step in IBM's long-term investments in Apache Spark and should pay substantial dividends for critical Spark constituencies," said Charles King, principal analyst at Pund-IT.

In a press release, IBM said it is forging partnerships with data science organizations, such as Galvanize, Lightbend and RStudio. In addition, IBM has built Spark into the core of its platforms, including Watson and IBM Commerce, as well as its analytics, systems and cloud solutions and more than 30 offerings, such as IBM BigInsights for Apache Hadoop, IBM Analytics on Apache Spark, Spark with Power Systems, Watson Analytics, SPSS Modeler and IBM Stream Computing. In 2015, IBM also open-sourced its SystemML machine learning technology to advance Spark’s machine learning capabilities.

Users such as USA Cycling are employing IBM’s Spark technology. USA Cycling Women's Team Pursuit is using IBM Spark, Watson IoT, as well as IBM mobile and cloud solutions to gain insights for training strategies and racing tactics. The team can now get advanced analysis of rider data, calculate dynamic race positioning and determine the grouping of riders over the race track.

Meanwhile, IBM continues to grow its analytics ecosystem. It has contributed to related projects, including Apache Toree, EclairJS, Apache Quarks, Apache Mesos and Apache Tachyon (now called Alluxio), and has made major contributions to the Apache Spark sub-projects Spark SQL, SparkR, MLlib and PySpark, with more than 3,000 total contributions in the last year, IBM officials said.

“Just as IBM played a critical role in the development of computer science, we can see many similarities today,” Picciano said in a statement. “Computer science went mainstream with the introduction of the PC. With data science, the major roadblock is having access to large data sets and having the ability to work with so much data. With today’s announcement, clients can have both.”

Indeed, IBM's move is about making data science available to the masses, Gunnar said. “This is about enabling a foundation to the masses that then bridges to a cognitive system,” she noted. “We fully anticipate through this offering being able to grow the number of data science professionals that are out in the market.”

IBM is trying to transform how companies use data across their organizations. The company wants to help organizations move data beyond IT and put it at the fingertips of developers, data science professionals and line-of-business professionals, so they can make decisions that transform the business in ways they had not previously considered.

“That means that you have to be more agile and collaboration is key,” Gunnar said. “This notion of team sport is pivotal to that—through things like Watson Analytics and the Data Science Experience.”