How Apache Spark Is Transforming Big Data Processing, Development

By Darryl Taft  |  Posted 2015-08-30
Spark Transformation

This solves a significant problem for Spark users. "Managing access to credentials and other sensitive information for every user on my team has been a big challenge," said Benny Blum, vice president of product and data science at Sellpoints, in a statement. "The ability to do so quickly and easily with the Databricks Access Control feature will enable my team to maintain the highest security standard."

Enter Big Blue

Meanwhile, in June, IBM announced a series of moves to invest in Spark and deepen its commitment to it as a centerpiece of its big data platform.

“IBM is building Spark into the core of our analytics and commerce platforms,” Joel Horwitz, director of the IBM Analytics Platform, told eWEEK.

“Additionally, we'll offer Spark as a Service on IBM Bluemix, host Spark applications and offer free Spark online courses to educate a million people worldwide. IBM will also offer enterprise-level support and consulting to our clients. Spark enhancements will extend well beyond IBM Analytics into all parts of the business,” he said. Bluemix is IBM's cloud platform as a service (PaaS) for developing and running large-scale applications.

IBM also said it will commit more than 3,500 researchers and developers to work on Spark-related projects at more than a dozen labs worldwide.

Moreover, the company opened a Spark Technology Center in San Francisco for the data science and developer community.

Other large enterprises are putting Spark to work. Independence Blue Cross (IBC), a large health insurer in the Philadelphia area that serves more than 2 million people in the region and 7 million nationwide, is using Spark to develop new services.

"Apache Spark is quickly maturing into a power tool for development of machine-learning analytic applications," said Darwin Leung, the company's director of informatics. "It allows our IBC researchers and academic partners to work together more seamlessly, which means we can get new claims and benefits apps up and out to customers much faster."

Findability Sciences, a consulting and contextual data technology company, is using IBM Analytics and Spark to help clients implement big data processing applications.

“Apache Spark with IBM BigInsights has given us tremendous capacity for our implementations for small and medium businesses, where MapReduce was not efficient,” said Anand Mahurkar, CEO of Findability Sciences, in a statement.

“With Spark, the performance has improved multifold. We’re now able to process streaming data from IoT [Internet of Things] devices and offer analytics for data in motion for things like traffic, commuters and parking.”

Because Spark has its origins in machine learning research, Databricks has been intent on enhancing the platform's machine learning capabilities. To that end, the company is working closely with IBM, which open-sourced its SystemML machine learning technology over the summer.

The companies plan to introduce new domain-specific algorithms to the Spark ecosystem and add new machine learning primitives to the Apache Spark project. IBM and Databricks will also collaborate to integrate IBM's SystemML with the Spark platform.


