Apache Spark Is Creating a Buzz in Analytics, Data Processing Camps

By Darryl K. Taft  |  Posted 2015-01-19

Apache Spark is a rapidly evolving open-source engine for large-scale data processing and analytics. In development for a number of years at UC Berkeley's AMPLab, it is now being driven by Databricks, a Berkeley spin-out founded by Ion Stoica and Matei Zaharia. It is also reaching a level of maturity that moves it beyond pure experimentation, with imminent availability of a stable 1.0 release and inclusion, current or planned, in all major Hadoop distributions. There's good reason for all of the interest. Spark accelerates analytics on Hadoop and comes with a full suite of complementary tools, including a fully featured machine learning library (MLlib), a graph processing engine (GraphX) and stream processing. Spark can access data in a variety of sources, including HDFS, Cassandra and HBase. The following eWEEK slide show, based on our own reporting and input from Peter Schlampp, vice president of product at Platfora, shares the reasons some say Spark is the best thing to happen to data.

