Apache Spark Is Creating a Buzz in Analytics, Data Processing Camps

Jan 19, 2015

by Julia King

Spark Loves Diversity

Spark ships with a diverse set of libraries “out of the box,” making previously inaccessible processing models easy to use. The suite of complementary tools includes a fully featured machine learning library (MLlib), a graph processing engine (GraphX) and stream processing. Vendors have been talking about the usefulness of these tools for years, but surveys show that only a small number of enterprises are actually using them.

Spark Helps Data Play Well with Others

Apache Spark makes big data analytics more accessible to business users, who once required IT support at most stages of the development process. It also boosts the productivity of data scientists and analysts, 87 percent of whom recently tied data analysis to business success.

Spark Goes Beyond SQL

Big data analytics is more than SQL SELECT statements. Vendors have rushed to provide SQL-on-Hadoop products on the notion that connecting yesterday’s tools to this new infrastructure would be enough. With Apache Spark, companies get the flexibility of advanced analytics APIs along with access to all the data in Hadoop and other sources.

Spark Saves Time and Money

Spark makes the most time-consuming parts of data analysis easier, running programs up to 100 times faster than MapReduce in memory and up to 10 times faster on disk. That speedup helps at several stages of the analyst workflow. In addition, Spark is built natively to run in memory, allowing it to support iterative analysis and faster, less expensive data crunching.
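
To see why keeping data in memory matters for iterative analysis, consider a rough sketch in plain Python. It does not use Spark or any Spark API; the `Source` class, the `iterate_mean` function and the sample numbers are all hypothetical stand-ins invented for illustration. The sketch counts how often the input is read and parsed when every pass of an iterative job goes back to the raw data, compared with a job that loads the data once and reuses it from RAM.

```python
import io

# Hypothetical raw input standing in for a file on disk (one number per line).
RAW = "\n".join(str(x) for x in [2.0, 4.0, 6.0, 8.0])


class Source:
    """Wraps the raw text and counts how many times it is read and parsed."""

    def __init__(self, text):
        self.text = text
        self.reads = 0

    def load(self):
        self.reads += 1
        return [float(line) for line in io.StringIO(self.text)]


def iterate_mean(get_data, steps=5, lr=0.5):
    """Gradient descent toward the mean; each step needs a full pass over the data."""
    estimate = 0.0
    for _ in range(steps):
        data = get_data()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate


# Disk-style job: every iteration re-reads and re-parses the input.
uncached = Source(RAW)
result_uncached = iterate_mean(uncached.load)

# Memory-style job: read once, keep the parsed data in RAM, reuse it each pass.
cached = Source(RAW)
in_memory = cached.load()
result_cached = iterate_mean(lambda: in_memory)

print(uncached.reads, cached.reads)  # prints "5 1"
```

Both jobs reach the same estimate, but the first reads the input five times and the second only once. In a real cluster the repeated reads would go to disk or over the network, which is where the cost of iterative jobs tends to pile up.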

Spark Is Easy to Program

With support for Python, Scala and Java, Apache Spark is an easy platform to program. That also makes it easier to find people inside or outside your enterprise who can use Spark, lowering the bar to becoming a data scientist.

Spark Has Automated Data Preparation

Spark is accessible enough that data scientists can sift through data in Hadoop directly from their laptops if they wish. Apache Spark’s in-memory caching gives flexibility to data scientists who want to work efficiently on data sets in RAM but store them on disk, a slower but cheaper alternative.

Spark Is Winning

Over the past 12 months, a flood of developers has been working on and committing to Apache Spark open-source projects. The best minds from companies such as Adobe, IBM, Intel, Platfora, Yahoo and many others contribute to Apache Spark, at the expense of other projects that were active just a few quarters ago. Enterprises always look for technology that is going to win, and Spark is winning.