2. Spark Loves Diversity
Spark comes with a diverse set of libraries “out of the box,” making previously inaccessible processing models easy to use. It provides a full suite of complementary tools, including a fully featured machine learning library (MLlib), a graph processing engine (GraphX), and stream processing. Vendors have been talking about the usefulness of these tools for years, but surveys show that only a small number of enterprises are actually using them.
3. Spark Helps Data Play Well with Others
4. Spark Goes Beyond SQL
Big data analytics is more than SQL SELECT statements. There has been a rush to provide SQL-on-Hadoop products, on the notion that connecting yesterday’s tools to this new infrastructure was enough. With Apache Spark, companies get the flexibility of advanced analytics APIs along with access to all the data in Hadoop and other sources.
5. Spark Saves Time and Money
Spark makes the most time-consuming parts of data analysis easier, running programs up to 100 times faster than MapReduce in memory and up to 10 times faster on disk. This speedup helps at every stage of the analyst workflow. In addition, Spark is built to run in memory, which lets it support iterative analysis and faster, less expensive data crunching.
6. Spark Is Easy to Program
7. Spark Has Automated Data Preparation
Spark is accessible enough that data scientists can sift through data in Hadoop directly from their laptops if they wish. Apache Spark's in-memory caching gives flexibility to data scientists who want to work efficiently on data sets in RAM while still storing them on disk, a slower but cheaper alternative.
8. Spark Is Winning
Over the past 12 months, there has been a flood of developers working on and committing to Apache Spark open-source projects. The best minds from companies like Adobe, IBM, Intel, Platfora, Yahoo, and many others contribute to Apache Spark, at the expense of other projects that were active just a few quarters ago. Enterprises always look for technology that is going to win, and Spark is winning.