Results of a new survey indicate that the Apache Spark big data processing engine is gaining traction with a growing number of developers.
Typesafe, the company behind the Play Framework, Akka, and Scala, has released the findings of a survey on Apache Spark adoption patterns among more than 2,100 enterprise developers, data scientists, executives, and system architects. Thirteen percent of respondents said they are using Spark in production, while 31 percent said they are currently evaluating it.
Apache Spark is an open-source engine for large-scale data processing and analytics. It was developed over a number of years at UC Berkeley's AMPLab and is now being driven by Databricks, a Berkeley spin-out founded by Ion Stoica and Matei Zaharia. Zaharia, the creator of Apache Spark, is CTO of Databricks. Databricks worked with Typesafe on the survey.
"This survey of over 2100 developers alone highlights that over 500 enterprises are using or planning to use Spark in production in 2015, in environments ranging from Hadoop clusters to public and private clouds, with data sources including key-value stores, databases, streaming data and file systems," Zaharia said. "Their use cases range from batch workloads to SQL queries, stream processing and machine learning, highlighting Spark’s unique capability as a simple, unified platform for data processing."
Typesafe said Spark awareness and adoption are showing hockey-stick growth. Google Trends data support this finding, and the survey shows that 71 percent of respondents have at least evaluation or research experience with Spark, while 35 percent are using it or plan to adopt it soon. Of the respondents running big data applications in production, 82 percent said they are eager to replace MapReduce with Spark as their core processing engine.
"Coming directly from developers, this survey reiterated the rapid adoption of Spark for large-scale data processing," Zaharia said in a statement. "I'm especially excited by the breadth of use cases seen, which range from batch jobs to streaming and machine learning. It's this type of direct feedback and dialogue with our community that enables us to continue to improve the usability, performance and built-in libraries of Spark."
The survey also found that faster data processing and event streaming are the main priorities for enterprises. By far the most desired features are Spark's performance advantage over MapReduce, mentioned by more than 78 percent of respondents, and its ability to process event streams (66 percent), which MapReduce cannot do.
Moreover, the survey suggests that the perceived barriers to adoption are not major blockers. Respondents cited a lack of in-house experience and the perceived immaturity of some Spark components and of their integrations with other middleware and management tools. They also cited the need for better commercial support options and for more comprehensive documentation and advanced examples. Some respondents said their organizations do not currently need "big" data solutions.
"The need to process big data faster has largely fueled the intense developer interest in Spark," said Dr. Dean Wampler, Big Data Architect at Typesafe, in a statement. "Hadoop's historic focus on batch processing of data was well supported by MapReduce, but there is an appetite for more flexible developer tools to support the larger market of 'mid-size' datasets and use cases that call for real-time processing."