Apache Spark Continues to Gain Enterprise Traction
Zaharia told eWEEK that Spark grew out of a research project at the University of California, Berkeley, where he was working with early users of MapReduce and Hadoop, including Facebook and Yahoo. He said he found some common problems among those users, chief among them that they all wanted to run more complex algorithms that couldn’t be done with just one MapReduce step. “MapReduce is a simple way to scan through data and aggregate information in parallel, and not every algorithm can be done with it,” Zaharia said. “So we wanted to create a more general programming model for people to write cluster applications that would be fast and efficient at these more complex types of algorithms.”
Meanwhile, Spark users are becoming more diverse. Spark is breaking down technology barriers between data scientists and engineers, who are working collaboratively to solve data problems. Of those surveyed, 41 percent identified themselves as data engineers and 22 percent as data scientists. Spark users are solving a variety of problems in different languages, all within the same framework: Scala (71 percent), Python (58 percent), SQL (36 percent), Java (31 percent) and R (18 percent).
Business intelligence appears to be the most popular use case for Spark, with 68 percent of respondents saying they use Spark for BI. In addition, 52 percent use Spark for data warehousing, 48 percent to build recommendation engines, 40 percent to process application and system logs, 36 percent for user-facing services and 29 percent for fraud detection and security.
“The enthusiasm for big data is matched only by the pace of innovation,” said Nik Rouda, senior analyst at Enterprise Strategy Group, in a statement. “Many organizations are shifting to a ‘Spark-first’ strategy, recognizing its advantages of analytics versatility, development familiarity, superior performance, range of data sources supported, and deployment flexibility. The market will no doubt continue to evolve, but Spark has established considerable momentum today.”
In addition, Spark is helping to increase access to big data. Spark adoption is growing quickly because users value its ease of use, its performance and the fact that it is well positioned for future growth in real-time and advanced analytics, Databricks said. Ninety-one percent of those surveyed cite performance as a reason for using Spark, while 77 percent cite ease of programming. Moreover, 71 percent cite ease of deployment, 64 percent advanced analytics capabilities and 52 percent real-time streaming capabilities as reasons for using the technology.