Apache Spark Developer Adoption on the Rise

By Darryl K. Taft  |  Posted 2015-01-27

Indeed, "Compared to the MapReduce API, the Spark API is amazingly intuitive, providing concise, expressive operations that are often needed for analytics," Wampler added. "So, in addition to addressing a wider class of problems, Spark is improving the productivity of developers who use it."

Apache Spark is reaching a level of maturity that moves it beyond pure experimentation, with the imminent availability of a stable 1.0 release and inclusion (current or planned) in all major Hadoop distributions. There's good reason for all of the interest. Spark accelerates analytics on Hadoop and comes with a full suite of complementary tools, including a fully featured machine learning library (MLlib), a graph processing engine (GraphX) and stream processing. Spark can access data in a variety of sources, including HDFS, Cassandra and HBase.

Developers across all industries have been turning to Typesafe to build Reactive applications, of which big data is a core component. Because Spark is built with Scala, adding full lifecycle support for Apache Spark to the Typesafe Together Project Success Subscription program was a logical choice for Typesafe, one intended to accelerate developer adoption and success in building Reactive big data applications.

According to the survey, the top three languages used with Spark are Scala (88 percent of respondents), Java (44 percent) and Python (22 percent). Also, 82 percent of respondents using Spark said they chose Spark to replace MapReduce.

"When we started Spark, we had two goals—we wanted to work with the Hadoop ecosystem, which is JVM-based, and we wanted a concise programming interface similar to Microsoft’s DryadLINQ (the first language-integrated Big Data framework I know of, that begat things like FlumeJava and Crunch)," Zaharia said in the study. "On the JVM, the only language that would offer that kind of API was Scala, due to its ability to capture functions and ship them across the network. Scala's static typing also made it much easier to control performance compared to, say, Jython or Groovy."

The Typesafe study acknowledges that Spark is less mature than older technologies, like MapReduce, so developers also need good documentation, example applications, and guidance on runtime performance tuning, management and monitoring. Spark is also driving interest in Scala, the language in which Spark is written, but developers and data scientists can also use Java, Python, and soon, R, the study said.

"This survey further validates Databricks' partnership and shared vision with Typesafe to bring a comprehensive suite of application development tools for developers that enable enterprises to operate with more agility and speed," said Kavitha Mariappan, vice president of marketing at Databricks, in a statement. "We look forward to collectively utilizing this feedback to make the Spark developer experience not only richer but also as seamless as possible."

