When to Select Apache Spark, Hadoop or Hive for Your Big Data Project

 
 
By Chris Preimesberger  |  Posted 2015-09-16

    Apache Spark is making remarkable gains at the expense of the original Hadoop ecosystem. Here's a guide to help decide between Spark and other Hadoop engines.

    2 - Sparking the Hadoop Industry

    Spark has been gaining major traction in the Hadoop community, with IBM recently announcing huge investments in the technology and numerous big data vendors integrating Spark into their offerings. According to CEO Ashish Thusoo, a recent survey from big data-as-a-service provider Qubole found that 42 percent of 288 respondents either are using Spark or plan to integrate it into their infrastructure strategy in the next two years.

    3 - Hadoop vs. Spark

    There's been an ongoing debate about whether Spark will replace Hadoop altogether because of its advantages in machine learning and its ability to process workloads up to 100 times faster than Hadoop MapReduce. Spark also addresses certain shortcomings of Hadoop, such as its batch-oriented and disk-intensive design. However, Qubole is tool-agnostic and does not believe that Spark will dominate the big data ecosystem. There are still distinct trade-offs in terms of use cases, infrastructure cost and relative maturity.

    4 - Streaming Data

    A key use case for Apache Spark is processing streaming data. With so much data generated every day, it has become essential for companies to be able to stream and analyze it in real time, and Spark can handle this extra workload. Some experts even theorize that Spark could become the go-to platform for stream-computing applications of any type. By supporting multiple kinds of streaming analytics, Spark shows its versatility, which makes it a strong choice for many use cases. That versatility extends to streaming applications such as fraud detection and log processing.

    5 - Machine Learning

    Another of the many Apache Spark use cases is machine learning. Machine learning algorithms are typically iterative: they run repeated queries over the same set of data, and Spark handles this pattern well because it can keep that data in memory between passes. Spark's machine learning library, MLlib, covers areas such as clustering, classification and dimensionality reduction, among many others. All this enables Spark to be used for common big data functions, such as predictive intelligence, customer segmentation for marketing purposes and sentiment analysis.
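
    The "repeated queries" pattern can be illustrated without Spark at all. The following is a minimal pure-Python sketch of one-dimensional k-means clustering, not Spark's MLlib implementation; the function name `kmeans_1d` and the sample data are hypothetical. The point is that every pass re-scans the entire dataset, which is why keeping the data in memory pays off.

```python
import random


def kmeans_1d(points, k, passes=10, seed=0):
    """Tiny 1-D k-means: every pass re-scans the full in-memory dataset."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k starting centers from the data
    for _ in range(passes):
        # Assignment step: one full scan, placing each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)


# Illustrative data only: two obvious groups, around 1 and around 10.
data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centers = kmeans_1d(data, k=2)
print(centers)
```

    With ten passes, the dataset is read ten times. An engine that reloads it from disk on each pass pays that cost ten times over, while an in-memory engine pays it once.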

    6 - Interactive Analysis

    MapReduce was built to handle batch processing, and higher-level engines that run on top of it, such as Hive or Pig, are frequently too slow for interactive analysis. Apache Spark, however, is fast enough to perform exploratory queries without sampling. Spark also interfaces with a number of development languages, including SQL, R and Python.

    7 - Fog Computing

    Connected objects in the Internet of Things collect massive amounts of data, process it, and deliver new features and applications for people to use in their everyday lives. All that processing, however, is tough to manage with the current analytics capabilities in the cloud. That's where fog computing and Apache Spark come in. Fog computing decentralizes data processing and storage, performing those functions at the edge of the network instead. Apache Spark, with its streaming analytics engine and interactive real-time query tool, is well-suited to analyzing and processing this type of data.

    8 - When Not to Use Spark

    Although Spark is versatile, its in-memory capabilities are not necessarily the best fit for all use cases. For example, Spark was not designed as a multiuser environment. Spark users are required to know whether the memory they have access to is sufficient for a dataset. Adding more users complicates this further, since the users will have to coordinate memory usage to run projects concurrently. For this reason, users will want to consider an alternate engine, such as Apache Hive, for large batch projects.
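
    The memory question above can be made concrete with a back-of-the-envelope check. This is a rough sketch, not a Spark API: the function `fits_in_memory`, the even split among users and the `usable_fraction` default are all assumptions chosen for illustration, and real clusters will behave differently.

```python
def fits_in_memory(dataset_gb, cluster_memory_gb, concurrent_users=1,
                   usable_fraction=0.5):
    """Rough check: does a dataset fit in one user's share of cluster memory?

    Assumptions (illustrative only): memory is split evenly among users,
    and only `usable_fraction` of it is available for holding data.
    """
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    share_gb = cluster_memory_gb * usable_fraction / concurrent_users
    return dataset_gb <= share_gb


# Hypothetical numbers: a 100 GB dataset on a cluster with 512 GB of memory.
print(fits_in_memory(100, 512))                      # one user, 256 GB share
print(fits_in_memory(100, 512, concurrent_users=4))  # four users, 64 GB share each
```

    Under these assumptions the same dataset fits for a single user but not once four users share the cluster, which is the coordination problem described above.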

    9 - The Future of Spark

    Over time, Apache Spark will continue to develop its own ecosystem, becoming even more versatile. In a world where big data has become the norm, organizations will need to find the best way to utilize it. Judging from these Apache Spark use cases, there should be many opportunities in the coming years to see how powerful Spark truly is.
 

It has taken a few years to gain real traction, but Apache Spark is making remarkable gains at the expense of the original Hadoop ecosystem, due largely to its ability to process large volumes of data much faster than the older platform. Apache Spark is an open-source data processing engine built for speed, ease of use and sophisticated analytics. It is designed to handle both batch processing and newer workloads, such as streaming analytics, interactive queries and machine learning. IBM, to name only one major player, has invested heavily in Spark, which can give enterprises an advantage in multi-pass iterative machine learning algorithms, as well as in interactive data interrogation on in-memory data sets. With a variety of Hadoop engine options from which to choose, it's important for CTOs to consider which engines are ideal for specific projects and use cases. This eWEEK slide show provides an overview of some specific Spark use cases, as well as some advice on when not to use Spark.

 
 
 
 
 
 
 
 
 
 
 
