
    When to Select Apache Spark, Hadoop or Hive for Your Big Data Project

    By Chris Preimesberger - September 16, 2015

      1 - When to Select Apache Spark, Hadoop or Hive for Your Big Data Project

      Apache Spark is making remarkable gains at the expense of the original Hadoop ecosystem. Here’s a guide to help you decide between Spark and other Hadoop engines.

      2 - Sparking the Hadoop Industry

      Spark has been gaining major traction in the Hadoop community: IBM recently announced huge investments in the technology, and numerous big data vendors are integrating Spark into their offerings. According to CEO Ashish Thusoo, a recent survey conducted by big data-as-a-service provider Qubole found that 42 percent of 288 respondents either are using Spark or plan to integrate it into their infrastructure strategy within the next two years.

      3 - Hadoop vs. Spark

      There’s an ongoing debate about whether Spark will replace Hadoop altogether, given its strengths in machine learning and its ability to process workloads up to 100 times faster than Hadoop MapReduce. Spark also addresses certain shortcomings of Hadoop, such as its batch-oriented, disk-intensive processing. However, Qubole, which is tool-agnostic, does not believe that Spark will dominate the big data ecosystem. There are still distinct trade-offs in terms of use cases, infrastructure cost and relative maturity.

      4 - Streaming Data

      A key use case for Apache Spark is processing streaming data. With so much data generated every day, it has become essential for companies to stream and analyze it in real time, and Spark can handle that extra workload. Some experts even theorize that Spark could become the go-to platform for stream-computing applications of any type. Because it supports multiple kinds of streaming analytics, Spark is versatile, which makes it a clear choice for most streaming use cases, including fraud detection and log processing.
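To make the log-processing case concrete, here is a minimal sketch of the micro-batch idea that Spark Streaming is built on: the incoming stream is cut into small batches, and each batch updates a running result. This is plain Python rather than Spark's API, and the function names and sample log lines are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an event stream into small batches of at most batch_size items."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_levels(batch: List[str]) -> Counter:
    """Count log lines per severity level (the first token of each line)."""
    return Counter(line.split(maxsplit=1)[0] for line in batch if line.strip())


def running_totals(events: Iterable[str], batch_size: int = 3) -> Iterator[Counter]:
    """Yield cumulative per-level counts after each micro-batch."""
    totals: Counter = Counter()
    for batch in micro_batches(events, batch_size):
        totals.update(count_levels(batch))
        yield Counter(totals)  # copy, so each snapshot is independent


# Hypothetical sample data standing in for a live log stream.
SAMPLE_LOG = [
    "INFO user=1 login",
    "ERROR user=2 payment declined",
    "INFO user=3 login",
    "WARN user=2 retry",
    "ERROR user=2 payment declined",
]

if __name__ == "__main__":
    for snapshot in running_totals(SAMPLE_LOG):
        print(dict(snapshot))
```

In a real deployment the batches would arrive from a message queue or socket, and the per-batch work would be distributed across a cluster; the sketch only shows the shape of the computation.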

      5 - Machine Learning

      Machine learning is another of Apache Spark’s many use cases. Spark helps users run repeated queries against the same set of data, which is essentially what iterative machine learning algorithms require. Spark’s machine learning library covers areas such as clustering, classification and dimensionality reduction, among many others. This enables Spark to be used for common big data functions such as predictive intelligence, customer segmentation for marketing purposes and sentiment analysis.
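The "repeated queries" pattern is easiest to see in a clustering algorithm. The sketch below is a one-dimensional k-means written in plain Python, not Spark's machine learning library; it scans the same in-memory data on every iteration, which is the access pattern that benefits from keeping data in memory. The spend figures and starting centers are made up for illustration.

```python
from typing import List, Sequence


def assign(points: Sequence[float], centers: Sequence[float]) -> List[int]:
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        for p in points
    ]


def update(points: Sequence[float], labels: Sequence[int],
           centers: Sequence[float]) -> List[float]:
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        # An empty cluster keeps its previous center.
        new_centers.append(sum(members) / len(members) if members else old)
    return new_centers


def kmeans_1d(points: Sequence[float], centers: Sequence[float],
              max_iter: int = 20) -> List[float]:
    """Repeat assign/update passes over the same data until centers settle."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(points, centers)      # full pass over the data
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Hypothetical monthly spend per customer, for a toy segmentation.
SPEND = [10.0, 12.0, 11.0, 95.0, 100.0, 105.0]

if __name__ == "__main__":
    print(kmeans_1d(SPEND, centers=[0.0, 50.0]))
```

Each loop iteration rereads the whole data set, so an engine that reloads it from disk on every pass pays that cost repeatedly.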

      6 - Interactive Analysis

      MapReduce was built to handle batch processing, and Hadoop query engines such as Hive or Pig are frequently too slow for interactive analysis. Apache Spark, however, is fast enough to perform exploratory queries without sampling. Spark also interfaces with a number of development languages, including SQL, R and Python.

      7 - Fog Computing

      Connected objects in the Internet of Things collect massive amounts of data, process it, and deliver revolutionary new features and applications for people to use in their everyday lives. All that processing, however, is tough to manage with the current analytics capabilities in the cloud. That’s where fog computing and Apache Spark come in. Fog computing decentralizes data processing and storage, performing those functions at the edge of the network instead. This type of data can best be analyzed and processed by Apache Spark, with its streaming analytics engine and interactive real-time query tool.

      8 - When Not to Use Spark

      Apache Spark is versatile, but that doesn’t mean its in-memory capabilities are the best fit for every use case. For example, Spark was not designed as a multiuser environment. Spark users need to know whether the memory they have access to is sufficient for a dataset. Adding more users complicates this further, since users have to coordinate memory usage to run projects concurrently. For this reason, users will want to consider an alternate engine, such as Apache Hive, for large batch projects.
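The memory question can be reduced to a back-of-the-envelope check. The sketch below is purely illustrative: the even split of memory between users and the `expansion_factor` (how much larger data becomes once loaded into memory) are assumptions for the example, not Spark settings or measured values.

```python
def memory_share_gb(cluster_memory_gb: float, concurrent_users: int) -> float:
    """Memory available to each user, assuming an even split (an assumption)."""
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    return cluster_memory_gb / concurrent_users


def fits_in_memory(dataset_gb: float, cluster_memory_gb: float,
                   concurrent_users: int, expansion_factor: float = 2.0) -> bool:
    """Rough check: does the in-memory dataset fit in one user's share?

    expansion_factor is a placeholder for in-memory overhead; measure it
    for real workloads rather than trusting the default.
    """
    needed_gb = dataset_gb * expansion_factor
    return needed_gb <= memory_share_gb(cluster_memory_gb, concurrent_users)


if __name__ == "__main__":
    # One user with a 100 GB dataset on a cluster with 512 GB of memory.
    print(fits_in_memory(100, 512, concurrent_users=1))
    # The same dataset when four users share the cluster.
    print(fits_in_memory(100, 512, concurrent_users=4))
```

Under these assumptions the same job that fits comfortably for one user no longer fits when four users share the cluster, which is the coordination problem described above.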

      9 - The Future of Spark

      Over time, Apache Spark will continue to develop its own ecosystem, becoming even more versatile. In a world where big data has become the norm, organizations will need to find the best way to utilize it. Judging from these Apache Spark use cases, there should be many opportunities in the coming years to see how powerful Spark truly is.
