    Why Hadoop Analytics Projects Often Fall Short of Their Goals

    By Chris Preimesberger - August 8, 2014

      2 - Jobs Crash or Fail to Be Completed on Time

      Hadoop deployments often start as “sandbox” environments. Over time, the workload expands, and an increasing number of jobs on the cluster support production applications that are governed by service-level agreements. Ad hoc queries and other low-priority jobs compete with business-critical applications for system resources, causing high-priority jobs to fail to complete when needed.

      3 - No Ability to Monitor Cluster Performance in Real Time

      Hadoop diagnostic tools are static, providing log file information about what happened to jobs on the cluster after they have run. Hadoop does not provide the ability to monitor, at a sufficiently granular level, what is happening while multiple jobs are running. As a result, it is difficult and often impossible to take corrective action before operational problems occur.

      4 - Lack of Macro-Level Visibility, Control Over the Cluster

      Various Hadoop diagnostic tools provide the ability to analyze individual job statistics and examine activity on individual nodes on the cluster. In addition, developers can tune their code to ensure optimal performance for individual jobs. What’s lacking, however, is the ability to monitor, analyze and control what’s happening with all users, jobs and tasks running on the entire cluster, including use of each hardware resource.

      5 - Insufficient Ability to Set and Enforce Job Priorities

      While job schedulers and resource managers provide basic capabilities such as job sequencing, time- and event-based scheduling and node allocation, they do not ensure that cluster resources are used in the most efficient manner while jobs are running.

      6 - Underutilized, Wasted Capacity

      Organizations typically size their clusters for maximum peak workloads. The extra capacity is rarely used, can be very expensive and is often unnecessary.
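
      As a rough illustration of the cost of sizing for peak, the sketch below compares average and peak load against provisioned capacity. The function name, the load samples and the capacity units are hypothetical and chosen only for the example; they do not come from any Hadoop tool.

```python
def utilization_summary(hourly_load, capacity):
    """Return (average utilization, peak utilization, idle fraction).

    hourly_load: load samples in the same units as capacity
                 (for example, task slots in use). Illustrative only.
    capacity:    provisioned cluster capacity in those units.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not hourly_load:
        raise ValueError("need at least one load sample")
    average = sum(hourly_load) / len(hourly_load)
    peak = max(hourly_load)
    avg_util = average / capacity
    return avg_util, peak / capacity, 1 - avg_util


# Made-up samples: the cluster is sized for a short peak but mostly idle.
avg_util, peak_util, idle = utilization_summary([20, 20, 20, 100], capacity=100)
print(f"average {avg_util:.0%}, peak {peak_util:.0%}, idle {idle:.0%}")
```

      In this made-up case the cluster is fully used only at the peak, and most of the provisioned capacity sits idle on average.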

      7 - Insufficient Ability to Control Allocation of Resources Across a Cluster in Real Time

      When rogue jobs, inefficient or expensive queries, or other processes running on the cluster adversely impact performance, it is often too late for Hadoop operators to take the necessary corrective actions before service-level agreements are missed.

      8 - Lack of Granular View Into How Cluster Resources Are Used

      When jobs crash or fail to complete on time, Hadoop operators/administrators have difficulty diagnosing performance problems. Hadoop does not provide a way of monitoring and analyzing cluster performance with sufficient context and detail. For example, it is impossible to isolate problems by user, job, or task and pinpoint bottlenecks related to network, memory or disk.
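
      To make the idea of isolating problems by user, job or task concrete, here is a minimal sketch that assumes per-task resource records are already available from some monitoring source. The record fields (`user`, `job`, `disk_mb`) and the `usage_by` helper are hypothetical, not part of Hadoop.

```python
from collections import defaultdict


def usage_by(records, key, metric):
    """Sum a metric across task records, grouped by key.

    Returns (group, total) pairs sorted from heaviest to lightest,
    so the largest consumer appears first.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[metric]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical task-level samples; the field names are assumptions.
tasks = [
    {"user": "alice", "job": "etl-1", "disk_mb": 500.0},
    {"user": "bob", "job": "adhoc-7", "disk_mb": 4000.0},
    {"user": "alice", "job": "etl-1", "disk_mb": 700.0},
]

for user, total in usage_by(tasks, key="user", metric="disk_mb"):
    print(user, total)
```

      The same helper can group by `job` instead of `user`, which is the kind of drill-down the paragraph above says operators lack.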

      9 - Inability to Predict When a Cluster Will Max Out

      More jobs, different kinds of jobs, expanding data volumes, different data types, more complex queries and many other variables continuously increase the load on cluster resources over time. Often, the need for additional cluster resources isn’t apparent until a disaster occurs (for example, a customer-facing Website goes down or a mission-critical report doesn’t run). Consequences can include unsatisfactory customer experiences, missed business opportunities, unplanned capital expense requests and more.
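
      One simple way to anticipate when a cluster will max out is to fit a straight line to past usage and extrapolate it to the capacity limit. The sketch below does that with an ordinary least-squares fit. It assumes evenly spaced usage samples and steady linear growth, which real workloads may not follow, and the function name and figures are hypothetical.

```python
def periods_until_full(usage, capacity):
    """Estimate how many sampling periods after the last sample
    a linear usage trend reaches capacity.

    usage:    evenly spaced usage samples, oldest first.
    capacity: the limit, in the same units as usage.
    Returns None when the fitted trend is flat or decreasing.
    """
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(usage) / n
    # Least-squares slope and intercept for x = 0, 1, ..., n - 1.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    x_full = (capacity - intercept) / slope
    # Clamp at zero if the trend says capacity is already reached.
    return max(0.0, x_full - (n - 1))


# Made-up monthly storage use in terabytes against a 100 TB cluster.
print(periods_until_full([10, 20, 30, 40], capacity=100))
```

      With these made-up numbers, usage grows by 10 TB per month, so the trend reaches 100 TB six months after the last sample.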

      10 - HBase and MapReduce Contention

      Contention for system resources by HBase and MapReduce jobs can affect overall performance significantly. The inability to optimize resource utilization while these different types of workloads run concurrently leads many organizations to suffer the expense of deploying separate, dedicated clusters.

      11 - Lack of Key Visual Dashboards

      The dashboards at issue are those that enable interactive exploration and fast diagnosis of performance-related issues on the cluster. The static reports and detailed log files provided with Hadoop schedulers and resource managers are not conducive to fast or easy problem diagnosis. Combing through voluminous data while troubleshooting can waste hours or even days. Hadoop operators need the ability to quickly visualize, analyze and understand the causes of performance problems and identify opportunities to optimize resource use.
