The Next Challenge for Hadoop: Quality of Service

By Darryl K. Taft  |  Posted 2016-04-22

After a decade of proving that Hadoop is not just hype, much of the focus and attention of this open-source community now goes to its evolving ecosystem of tools and applications (Spark, Impala, Hive) that are helping usher in new users exploring new use cases. However, as additional workloads are added to a cluster, the challenges of using Hadoop in production grow exponentially and become increasingly complicated. How will clusters react to massive growth and unpredictable changes in usage? Your cluster may be operating just fine right now, but what happens in a few months as you add hundreds of new workloads? When a cluster has hundreds of nodes, each running dozens of jobs, Hadoop quickly becomes a chaotic system, and constantly and dynamically changing usage takes a toll on business-critical performance. Ultimately, Hadoop won't move forward into the next decade unless the community addresses one massively important and consistently overlooked thing: quality of service (QoS). This eWEEK slide show explores issues around QoS for Hadoop.

    What Is QoS for Hadoop?

    Quality of service is the first step toward managing Hadoop performance. QoS makes it possible to guarantee service levels for applications running on Hadoop by prioritizing critical jobs and addressing problems such as resource contention, missed deadlines and sluggish cluster performance. By avoiding bottlenecks and contention, multiple jobs can run side by side without interfering with one another.
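
    The core idea can be sketched in a few lines of Python. The toy model below is purely illustrative (it is not Hadoop or YARN code, and the job names, priorities and container pool are invented for this example), but it shows what prioritizing critical jobs means in practice: a fixed pool of resources divided by priority rather than first come, first served.

        # Toy model of priority-based resource sharing, the core idea behind
        # QoS for Hadoop. Illustrative only: job names, priorities and the
        # size of the container pool are invented for this example.
        from dataclasses import dataclass

        @dataclass
        class Job:
            name: str
            priority: int  # higher means more critical

        def allocate(jobs, total_containers):
            # Divide a fixed pool of containers in proportion to priority,
            # so critical jobs are never starved by ad hoc workloads.
            total_weight = sum(j.priority for j in jobs)
            return {j.name: total_containers * j.priority // total_weight
                    for j in jobs}

        jobs = [Job("nightly-billing", 5),  # business-critical, has an SLA
                Job("etl-ingest", 3),
                Job("analyst-adhoc", 1)]    # best-effort exploration
        print(allocate(jobs, total_containers=90))
        # {'nightly-billing': 50, 'etl-ingest': 30, 'analyst-adhoc': 10}

    In a real deployment this role falls to the cluster's scheduler, which must enforce such shares continuously as jobs arrive, grow and finish.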

    Why QoS for Hadoop?

    Many companies run into roadblocks when they try to guarantee performance: priority jobs aren't completed on time, yet clusters sit underutilized. Resource contention is inevitable in today's multi-tenant, multi-workload clusters, especially as big data applications scale. Why is this a problem? On the business side, companies waste time and money fixing cluster performance issues instead of gaining the competitive advantages, and the full ROI, promised by their big data initiatives. From a technological perspective, unreliable Hadoop performance means late jobs, missed service-level agreements, overbuilt clusters and underutilized hardware.

    Hadoop, We Have a Problem

    As organizations grow more advanced in their Hadoop use and run business-critical applications in multi-tenant clusters, they can no longer afford to lose visibility into what's happening behind a mounting class of performance challenges, especially if they want to make the most of their distributed computing investments. Complicated frameworks like YARN already place performance pressure on systems, and newer compute platforms such as Mesos, OpenStack and Docker will eventually run into the same set of problems. It's vital that organizations get ahead of these issues now.

    Getting Around Workarounds

    Once a Hadoop cluster hits a performance wall, admins need a resolution, but they are discovering that traditional best practices and manual tuning workarounds just don't work. Over-provisioning, siloing and tuning are not long-term solutions; they are expensive and create needless overhead. Purchasing additional nodes while hardware utilization is well below 100 percent is a costly, temporary fix that addresses performance symptoms, not the fundamental limitations of Hadoop. Cluster isolation likewise doubles complexity and simply isn't viable at scale. Finally, tuning is by definition a response to problems that have already occurred, and no human can make the thousands of decisions needed to adjust settings in real time to constantly changing cluster conditions; the back-of-the-envelope arithmetic below shows why.
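
    The scale problem is easy to quantify. The figures below are assumptions chosen for illustration, not measurements from any particular cluster, but they show why second-by-second tuning is beyond manual effort.

        # Back-of-the-envelope arithmetic on why manual tuning can't keep up.
        # All figures are illustrative assumptions, not measured values.
        nodes = 200                  # assumed cluster size
        containers_per_node = 30     # assumed concurrent containers per node
        metrics_per_container = 10   # CPU, memory, disk I/O, network, etc.
        samples_per_second = 1       # second-by-second monitoring

        data_points = (nodes * containers_per_node
                       * metrics_per_container * samples_per_second)
        print(f"{data_points:,} data points to weigh every second")
        # 60,000 data points to weigh every second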

    Going Real Time

    The most effective answer to resource contention is to monitor hardware resources in real time. Sampling the hardware resources of each node in the cluster second by second lets you see which jobs control which resources and weigh the priority of every job across the cluster. That way, all jobs get equitable access to cluster hardware and business-critical jobs finish on time, which is what guaranteeing QoS for Hadoop means.
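
    Below is a minimal sketch of what per-node, second-by-second sampling might look like. It uses the third-party psutil package (pip install psutil); the five-second window, the 90 percent CPU threshold and the throttling suggestion are assumptions for illustration, and a real QoS system would attribute each sample to individual jobs and containers.

        # Minimal sketch of second-by-second node sampling. Requires the
        # third-party psutil package; thresholds are illustrative assumptions.
        import time
        import psutil

        def sample_node():
            # One snapshot of this node's hardware utilization.
            return {"cpu_pct": psutil.cpu_percent(interval=None),
                    "mem_pct": psutil.virtual_memory().percent}

        def monitor(seconds=5, cpu_threshold=90.0):
            # Sample once a second and flag likely contention.
            for _ in range(seconds):
                snapshot = sample_node()
                if snapshot["cpu_pct"] > cpu_threshold:
                    print("contention: throttle low-priority jobs?", snapshot)
                else:
                    print("ok", snapshot)
                time.sleep(1)

        if __name__ == "__main__":
            monitor()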

    QoS for Hadoop in Production

    Companies like Trulia, Chartboost and Upsight are implementing systems that guarantee QoS for Hadoop and reaping the benefits. Trulia has disrupted a decades-old industry by analyzing real-time data to deliver customized insights straight to consumers. With many teams writing Hadoop jobs or using Hive or Spark, Trulia has to ensure reliability in its multi-tenant, multi-workload environment. When delayed or unpredictable jobs affected its customer push-notification programs, Trulia would intentionally underutilize its clusters to make sure jobs completed on time and traffic wasn't hurt. Now, Trulia uses Pepperdata to actively monitor and control all of its Hadoop clusters.
 
