The Next Challenge for Hadoop: Quality of Service

For Hadoop to move forward into the next decade, the community must address one key but often overlooked issue: quality of service.

What Is QoS for Hadoop?

Quality of service for Hadoop is the best first step toward measuring Hadoop performance. QoS provides the ability to ensure performance service levels for applications running on Hadoop by enabling the prioritization of critical jobs and addressing problems like resource contention, missed deadlines and sluggish cluster performance. By avoiding bottlenecks and contention, multiple jobs can run side by side, effectively and without interference.

Why QoS for Hadoop?

Many companies run into roadblocks when they try to guarantee performance: priority jobs aren't completed on time, and clusters are underutilized. Resource contention is inevitable with today's multi-tenant, multi-workload clusters, especially as big data applications scale. Why is this a problem? On the business side, companies waste time and money trying to fix cluster performance issues that prevent them from gaining competitive advantages linked to big data initiatives or realizing the full ROI of their big data efforts. From a technological perspective, unreliable Hadoop performance means late jobs, missed service-level agreements, overbuilt clusters and underutilized hardware.

Hadoop, We Have a Problem

As organizations get more advanced in their Hadoop use and run business-critical applications in multi-tenant clusters, they can no longer afford to lose sight of what is happening behind a growing class of performance challenges, especially if they want to make the most of their distributed computing investments. Complicated frameworks like YARN already place performance pressure on systems, and newer compute platforms like Mesos, OpenStack and Docker will eventually run into the same set of widely applicable problems. It's vital that organizations get ahead of these issues now.

Getting Around Workarounds

Once a Hadoop cluster hits a performance wall, admins need to find a resolution, but they are discovering that traditional best practices and manual tuning workarounds just don't work. Over-provisioning, siloing and tuning are not long-term solutions; they are also very expensive and create needless overhead. Purchasing additional nodes when hardware utilization is well below 100 percent is a costly, temporary fix that only addresses performance symptoms, not the fundamental limitations of Hadoop. Similarly, cluster isolation is costly, doubles complexity and simply isn't a viable solution at scale. Finally, tuning by definition is a response to problems that have already occurred, and it's impossible for a human to make the thousands of decisions necessary to tune settings in real time to adjust to constantly changing cluster conditions.

Going Real Time

The most effective solution for resource contention is to monitor hardware resources in real time. Monitoring the hardware resources of each node in the cluster second by second shows which job has control over which resources and what priority level each job has across the cluster. Acting on that information as conditions change ensures that all jobs get access to cluster hardware resources in an equitable manner and that business-critical jobs can finish on time, thereby guaranteeing QoS for Hadoop.
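
To make the idea concrete, here is a minimal Python sketch of one per-node decision of the kind described above. It is an illustration under stated assumptions, not how any particular product or Hadoop scheduler works: the `JobUsage` record, the `allocate` function, the priority scale and the sample numbers are all hypothetical. Each second, a node's observed per-job demand is compared with its capacity, and capacity is granted in priority order, so lower-priority jobs are trimmed only when the node is actually contended.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    """One per-second sample for a job on a single node (hypothetical record)."""
    job_id: str
    priority: int   # higher number = more business-critical (assumed scale)
    demand: float   # fraction of the node's capacity the job wants right now


def allocate(samples, capacity=1.0):
    """Grant node capacity to jobs in descending priority order.

    Returns a dict mapping job_id to the granted share of capacity.
    With no contention every job receives its full demand; under
    contention the lowest-priority jobs absorb the shortfall.
    """
    grants = {}
    remaining = capacity
    for sample in sorted(samples, key=lambda s: s.priority, reverse=True):
        granted = min(sample.demand, remaining)
        grants[sample.job_id] = granted
        remaining -= granted
    return grants


# Example: three jobs contend for one node during a single second.
samples = [
    JobUsage("adhoc-query", priority=1, demand=0.5),
    JobUsage("nightly-etl", priority=3, demand=0.5),
    JobUsage("report", priority=2, demand=0.25),
]
grants = allocate(samples)
print(grants)
```

Strict priority ordering is a deliberate simplification here. A production system would also need to guard against starving low-priority jobs and would track several resources (CPU, memory, disk, network) rather than a single capacity figure.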

QoS for Hadoop in Production

Companies like Trulia, Chartboost and Upsight are implementing systems that guarantee QoS for Hadoop and reaping the benefits. Trulia has successfully disrupted a decades-old industry by using and analyzing real-time data to deliver customized insights straight to consumers. With many teams writing Hadoop jobs or using Hive or Spark, Trulia has to ensure reliability in its multi-tenant, multi-workload environment. In response to delayed or unpredictable jobs that affected its customer push-notification programs, Trulia used to intentionally underutilize its clusters to ensure jobs were completed on time and prevent traffic from being negatively affected. Now, Trulia uses Pepperdata to actively monitor and control all of its Hadoop clusters.
