2. Jobs Crash or Fail to Be Completed on Time
Hadoop deployments often start as “sandbox” environments. Over time, the workload expands, and an increasing number of jobs on the cluster support production applications that are governed by service-level agreements. Ad hoc queries and other low-priority jobs compete with business-critical applications for system resources, causing high-priority jobs to fail to complete when needed.
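On YARN-based clusters, one rough way to catch SLA jobs being starved is to poll the ResourceManager's REST API for applications stuck waiting on containers. A minimal sketch in Python; the ResourceManager address and the "production" queue name are assumptions, not defaults:

```python
import json
import urllib.request

# Assumed ResourceManager address and SLA queue name -- adjust for your cluster.
RM = "http://resourcemanager.example.com:8088"
SLA_QUEUE = "production"

def starved_apps():
    """Return apps in the SLA queue still waiting for containers (state ACCEPTED)."""
    url = f"{RM}/ws/v1/cluster/apps?queue={SLA_QUEUE}&states=ACCEPTED"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # YARN returns {"apps": null} when nothing matches.
    apps = (data.get("apps") or {}).get("app", [])
    return [(a["id"], a["name"], a["elapsedTime"]) for a in apps]

for app_id, name, waited_ms in starved_apps():
    print(f"WARNING: {name} ({app_id}) has waited {waited_ms / 1000:.0f}s for resources")
```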
3. No Ability to Monitor Cluster Performance in Real Time
Hadoop diagnostic tools are static, providing log-file information about what happened to jobs on the cluster after they have run. Hadoop does not provide the ability to monitor, at a sufficiently granular level, what's happening while multiple jobs are running. As a result, it is difficult, and often impossible, to take corrective action before operational problems occur.
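To illustrate the gap: about the closest stock Hadoop comes is letting operators poll the ResourceManager's cluster-metrics endpoint themselves, which yields aggregate numbers only. A sketch, assuming a YARN cluster (the host name is a placeholder):

```python
import json
import time
import urllib.request

RM = "http://resourcemanager.example.com:8088"  # assumed host

def cluster_metrics():
    with urllib.request.urlopen(f"{RM}/ws/v1/cluster/metrics") as resp:
        return json.load(resp)["clusterMetrics"]

# Sample every 10 seconds. Note this shows aggregate load only -- there is
# no per-job or per-task breakdown at this endpoint, which is the point above.
for _ in range(30):
    m = cluster_metrics()
    used_pct = 100.0 * m["allocatedMB"] / m["totalMB"]
    print(f"apps running: {m['appsRunning']:>4}  pending: {m['appsPending']:>4}  "
          f"memory used: {used_pct:5.1f}%")
    time.sleep(10)
```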
4. Lack of Macro-Level Visibility, Control Over the Cluster
Various Hadoop diagnostic tools provide the ability to analyze individual job statistics and examine activity on individual nodes on the cluster. In addition, developers can tune their code to ensure optimal performance for individual jobs. What’s lacking, however, is the ability to monitor, analyze and control what’s happening with all users, jobs and tasks running on the entire cluster, including use of each hardware resource.
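A cluster-wide roll-up has to be assembled by hand. One sketch of doing so: pull every running application from the ResourceManager REST API and total allocated memory and vCPUs per user (the host name is assumed):

```python
import json
from collections import defaultdict
import urllib.request

RM = "http://resourcemanager.example.com:8088"  # assumed host

# Pull every RUNNING application and roll usage up by user -- a crude
# cluster-wide view that stock Hadoop tooling does not present directly.
with urllib.request.urlopen(f"{RM}/ws/v1/cluster/apps?states=RUNNING") as resp:
    apps = (json.load(resp).get("apps") or {}).get("app", [])

usage = defaultdict(lambda: {"mb": 0, "vcores": 0, "jobs": 0})
for a in apps:
    u = usage[a["user"]]
    u["mb"] += a.get("allocatedMB", 0)
    u["vcores"] += a.get("allocatedVCores", 0)
    u["jobs"] += 1

for user, u in sorted(usage.items(), key=lambda kv: -kv[1]["mb"]):
    print(f"{user:<15} {u['jobs']:>3} jobs  {u['mb']:>8} MB  {u['vcores']:>4} vcores")
```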
5. Insufficient Ability to Set and Enforce Job Priorities
6. Underutilized, Wasted Capacity
7. Insufficient Ability to Control Allocation of Resources Across a Cluster in Real Time
8. Lack of Granular View Into How Cluster Resources Are Used
When jobs crash or fail to complete on time, Hadoop operators and administrators have difficulty diagnosing performance problems. Hadoop does not provide a way of monitoring and analyzing cluster performance with sufficient context and detail. For example, it is impossible to isolate problems by user, job or task, or to pinpoint bottlenecks related to network, memory or disk.
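Task-level detail, when it exists at all, lives in the MapReduce JobHistory server and must be stitched together after the job has finished. A sketch against its REST API; the history-server host and job ID below are placeholders:

```python
import json
import urllib.request

JHS = "http://historyserver.example.com:19888"   # assumed JobHistory server
JOB_ID = "job_1400000000000_0001"                # placeholder job ID

# List the job's tasks and surface the slowest ones -- the kind of per-task
# digging that has to be done by hand, and only after the job has finished.
url = f"{JHS}/ws/v1/history/mapreduce/jobs/{JOB_ID}/tasks"
with urllib.request.urlopen(url) as resp:
    tasks = json.load(resp)["tasks"]["task"]

for t in sorted(tasks, key=lambda t: -t["elapsedTime"])[:10]:
    print(f"{t['id']}  type={t['type']}  state={t['state']}  "
          f"elapsed={t['elapsedTime'] / 1000:.1f}s")
```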
9. Inability to Predict When a Cluster Will Max Out
More jobs, different kinds of jobs, expanding data volumes, different data types, more complex queries and many other variables continuously increase the load on cluster resources over time. Often, the need for additional cluster resources isn't apparent until a disaster occurs (for example, a customer-facing website goes down or a mission-critical report doesn't run). Consequences can include unsatisfactory customer experiences, missed business opportunities, unplanned capital expense requests and more.
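Even a crude trend line beats learning about the ceiling from an outage. A back-of-the-envelope sketch that fits a line to daily utilization samples and extrapolates to 100 percent; the sample figures are illustrative, not measurements:

```python
# Extrapolate cluster memory utilization to estimate days until it maxes out.
# The daily samples below are illustrative, not real measurements.
daily_util_pct = [52.0, 53.1, 55.4, 56.0, 58.2, 59.9, 61.3, 62.8]

n = len(daily_util_pct)
days = range(n)
mean_x = sum(days) / n
mean_y = sum(daily_util_pct) / n

# Ordinary least-squares slope: percentage points of growth per day.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(days, daily_util_pct))
    / sum((x - mean_x) ** 2 for x in days)
)

if slope > 0:
    days_left = (100.0 - daily_util_pct[-1]) / slope
    print(f"Growing {slope:.2f} pts/day; ~{days_left:.0f} days until the cluster maxes out")
else:
    print("Utilization is flat or falling; no max-out projected")
```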
10. HBase and MapReduce Contention
Contention for system resources by HBase and MapReduce jobs can affect overall performance significantly. The inability to optimize resource utilization while these different types of workloads run concurrently leads many organizations to suffer the expense of deploying separate, dedicated clusters.
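On a co-located node, the contention often reduces to simple arithmetic: the RegionServer heap plus the memory YARN is allowed to hand out can exceed physical RAM. A sketch of that sanity check (every figure below is a hypothetical example):

```python
# Hypothetical per-node figures for a co-located HBase + MapReduce node.
node_ram_mb            = 65536   # physical memory
os_and_daemons_mb      = 8192    # OS, DataNode, NodeManager, etc.
hbase_regionserver_mb  = 16384   # RegionServer heap (-Xmx)
yarn_nm_allocatable_mb = 49152   # yarn.nodemanager.resource.memory-mb

committed = os_and_daemons_mb + hbase_regionserver_mb + yarn_nm_allocatable_mb
if committed > node_ram_mb:
    print(f"Overcommitted by {committed - node_ram_mb} MB: under concurrent "
          f"HBase and MapReduce load this node will swap or kill processes")
else:
    print(f"{node_ram_mb - committed} MB of headroom")
```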
11. Lack of Key Visual Dashboards
Key visual dashboards enable interactive exploration and fast diagnosis of performance-related issues on the cluster. The static reports and detailed log files provided with Hadoop schedulers and resource managers are not conducive to fast or easy problem diagnosis. Culling through voluminous data while troubleshooting can waste hours or even days. Hadoop operators need the ability to quickly visualize, analyze and understand the causes of performance problems and identify opportunities to optimize resource use.
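The building blocks for such a view are not exotic; what's missing is anything assembled out of the box. For instance, samples gathered by a polling loop like the one sketched earlier can be charted in a few lines of Python with matplotlib (the data here is illustrative):

```python
import matplotlib.pyplot as plt

# Minutes of cluster memory utilization, e.g. captured by a polling loop
# like the one sketched earlier. These values are illustrative only.
minutes = list(range(12))
memory_used_pct = [41, 43, 48, 71, 94, 96, 95, 93, 70, 55, 50, 47]

plt.plot(minutes, memory_used_pct, marker="o")
plt.axhline(90, linestyle="--", label="pressure threshold")
plt.xlabel("minutes")
plt.ylabel("cluster memory used (%)")
plt.title("Cluster memory utilization over time")
plt.legend()
plt.show()
```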