Organizations today are rapidly virtualizing their infrastructures. In doing so, they are experiencing a whole new set of systems management challenges. These challenges cannot be solved with traditional toolsets in an acceptable timeframe to match the velocity at which organizations are virtualizing. In a virtual server infrastructure where all resources are shared, optimal performance can only be achieved with proactive capacity management and proper allocation of shared resources.
The biggest challenge is finding either the considerable time or the automated technology needed to do this. Allocating too few resources can cause bottlenecks in CPU, memory, storage and disk I/O, which can lead to performance problems and costly downtime. Over-allocating resources, however, drives up the cost per virtual machine, making ROI harder to achieve and stalling future projects.
To address this, organizations should consider a life cycle approach to performance assurance that proactively prevents performance issues, starting in preproduction and continuing with ongoing monitoring of the production environment. By modeling, validating, monitoring, analyzing and charging, the Performance Assurance Lifecycle (PAL) addresses resource allocation and management. It significantly reduces performance problems, helps ensure optimal performance of the virtual infrastructure and helps organizations continually meet service-level agreements (SLAs).
The following are the five components of the PAL. These components allow organizations to maximize the performance and utilization of their virtual infrastructures, while streamlining costs and delivering a faster ROI.
Component No. 1: Modeling
Modeling covers everything from preproduction planning to post-production additions and changes to the virtual infrastructure. With the ability to quickly model thousands of “what if” scenarios, from adding more virtual machines to changing configuration settings, IT staff can immediately see whether resource constraints will be exceeded and whether performance issues will occur. In this way, modeling provides proactive prevention.
Four common modeling scenarios are:
1. How will resource capacity and utilization change if a host or virtual machine is added or removed?
2. What will happen when a host is suspended for maintenance or a virtual machine is powered down?
3. Do sufficient resources exist for a planned VMotion migration?
4. How will performance be affected if resource changes are made to hosts, clusters and/or resource pools?
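The scenarios above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the cluster is treated as a single pool of CPU and memory, and the names VM, Host, cluster_headroom and what_if are hypothetical, not part of any vendor tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    cpu_ghz: float  # CPU demand of the virtual machine
    mem_gb: float   # memory demand of the virtual machine


@dataclass(frozen=True)
class Host:
    name: str
    cpu_ghz: float  # CPU capacity of the host
    mem_gb: float   # memory capacity of the host


def cluster_headroom(hosts, vms):
    """Return (cpu_ghz, mem_gb) left over once all VM demand is subtracted
    from the pooled host capacity. Negative values mean a shortfall."""
    cpu = sum(h.cpu_ghz for h in hosts) - sum(v.cpu_ghz for v in vms)
    mem = sum(h.mem_gb for h in hosts) - sum(v.mem_gb for v in vms)
    return cpu, mem


def what_if(hosts, vms, add_vms=(), remove_hosts=()):
    """Evaluate a scenario on copies of the inventory, leaving the inputs untouched."""
    removed = set(remove_hosts)
    scenario_hosts = [h for h in hosts if h.name not in removed]
    scenario_vms = list(vms) + list(add_vms)
    cpu, mem = cluster_headroom(scenario_hosts, scenario_vms)
    return {
        "cpu_headroom_ghz": cpu,
        "mem_headroom_gb": mem,
        "fits": cpu >= 0 and mem >= 0,
    }


# Illustrative inventory: two hosts and four identical virtual machines.
hosts = [Host("esx1", 20, 64), Host("esx2", 20, 64)]
vms = [VM(f"vm{i}", 4, 16) for i in range(4)]

# Scenario 1: add one more virtual machine.
add_one = what_if(hosts, vms, add_vms=[VM("vm-new", 4, 16)])

# Scenario 2: suspend a host for maintenance while adding that virtual machine.
maintenance = what_if(hosts, vms, add_vms=[VM("vm-new", 4, 16)], remove_hosts=["esx1"])
```

With this inventory the first scenario fits, while the second runs short of memory. A real modeling tool would also account for per-host placement, overcommit ratios and I/O, which this sketch ignores.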
Component No. 2: Validating
While modeling “what if” scenarios is an important first step to continually ensuring optimal performance, it is equally important to validate that changes will not have a negative impact on infrastructure performance.
Validation spans the modeling and monitoring stages of the PAL, because it is as critical to validate performance-impacting changes in preproduction as it is to continually monitor and validate performance over time. If you cannot validate whether a change will affect infrastructure performance, negatively or positively, making that change carries significant risk.
Component No. 3: Monitoring
The ongoing monitoring of shared resource utilization and capacity is essential to knowing how the virtual environment will perform. By monitoring resource utilization, IT staff will know whether resources are being over- or underutilized. Allocating too few resources relative to the usage patterns and trends derived from 24/7 monitoring will cause performance bottlenecks, leading to costly downtime and SLA violations. Over-allocating resources drives up the cost per virtual machine, making ROI much harder to achieve.
By continually monitoring shared resource utilization and capacity in virtual server environments, IT can significantly reduce the time and cost of four tasks: identifying the capacity bottlenecks that are currently causing performance problems, tracking the top resource consumers in the environment, alerting staff when capacity utilization trends exceed thresholds, and optimizing performance to meet established SLAs.
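As a rough illustration of two of these tasks, the Python sketch below classifies a resource as over- or underutilized and ranks the top consumers. The 30 percent and 80 percent thresholds and the function names are assumptions chosen for the example; appropriate thresholds depend on the environment and its SLAs.

```python
from statistics import mean


def classify_utilization(samples, low=0.30, high=0.80):
    """Label a resource by its average utilization (fractions between 0 and 1).

    The low and high thresholds are illustrative defaults, not recommendations.
    """
    average = mean(samples)
    if average > high:
        return "overutilized"
    if average < low:
        return "underutilized"
    return "ok"


def top_consumers(usage_by_vm, n=3):
    """Return the names of the n virtual machines with the highest average usage."""
    averages = {vm: mean(samples) for vm, samples in usage_by_vm.items()}
    return sorted(averages, key=averages.get, reverse=True)[:n]


# Hypothetical CPU utilization samples collected by a monitoring agent.
cpu_usage = {
    "web01": [0.90, 0.95, 0.85],
    "db01": [0.50, 0.60, 0.55],
    "test01": [0.10, 0.20, 0.15],
}

labels = {vm: classify_utilization(samples) for vm, samples in cpu_usage.items()}
busiest = top_consumers(cpu_usage, n=2)
```

Averages hide short spikes, so a production monitor would typically also track peaks or percentiles.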
Component No. 4: Analyzing
Proactive approaches based on trend and predictive analysis of the monitored data can significantly reduce fear of performance problems by providing ample warning (for example, alerting system administrators to potential problems as new conditions materialize). By knowing ahead of time what resource constraints may occur, IT can take appropriate measures to prevent the problems from happening, which provides the confidence necessary to virtualize critical applications.
Two layers of analysis, trend analysis and predictive analysis, deliver the information IT staff need to be confident that their infrastructure will perform.
While real-time monitoring tools can show spikes in resource consumption, those spikes may not have a drastic impact on performance or may only be one-time events. Trend analysis based on 24/7 monitoring of resource utilization provides visibility into how the virtual server environment is performing over time. Is resource utilization trending higher, lower or staying the same? Is it necessary to add more capacity, or is there room to safely add more virtual machines?
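One simple way to answer the "higher, lower or the same" question is to fit a straight line to evenly spaced utilization samples and look at its slope. The sketch below uses an ordinary least-squares slope. The function names and the tolerance value are assumptions for illustration, and commercial tools may use very different methods.

```python
def trend_slope(values):
    """Least-squares slope of evenly spaced samples, in units per sample interval."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to compute a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def describe_trend(values, tolerance=0.1):
    """Classify a series as trending 'higher', 'lower' or 'flat'.

    tolerance is an assumed dead band: slopes within +/- tolerance count as flat.
    """
    slope = trend_slope(values)
    if slope > tolerance:
        return "higher"
    if slope < -tolerance:
        return "lower"
    return "flat"


# Daily average CPU utilization in percent (made-up numbers).
steady_growth = [50, 52, 54, 56]
one_time_spike = [50, 50, 90, 50, 50]

growth_label = describe_trend(steady_growth)
spike_label = describe_trend(one_time_spike)
```

Here the steadily growing series is labeled "higher", while the series with a single mid-window spike is labeled "flat", which mirrors the point that a one-time spike is not a trend. A spike near either end of the window would tilt the fitted line, so longer windows give more stable answers.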
Running trend data through sophisticated mathematical engines makes it possible to predict future problems, which allows IT to take preventive action now. If you knew a certain number of days in advance that a problem might occur, you could prevent 90 percent of these performance problems from ever happening. For example, threshold alerts could be set to show that a cluster will begin to run out of storage in 30 days. Knowing about that issue today, rather than when it happens, lets IT increase storage allocations now and prevent the future problem.
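The storage example can be illustrated with a deliberately simple forecast. The Python sketch below stands in for the more sophisticated engines mentioned above by using plain linear extrapolation of average daily growth. The function names, the sample figures and the 30-day default horizon are assumptions for the example.

```python
def days_until_full(used_gb_by_day, capacity_gb):
    """Estimate days until storage is exhausted, assuming growth stays linear.

    used_gb_by_day holds one reading per day, oldest first. Returns None when
    usage is flat or shrinking, because no exhaustion date can be projected.
    """
    if len(used_gb_by_day) < 2:
        raise ValueError("need at least two daily readings")
    intervals = len(used_gb_by_day) - 1
    daily_growth = (used_gb_by_day[-1] - used_gb_by_day[0]) / intervals
    if daily_growth <= 0:
        return None
    remaining_gb = capacity_gb - used_gb_by_day[-1]
    return remaining_gb / daily_growth


def should_alert(used_gb_by_day, capacity_gb, horizon_days=30):
    """True when projected exhaustion falls within the alert horizon."""
    days = days_until_full(used_gb_by_day, capacity_gb)
    return days is not None and days <= horizon_days


# Made-up readings: a cluster datastore growing by 10 GB per day.
readings = [700, 710, 720, 730]

days_left = days_until_full(readings, capacity_gb=1000)
alert = should_alert(readings, capacity_gb=1000)
```

With these readings, 270 GB remain at 10 GB per day, so the projection is 27 days and the 30-day alert fires. Real workloads rarely grow linearly, so treat an estimate like this as a prompt to investigate rather than a precise date.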
Component No. 5: Charging
Having the data on what resource consumption actually costs departments enables IT to gain cost visibility and properly allocate the resources required to service each business unit. This is important in further planning the virtual infrastructure, as IT can make more informed decisions on additional purchases and upgrades to help optimize the infrastructure. IT also now has the fiscal information to present to corporate finance to justify these decisions and move virtualization projects forward.
The ultimate goal is to lower and/or control costs with intelligent planning to drive a quicker time to ROI. Success of a virtualization project will be based on that time to ROI, and a balance between cost and performance. With a chargeback process in place, departments will learn that there is a cost associated with adding virtual machines.
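A chargeback calculation can be as simple as multiplying each department's allocations by agreed unit rates. The Python sketch below shows the arithmetic. The rate values, department names and the Rates and chargeback names are hypothetical, and a real chargeback model might bill on measured consumption rather than allocation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Assumed monthly unit prices, for illustration only."""
    per_vcpu: float
    per_gb_ram: float
    per_gb_storage: float


def vm_monthly_cost(vcpus, ram_gb, storage_gb, rates):
    """Monthly cost of one virtual machine based on its allocated resources."""
    return (
        vcpus * rates.per_vcpu
        + ram_gb * rates.per_gb_ram
        + storage_gb * rates.per_gb_storage
    )


def chargeback(allocations, rates):
    """Sum monthly costs per department.

    allocations is an iterable of (department, vcpus, ram_gb, storage_gb) tuples,
    one per virtual machine.
    """
    totals = defaultdict(float)
    for department, vcpus, ram_gb, storage_gb in allocations:
        totals[department] += vm_monthly_cost(vcpus, ram_gb, storage_gb, rates)
    return dict(totals)


rates = Rates(per_vcpu=20.0, per_gb_ram=5.0, per_gb_storage=0.25)
allocations = [
    ("finance", 2, 8, 100),
    ("engineering", 4, 16, 200),
    ("engineering", 2, 8, 100),
]

bill = chargeback(allocations, rates)
```

With these assumed rates, finance is billed 105 per month and engineering 315, which makes the cost of each additional virtual machine visible to the department requesting it.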
This should also help address the virtual machine sprawl “epidemic” that is spreading like wildfire, causing resource constraints and impacting performance. While virtual machines aren’t free, departments across the organization will still see significant cost and time savings from virtualized environments compared with deploying more physical servers.
The PAL is designed to eliminate performance and financial risks and to provide the level of confidence necessary to take the data center from 15 to 20 percent virtualized to 70 to 100 percent virtualized. Without data proving that performance levels are consistently met, organizations will not feel comfortable virtualizing servers that are critical to their business success, or making additional costly investments in virtual infrastructure (servers, storage, etc.).
By following the five steps of the PAL, IT staffs can gain 20 to 30 percent performance and utilization efficiencies, and will have the confidence to make every virtualization project successful based on performance and cost criteria.
Alex Bakman is founder and CEO of VKernel. Alex is a recognized expert in computer security, virtualization and systems management. He holds many United States and international patents, and is a frequent speaker at industry events. A serial entrepreneur and visionary, Alex founded Ecora Software (acquired by Versata Enterprises) and CleverSoft, a software company acquired by Candle Corporation.