Intel Nehalem's Hidden Virtualization Risk

With advances in addressable memory, simultaneous multithreading and I/O handling, Intel's Nehalem microarchitecture allows IT managers to put even more VMs on a single physical system. But there is risk involved with putting too many virtual eggs in one physical basket, and data center managers must look to careful capacity planning, as well as monitoring and reporting tools, to prevent problems.

Intel's Nehalem microarchitecture, now officially known as the Intel Xeon 5500 series, offers advances in addressable memory, simultaneous multithreading and I/O handling, providing IT managers with the ability to put even more virtual machines on a single physical system. But as memory capacity and compute power increase and I/O overhead diminishes, what's the new VM limit on a single system? The answer may be the amount of risk your organization can tolerate in terms of putting a large number of virtual eggs in one physical basket.

To reduce risk, IT managers should place a premium on moving VMs to healthy physical systems and on data center management tools that monitor both the virtual and physical resources in the data center.

Several years ago, Intel introduced processor technology called FlexMigration that eases virtual machine movement across Intel processors of different generations. Combined with VMware's Enhanced VMotion Compatibility, which overcomes limitations of the older, strict VMotion that required exactly matched physical processors, FlexMigration gives IT managers more options when it comes to migrating VMs. The trade-off is that VMs are presented with the lowest-common-denominator feature set of the processors in the VMotion pool of physical systems.
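The lowest-common-denominator effect is easy to picture as a set intersection: the pool can only expose CPU features that every member host supports. Here is a minimal illustrative sketch; the host names and feature flags are hypothetical, not pulled from any real inventory.

```python
# Hypothetical inventory: CPU feature flags per physical host.
# One older host in the pool lacks SSE4.x support.
host_features = {
    "nehalem-01": {"sse2", "sse3", "ssse3", "sse4_1", "sse4_2", "vmx"},
    "nehalem-02": {"sse2", "sse3", "ssse3", "sse4_1", "sse4_2", "vmx"},
    "older-core2": {"sse2", "sse3", "ssse3", "vmx"},
}

def pool_baseline(features_by_host):
    """Return the feature set every host in the migration pool supports."""
    feature_sets = iter(features_by_host.values())
    baseline = set(next(feature_sets))
    for features in feature_sets:
        baseline &= features  # intersect: drop anything a host lacks
    return baseline

print(sorted(pool_baseline(host_features)))
```

Adding the older Core 2 host drags the baseline down for everyone: no VM in the pool sees SSE4.x, even on the Nehalem machines that support it.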

IT managers who build on virtualization platforms that provide this kind of migration flexibility can reduce risk by ensuring that their VMs have someplace to run in case of unplanned downtime. Managers who want to demonstrate even greater foresight will take this a step further by ensuring that all the VM bolt holes have sufficient memory, bandwidth, compute and storage resources to support the business applications running on the VMs.
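That "sufficient resources" check is the kind of thing worth automating. The sketch below shows one way to filter failover targets by headroom; all names, fields and numbers are hypothetical placeholders, not a real inventory format.

```python
# Illustrative sketch: does a failover target have enough spare memory
# and CPU to absorb a displaced VM? All data here is hypothetical.
def can_host(vm, host):
    """True if the host has headroom for the VM's memory and CPU demand."""
    return (host["mem_free_gb"] >= vm["mem_gb"]
            and host["cpu_free_ghz"] >= vm["cpu_ghz"])

vm = {"name": "erp-db", "mem_gb": 16, "cpu_ghz": 4.0}
hosts = [
    {"name": "esx-01", "mem_free_gb": 8,  "cpu_free_ghz": 10.0},
    {"name": "esx-02", "mem_free_gb": 32, "cpu_free_ghz": 6.0},
]

targets = [h["name"] for h in hosts if can_host(vm, h)]
print(targets)  # only the host with room on BOTH dimensions qualifies
```

A real check would also cover storage and network bandwidth, as the paragraph above notes, but the shape is the same: a VM needs a target with headroom on every resource, not just one.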

Reducing the risk of VM downtime also requires careful capacity planning to keep costs in line. This is where data center management tools come into play.

IT managers must implement tools that take both physical and virtual machines into account when tracking CPU, memory, storage and network bandwidth. Accurate reporting of these resources is even more important for regulated enterprises or for organizations that implement tight security policies, where machine separation and stringent access management are requirements.
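Taking both layers into account usually means rolling VM-level samples up to the physical host they run on, so one report covers physical and virtual together. A minimal sketch of that roll-up, with hypothetical field names and figures:

```python
# Hypothetical VM-level utilization samples, each tagged with its host.
vms = [
    {"vm": "web-01", "host": "esx-01", "cpu_pct": 30, "mem_gb": 4},
    {"vm": "web-02", "host": "esx-01", "cpu_pct": 25, "mem_gb": 4},
    {"vm": "db-01",  "host": "esx-02", "cpu_pct": 60, "mem_gb": 16},
]

def usage_by_host(vm_samples):
    """Aggregate per-VM usage into a per-physical-host report."""
    report = {}
    for s in vm_samples:
        h = report.setdefault(s["host"], {"cpu_pct": 0, "mem_gb": 0, "vms": 0})
        h["cpu_pct"] += s["cpu_pct"]
        h["mem_gb"] += s["mem_gb"]
        h["vms"] += 1
    return report

print(usage_by_host(vms))
```

Seen this way, it is obvious when a host is quietly accumulating load from many small VMs, which is exactly the condition that capacity planning needs to catch.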

To reduce VM downtime risk while controlling costs, I recommend that IT managers immediately implement data center management tools that provide some kind of trend reporting. Even if these trends provide only historical data with no forecasting, that information is valuable for understanding what a normal workload looks like in your data center. It also provides a sound basis for making equipment purchase and hypervisor and operating system licensing recommendations to business managers.
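Even a bare-bones trend report can come from historical samples alone. As an illustrative sketch (the weekly figures below are invented), fitting a simple least-squares slope to average CPU readings shows whether load is growing and roughly how fast:

```python
# Illustrative sketch: fit a linear trend to weekly average CPU samples
# to see whether load is growing. The sample numbers are hypothetical.
def linear_trend(samples):
    """Least-squares slope, in units of 'y per sample interval'."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

weekly_cpu = [42, 44, 47, 49, 53, 55]   # average % CPU per week
slope = linear_trend(weekly_cpu)
print(f"CPU load growing ~{slope:.1f} percentage points per week")
```

Extrapolating a slope like that against known host capacity is the simplest way to turn "normal workload" data into a purchase or licensing recommendation with a date attached.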