In the end, we needed better performance as well as reliability. Pillar Data gave us the reliability, but I was still very unsure of the performance. As part of this SAN upgrade, my boss pushed me to move to blade servers and virtualized servers. We bought into the IBM Blade server with five blades, which cost about half as much as the Hewlett-Packard Blade Center. I was able to test both VMware and XenSource products. Being a Novell man, I was a bit disheartened to find that Xen just wasn't where VMware was, so even though the extra cost hurt, we went with VMware as our virtual server platform on the blade servers.
I installed VMware ESX Server on all the blades, tied them together with VirtualCenter, and proceeded to set up the SAN with the Pillar Axiom system. I had installed about two virtual servers per blade when our e-mail server went into its death throes, and I made the decision to move the e-mail server to a virtual machine.
I was very concerned not only about SAN performance but also about server performance, even though each blade had dual quad-core processors and 8GB of RAM, because VMware allows you to assign only one processor to a NetWare machine. I did a backup of the e-mail server, which took about eight hours, after which the disk shut down and the server took a poison pill (that was close).
It took one hour to restore the files. So at 3 a.m. our e-mail server was moved from a server with two dual-core Xeon processors, 4GB of RAM and a dedicated QLogic 4Gbit/sec. Fibre Channel card to a virtual server with a single processor, 2GB of RAM and a shared QLogic 4Gbit/sec. card, and it ran like a bat out of hell! To give you an idea, switching to a user's in-box could sometimes take up to one minute; now it takes less than one second. Processor utilization on the server maxes out at about 800MHz during very rare spikes and hovers around 300MHz when we are fully loaded with 750 active users.
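As a rough back-of-the-envelope check, those figures can be turned into ratios. The sketch below uses only the numbers quoted in this article; the variable names are illustrative, and "less than one second" is treated as a full second, so the real in-box speedup is at least what it computes. Keep in mind that the backup was read from a failing disk and the restore was written to the new SAN, so that ratio is not an apples-to-apples benchmark.

```python
# Figures quoted in the article; nothing here was measured independently.
backup_hours = 8       # backup of the old e-mail server
restore_hours = 1      # restore onto the new virtual machine
inbox_before_s = 60    # worst-case in-box switch on the old server
inbox_after_s = 1      # upper bound for "less than one second"
cpu_loaded_mhz = 300   # typical processor use at full load
cpu_peak_mhz = 800     # rare spikes
active_users = 750

# Simple ratios derived from the quoted figures.
restore_speedup = backup_hours / restore_hours
inbox_speedup = inbox_before_s / inbox_after_s   # a lower bound
mhz_per_user = cpu_loaded_mhz / active_users
peak_to_loaded = cpu_peak_mhz / cpu_loaded_mhz

print(f"Restore vs. backup time: {restore_speedup:.0f}x faster")
print(f"In-box switch: at least {inbox_speedup:.0f}x faster")
print(f"Processor use per active user: {mhz_per_user:.1f}MHz")
print(f"Peak vs. typical processor use: {peak_to_loaded:.1f}x")
```

The per-user figure is the telling one: well under 1MHz of processor time per active user suggests the old server was waiting on disk rather than running short of CPU.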
What this boils down to is a very responsive SAN. With the old servers, processor utilization was a lot higher and file queues were getting backed up. I suspect GroupWise was either having a hard time writing to disk, or it was trying to verify data after it was written, and that overworked the server. In a sense, GroupWise is a nice test for disk usage; it uses a ton of small files and it accesses them like crazy.
We still have one functioning Clariion SAN Server, the Clariion 150 series. That model seems to work better, and it gives us a good comparison between the functionality of the two systems. With the move to the new system, we gained important capabilities that the older system lacked:
Quality of Service: Even though there are only four settings on the Pillar system (archive, low, medium and high), this makes a world of difference in allocating storage between low-usage servers and high-usage servers.
Logical Unit Number Mapping: This is more powerful than you might imagine, especially when you are moving virtual machines between physical servers.
Management via multiple methods: Very nice. If this had been in the Clariion system, we might have found the problem with the SAN well before we experienced the near disaster of losing all our data.
Data Protection: On top of RAID 5 and a hot spare, there are further options for double and triple redundancy.
In the end we may have gotten lucky, or maybe we were helped by lessons learned.
Going with the absolute lowest price may be neither the best approach nor the cheapest. I would suggest that if you are looking at SAN products, you first figure out what you want and then find out what it will take to get it. It may well be that it can be managed within your budget.
About the author: Brett Littrell is network manager for the Milpitas (Calif.) Unified School District, which manages about 10,000 students and 1,000 employees. Its computer network has around 2,000 to 2,500 client computers with three technicians to maintain them all. He can be reached at firstname.lastname@example.org