With a handful of customers still offline on the afternoon of Aug. 3 after service outages caused by a data center migration snafu, executives at Hostway are not yet ready for a Monday morning quarterbacking session.
But industry pundits had plenty of opinions on how to avoid the type of lengthy outages some of Hostway's newly acquired ValueWeb customers experienced.
“Moving a data center is a major project, but it's one that's been done before, and there are lots of resources for that. You get the whole thing up and going offline, test it, make sure it's working, and then you swing it online. You only take the old stuff offline after you know it's working,” said Roger Kay, founder and president of Endpoint Technology Associates. “It literally sounds like they took their servers, put them on a truck, unpacked them and tried to plug them in again,” he said.
To be fair, Hostway has performed several successful migrations of servers from one location to another, according to John Enright, vice president of marketing and business development for Hostway, in Fort Lauderdale, Fla.
Enright said the data center migration was intended to improve service quality for the roughly 3,000 customers who were affected by the move after Hostway merged with Affinity Internet to become one of the largest Web hosting companies.
The migration from Affinity's ValueWeb Miami data center to Hostway's data center in Tampa, Fla., was meant to “improve quality of service by moving into a facility with more capacity and better connectivity,” Enright said.
Although Hostway is working “feverishly” to bring the last few customers back online, the root of the problem that caused some 400 customers to be offline for days after the projected 12- to 15-hour outage has not yet been determined. However, the company saw an “unusually large number of hardware failures that occurred during the transportation,” Enright said.
“Any time there is a relocation of equipment, there is an increased risk of hardware failure. We had additional parts in our data center to account for that, but the number of hardware failures exceeded our most pessimistic forecast,” he said.
Still, not all customers who experienced lengthy outages were plagued by hardware problems.
“They ignored all the support tickets till they fixed 100 servers with hardware [problems],” claimed former customer Steve Thompson, founder of Personalized Websites, in Columbia, Mo. “They had all these customers that just needed simple things and they didn't use their resources. Instead they were all just [working] on one problem. Meanwhile the only communication we had was the stupid message on voice mail,” said Thompson, expressing frustration.
In hindsight, Enright agreed that communication during the outage could have been better. “Communication was definitely our biggest challenge in the [initial] period after the migration. We could have done better in carving out time to communicate with customers. Our folks in the Tampa data center were focused 100 percent on getting customers up,” he said.
Planning for Server Migration
The migration, which began on the evening of July 27 and was supposed to be completed within 12 to 15 hours, involved moving 3,700 servers for 3,000 customers across the state of Florida.
But it appears that Hostway did not adequately plan for such a move, said Charles King, principal analyst at consulting firm Pund-IT.
“When you are talking about moving 3,700 servers, that is not a small data center. The sheer logistics of powering down, disconnecting, packing, loading, moving, repositioning and reconnecting that many servers is a formidable task. The notion that they could do that in half a day I'm not sure is physically possible,” he said.
Endpoint's Kay said he believes that several key steps were left out of the process for the migration from Miami to Tampa.
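King's doubt can be checked with back-of-the-envelope arithmetic. The short Python sketch below uses only the figures reported here (3,700 servers and a 12- to 15-hour window) to work out the aggregate pace the plan implied. The function name is hypothetical, and the calculation assumes the whole window is spent handling servers, which makes it an optimistic bound because transit time is not subtracted.

```python
# Figures from the article: 3,700 servers, planned window of 12 to 15 hours.
SERVERS = 3_700
WINDOW_HOURS = (12, 15)


def required_rate(servers: int, hours: float) -> tuple[float, float]:
    """Return (servers per hour, seconds available per server) for a window.

    This is aggregate throughput across all crews, not time per technician.
    """
    per_hour = servers / hours
    seconds_each = hours * 3600 / servers
    return per_hour, seconds_each


for hours in WINDOW_HOURS:
    per_hour, seconds_each = required_rate(SERVERS, hours)
    print(f"{hours} h window: {per_hour:.0f} servers/hour, "
          f"{seconds_each:.1f} s per server")
```

Even at the long end of the window, the plan leaves under 15 seconds of total elapsed time per server for powering down, packing, moving and reconnecting, so it would have worked only with a great deal of parallel labor.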
“In a migration there are a couple of phases: One has to do with programs, another with data. You start with the programs, and then carefully migrate data to a separate area, then bring it to a new site, then test it and make sure it works, then take the old site offline and bring the new site online with small gap, then take the old site down,” Kay said.
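The ordering Kay describes can be sketched as a simple checklist that refuses to skip ahead. The Python below is illustrative only: the class, step names and error type are hypothetical, and it simplifies his description to one rule, that the old site is retired only after the new site has been tested and brought online.

```python
class CutoverError(RuntimeError):
    """Raised when a migration step is attempted out of order."""


class Migration:
    # Hypothetical step names paraphrasing the phases in Kay's description.
    ORDER = [
        "stage_programs",
        "copy_data",
        "test_new_site",
        "bring_new_site_online",
        "retire_old_site",
    ]

    def __init__(self) -> None:
        self.done: list[str] = []

    def run(self, step: str) -> None:
        """Record a step, but only if it is the next one in ORDER."""
        remaining = self.ORDER[len(self.done):]
        expected = remaining[0] if remaining else None
        if step != expected:
            raise CutoverError(f"cannot run {step!r}; next step is {expected!r}")
        self.done.append(step)

    @property
    def old_site_online(self) -> bool:
        return "retire_old_site" not in self.done

    @property
    def new_site_online(self) -> bool:
        return "bring_new_site_online" in self.done
```

Calling `Migration().run("retire_old_site")` on a fresh plan raises `CutoverError`, which is the guard the analysts say was missing: nothing goes dark until its replacement is known to work.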
However, that would involve the use of a backup site, such as a disaster recovery site, which is costly—especially if the backup site operates as a “shadow site that has everything online,” King said.
“All your data and material needed to operate those sites are in two different places. That is the ideal situation. But if a company didn't have the wherewithal to maintain a backup site for disaster recovery, if the location of the primary data center was hit by a natural disaster, people would be asking the same questions at this point,” he said.
The alternative would be to create a temporary backup site with a large hosting company such as IBM or Hewlett-Packard for the duration of the move, he added. “Frankly, having a full-blown backup and disaster recovery site is not an inconsequential effort, but if your business guarantees access to data 24/7, you need to give it consideration for both voluntary and involuntary outages,” he said.
The hefty cost of having such backup sites available is still most likely less than the cost that the migration snafu will have to Hostway's business.
“I had six days of downtime. I lost three clients. It hurt me,” Thompson said. “This just blew their reputation out of the water. I'm moving all my files over to ServerBeach.com. … A lot of people went to them from ValueWeb. I heard they were filling orders all day from ValueWeb [customers],” he added.
“I would say this could be the death knell of the business itself. I wouldn't be surprised if some of these guys … sue them,” Kay said.
“This is not something a company survives very easily,” echoed King. “I expect Hostway will put hefty money into trying to repair the relationships it has with [its] customers. In cases as severe as this, I expect the circumstances are damaging to Hostway customers,” he said.