Data Center Disaster Preparedness Sometimes Requires a Dose of Robert Burns

NEWS ANALYSIS: There is no substitute for detailed planning, testing and redundancy when it comes to making sure your data center can weather any storm. But sometimes even the best planning isn’t good enough.

Calamity descended from the skies around Washington on June 29 in the form of a derecho, a type of weather system so rare most people have never even heard of it. This unusual complex of extremely severe weather had never been known to cross a range of mountains such as the Alleghenies. But this time it happened, and disaster planning went out the window.

Amazon’s huge data center near Dulles International Airport, fully redundant in itself and served by redundant backup power, redundant power grids and redundant network access, went down under the combined onslaught of massive power outages, massive Internet outages, phone line outages and cell system outages. Not only did everything go down, but nobody could call for backup. And, of course, even if the staff had known that this event was happening, they couldn’t have traveled there anyway. Most of the roads were blocked.

While we often preach the gospel of preparedness, there are disasters for which no one could prepare. When weather this violent appears out of nowhere, with no warning and no forecasts, there is only so much that anyone or any institution can do. The fact that Amazon was able to get back online and have all of its affected customers fully restored by the next morning was remarkable.

But Amazon was one of the few that managed this. Smaller organizations with fewer resources were simply taken out by this calamitous blow. Many of those companies remained down as this was written on July 2, and some will never recover.

Of course, some of those smaller organizations didn’t have disaster plans and were simply left hanging. Some did have plans, but the plans weren’t tested and, when push came to shove, didn’t work. And some had plans that were in place, tested and should have been enough, but just as with Amazon, the planners couldn’t plan for everything.

In my own company, which houses the test lab that produces those eWEEK reviews you see from time to time, I thought I’d planned for anything short of the Mayan Apocalypse or a slightly more probable world-ending asteroid strike. I’d even tested the lab using the backup generators, communicated using the backup WiFi hotspot and made plans for the air conditioning to be out.

But in the case of the lab, configuration changes had crept in since the last time I calculated the electrical loads and I’d never tested the latest configuration. Worse, I’d assumed that the T-Mobile cell near the lab would keep running for at least a few days after losing power, since it had always done so in the past.

Wayne Rash

Wayne Rash is a freelance writer and editor with a 35 year history covering technology. He’s a frequent speaker on business, technology issues and enterprise computing. He covers Washington and...