NEWS ANALYSIS: There is no substitute for detailed planning, testing and redundancy when it comes to making sure your data center can weather any storm. But sometimes even the best planning isn’t good enough.
Calamity descended from the skies around Washington on June 29 in the form of a derecho, a type of weather system so rare that most people have never even heard of it. This unusual complex of extremely severe weather had never been known to cross a range of mountains such as the Alleghenies. But this time it happened, and disaster planning went out the window.
Amazon’s huge data center near Dulles International Airport, fully redundant in itself and served by redundant backup power, redundant power grids and redundant network access, went down under the combined onslaught of massive power outages, massive Internet outages, phone line outages and cell system outages. Not only did everything go down, but nobody could call for backup. And, of course, even if the staff had known that this event was happening, they couldn’t have traveled there anyway. Most of the roads were blocked.
While we often preach the gospel of preparedness, there are disasters for which no one could prepare. When weather this violent appears out of nowhere, with no warning and no forecasts, there is only so much that anyone or any institution can do. The fact that Amazon was able to get back online and have all of its affected customers fully restored by the next morning was remarkable.
But Amazon was one of the few that managed this. Smaller organizations with fewer resources were simply taken out by this calamitous blow. Many of those companies remained down as this was written on July 2, and some will never recover.
Of course, some of those smaller organizations didn’t have disaster plans and were simply left hanging. Some did have plans, but they weren’t tested, and when push came to shove, they didn’t work. And some plans were in place, tested and should have been enough, but just as with Amazon, the planners couldn’t plan for everything.
In my own company, which houses the test lab that produces those eWEEK reviews you see from time to time, I thought I’d planned for anything short of the Mayan Apocalypse or a slightly more probable world-ending asteroid strike. I’d even tested the lab using the backup generators, communicated using the backup WiFi hotspot and made plans for the air conditioning to be out.
But in the case of the lab, configuration changes had crept in since the last time I calculated the electrical loads, and I’d never tested the latest configuration. Worse, I’d assumed that the T-Mobile cell near the lab would keep running for at least a few days after losing power, since it had always done so in the past.
Wayne Rash is a Senior Analyst for eWEEK Labs and runs the magazine's Washington Bureau. Prior to joining eWEEK as a Senior Writer on wireless technology, he was a Senior Contributing Editor and previously a Senior Analyst in the InfoWorld Test Center. He was also a reviewer for Federal Computer Week and Information Security Magazine. Previously, he ran the reviews and events departments at CMP's InternetWeek.
He is a retired naval officer, a former principal at American Management Systems and a long-time columnist for Byte Magazine. He is a regular contributor to Plane & Pilot Magazine and The Washington Post.