Calamity descended from the skies around Washington on June 29 in the form of a derecho, a type of weather system so rare that most people have never even heard of it. This unusual complex of extremely severe weather had never been known to cross a range of mountains such as the Alleghenies. But this time it happened, and disaster planning went out the window.
Amazon's huge data center near Dulles International Airport, fully redundant in itself and served by redundant backup power, redundant power grids and redundant network access, went down under the combined onslaught of massive power outages, Internet outages, phone line outages and cell system outages. Not only did everything go down, but nobody could call for backup. And, of course, even if the staff had known that this event was happening, they couldn't have traveled there anyway. Most of the roads were blocked.
While we often preach the gospel of preparedness, there are disasters for which no one could prepare. When weather this violent appears out of nowhere, with no warning and no forecasts, there is only so much that anyone or any institution can do. The fact that Amazon was able to get back online and have all of its affected customers fully restored by the next morning was remarkable.
But Amazon was one of the few that managed this. Smaller organizations with fewer resources were simply knocked out by this calamitous blow. Many of those companies were still down as of this writing on July 2, and some will never recover.
Of course, some of those smaller organizations didn't have disaster plans and were simply left hanging. Some did have plans, but they weren't tested and, when push came to shove, didn't work. And some had plans that were in place, tested and should have been enough, but just as with Amazon, the planners couldn't plan for everything.
In my own company, which houses the test lab that produces the eWEEK reviews you see from time to time, I thought I'd planned for anything short of the Mayan Apocalypse or a slightly more probable world-ending asteroid strike. I'd even tested the lab on the backup generators, tested communications over the backup Wi-Fi hotspot and made plans for the air conditioning being out.
But in the case of the lab, configuration changes had crept in since the last time I calculated the electrical loads, and I'd never tested the latest configuration. Worse, I'd assumed that the T-Mobile cell site near the lab would keep running for at least a few days after losing power, since it had always done so in the past.