When I left you all in charge of our infrastructure for three weeks, while I took one group of Boy Scouts on a 36-mile mountain backpack trip and took another group to summer camp, I didn't think I'd come back from that vacation to find things in such dreadful disarray.
Between the Blaster worm on the digital side and the power-grid collapse in the eastern United States and Canada on the analog side, I'd really love to have an excuse to head back to the woods for another week or two. Instead, I find myself thinking about three little words that I use a lot as a disaster preparedness volunteer with various ham-radio organizations: common mode failure.
Common mode failure results from a shared dependence, possibly buried several layers behind the scenes. It usually happens to people who don't know enough about how things work and don't realize that a single failure can take down more than one of their critical capabilities. For example, I once heard someone say that if his telephones went down, his backup communications system would be fax machines. Someone had to explain, gently, that fax machines call each other over that same phone system.
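One way to make that buried shared dependence visible is to write down what each capability relies on and compute the overlap. The sketch below is a minimal illustration under stated assumptions: the dependency map and every component name in it (`phone_network`, `utility_power`, `directory_server`, and so on) are hypothetical, chosen to echo the fax-machine story, and real systems would need a far more complete inventory.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each capability or component lists what it
# directly relies on. Names are illustrative, not drawn from any real system.
DEPENDS_ON: Dict[str, List[str]] = {
    "voice": ["phone_network"],
    "fax": ["phone_network"],
    "phone_network": ["utility_power"],
    "email": ["directory_server"],
    "directory_server": ["utility_power"],
    "ham_radio": ["battery"],
}


def transitive_deps(item: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every component that `item` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(graph.get(item, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the seen set also protects against cycles
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def common_modes(primary: str, backup: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Components whose single failure would take down both capabilities."""
    return transitive_deps(primary, graph) & transitive_deps(backup, graph)


if __name__ == "__main__":
    print(sorted(common_modes("voice", "fax", DEPENDS_ON)))        # ['phone_network', 'utility_power']
    print(sorted(common_modes("voice", "ham_radio", DEPENDS_ON)))  # []
```

In this toy map, fax shares two failure points with voice, while battery-powered ham radio shares none. Voice and e-mail look unrelated at first glance but still meet at `utility_power` two layers down, which is exactly the kind of overlap a grid collapse exposes.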
Common mode failure is the dark side of what people cheerfully call convergence when they're assuming that everything works correctly all the time. For example, I love the idea of a telephone system that uses "smart badges" to track every employee's location so that I can call a person rather than call a place; if the phone closest to you rings, you pick it up because it's probably for you. This makes more sense than polluting the radio spectrum, and carrying around more battery weight than necessary, as the price of using wireless systems as our primary voice-call facility.
But I'd never want to give up the backup capability of calling a number directly or of reading the number off a telephone instrument and being able to tell someone how to call that instrument again. If the directory server goes down, I don't want to have both e-mail and voice communications disabled; if a malfunctioning radio transmitter or a deliberate jamming attack wipes out my wireless service, I don't want to become unreachable on my wired telephone as well, just because my radio-frequency ID tag is suddenly invisible to the system. Either of these would be a common mode failure, and either might be a rude surprise to optimistic fans of digital convergence.
Just as an exercise, it might be fun to look over some of the "ubiquitous computing" sites and projects listed at www.cs.bell-labs.com/who/cyoung/ubiq.html to see how many of them present interesting opportunities for killing more than one capability with one blow.
This is not just a problem in physical systems such as telephone networks; it's also a growing threat in software systems based on protocols such as Web services. Web sites that appear to offer independent capabilities might be sharing, for example, a dependence on Google's services APIs for content searching or on Amazon.com's for storefront operations. If neither site maintains a failover option, then both might lose function at the same moment, and I might lose a critical e-business capability I thought I had safely multisourced.
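A failover option can be as simple as trying a second, independently hosted provider when the first one fails. The sketch below is a minimal illustration, assuming each provider is wrapped as a plain Python function that raises a hypothetical `ProviderError` on failure; `primary_search` and `secondary_search` are stand-ins and do not represent the actual Google or Amazon.com APIs.

```python
from typing import Callable, List, Sequence


class ProviderError(Exception):
    """Hypothetical error raised by a provider that cannot serve a request."""


def with_failover(query: str, providers: Sequence[Callable[[str], List[str]]]) -> List[str]:
    """Try each provider in order and return the first successful result."""
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider(query)
        except ProviderError as exc:
            errors.append(exc)  # remember why this provider failed, then move on
    raise ProviderError(f"all {len(providers)} providers failed: {errors}")


# Hypothetical stand-ins for two independently hosted search services.
def primary_search(query: str) -> List[str]:
    raise ProviderError("primary search API unreachable")


def secondary_search(query: str) -> List[str]:
    return [f"result for {query!r} from secondary index"]


if __name__ == "__main__":
    print(with_failover("common mode failure", [primary_search, secondary_search]))
```

The wrapper only buys real protection if the providers in the list are genuinely independent; two "different" services that quietly call the same upstream API would both fail together, and the loop would simply raise after trying each one.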
This is why we must be careful what we ask for, because we just might get it. Some software theorists seem to think that implementing any given function exactly once and sharing that function across all possible applications should be our goal. I ask them if they'd ride in a car in which every rivet, rather than being a separate point of strength, was somehow an instantiation of a single fastener, especially if that fastener were one of our earliest attempts at design and fabrication. Taken to its extreme, software reuse voids the entire concept of the learning curve: Once something is done well enough, it never gets done any better.
And as we all saw from the nasty things that happened while I was away, we need to do things better. We need to discuss failure modes of systems, not just products; we need to think in game-theory terms of uncertainty and active attack, not just build systems that work under foreseeable conditions. Convergence is cool, but technology diversity and defense in depth are still goals well worth pursuing.
Peter Coffee can be reached at [email protected].