It’s a matter of when, not if. Sooner or later, one of your customers is going to face a disaster: not a botched project or a PR gaffe, but the kind of event that normally involves firefighters, ambulances and/or law enforcement. It could be anything from an earthquake to a power outage to employee sabotage; all that matters to your customer is that they have lost significant elements of their infrastructure and need to take drastic measures if they are to maintain business operations.
While no one wants to spend time contemplating worst-case scenarios, advance preparation is the only difference between a crisis that causes a minor hiccup and one that brings a business to its knees. Are your customers ready for the worst? Are you equipped to help prepare them?
The Plan
The middle of an emergency is the worst possible environment in which to make time-sensitive business decisions or cobble together a to-do list. Moreover, coordinating and communicating with staff in the midst of this chaos is difficult at best. When disaster strikes, key employees must know their roles and be prepared to fill them with little or no supervision or hesitation; as such, a detailed response plan is the core of any disaster-recovery effort.
“The quality of the recovery plan is probably the single most important factor determining the cost of any kind of disruption,” according to Steven Lewis, CEO of the Systems Audit Group and editor in chief of Disaster Recovery Yellow Pages. “Winging it, you’d be lucky to recover at all.”
Brian Schwarz of Exodus Communications concurs. “Depending on the circumstances, a company with a well-written and -executed plan should be up within two days of a full-blown smoke-and-rubble disaster,” he says. “My guess is that without a plan, most businesses wouldn’t be back up in less than 10 days, if ever.”
Step 1: Know Your Goals
In an ideal world, every company would have a comprehensive disaster plan covering every significant contingency. This world, unfortunately, doesn’t quite work that way. According to Damian Walch, senior VP of professional services at Comdisco Inc., “Usually when a customer comes looking for plan development, there has been an incident or an event somewhere that’s prompted it, or an executive that’s walked in somewhere and said, ‘This is vital to our operation, and I need some capability to restore or recover it.’”
The fact that a customer may be motivated by a specific concern or incident, however, does not mean that they have a clear sense of what systems or processes they want to protect. An executive may hear about the latest round of denial-of-service attacks, for example, and decide to take action without really knowing what those attacks do or what kind of impact they might have.
Before launching into any real planning, work with the client to determine precisely what their goals are. Do they want to protect a particular technical capability (e.g., Web, mail or database servers, ERP or CRM applications), a business function (e.g., call centers) or some combination thereof? Are they concerned about specific scenarios or general availability? What resources are available to address these concerns? Once those questions are answered, reconcile your client’s expectations with your own, and put them down on paper.
Step 2: Learn the System
Now that you have a clear statement of the client’s specific goals, set it aside. Business operations and computer networks alike are too complex and interlinked to allow any one function or system to be addressed in isolation; accordingly, before you can effectively explore ways to protect individual functions and systems, you must understand where they fit into the entire enterprise.
“Companies often miss major gaps in their information flow,” warns Walch. “They may have a back-office mainframe or outsourced database and application servers, and they may internally manage their end users and their network. If those three aren’t working in some kind of concert, you’re going to have a pretty serious exposure.”
Develop a model of the client organization’s entire data flow, including every element that might prove vulnerable in a disaster: hardware, software, physical facilities and staff. Prioritize each element in the model, both for importance and for time-sensitivity. Once this organizationwide model is complete, it can serve as a kind of superstructure into which you can plug solutions for the specific systems that are the focus of your plan. This approach also lends itself to ongoing, incremental development of disaster plans; once you’ve developed a model, it is much easier to come back and plug in new solutions.
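To make the prioritization concrete, here is a minimal sketch in Python of what such a model might look like. Everything in it is an assumption for illustration: the element names, the 1-to-3 importance scale and the downtime figures are hypothetical, and a real model would come out of interviews with the client.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One vulnerable piece of the client's data flow (all values hypothetical)."""
    name: str
    kind: str                  # "hardware", "software", "facility" or "staff"
    importance: int            # 1 = most important (an assumed scale)
    max_downtime_hours: float  # how long the business can tolerate losing it
    depends_on: tuple = ()     # names of elements this one needs in order to work


def recovery_order(elements):
    """Sort by importance first, then by time-sensitivity (shortest tolerable downtime first)."""
    return sorted(elements, key=lambda e: (e.importance, e.max_downtime_hours))


elements = [
    Element("order database", "software", 1, 4, depends_on=("SAN",)),
    Element("call center", "facility", 2, 8),
    Element("mail server", "hardware", 2, 24),
    Element("SAN", "hardware", 1, 2),
]

for e in recovery_order(elements):
    print(f"{e.name}: importance {e.importance}, tolerable downtime {e.max_downtime_hours}h")
```

Keeping the dependencies alongside each element is what lets new solutions be plugged in later without rebuilding the model from scratch.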
Step 3: Choose a Path
The next step is to choose strategies that fit your client’s needs and available resources. The controlling considerations in making these decisions are relatively straightforward: What is the maximum recovery time, and how much can the client pay to be certain it stays under that maximum?
At one end of the spectrum lie “deferred” approaches, which rely upon “cold sites” (available but unmaintained alternate facilities) and post-incident acquisition and/or configuration of hardware and software. While relatively inexpensive, these approaches require long recovery times.
At the other end lie internal fail-overs, in which redundant equipment and facilities are kept constantly available within the organization, allowing for nearly instantaneous recovery.
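The trade-off between recovery time and cost can be sketched as a simple selection rule: among the strategies that recover fast enough and fit the budget, pick the cheapest. The Python below is an illustration only; the recovery times and annual costs are made-up placeholders, not figures from any vendor, and a real list would include the options between the two extremes.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    recovery_hours: float  # assumed typical time to restore operations
    annual_cost: float     # assumed yearly cost to the client


# Hypothetical figures for the two ends of the spectrum described above.
STRATEGIES = [
    Strategy("cold site (deferred)", 240.0, 10_000.0),
    Strategy("internal fail-over", 0.1, 250_000.0),
]


def choose_strategy(max_recovery_hours, budget, strategies=STRATEGIES) -> Optional[Strategy]:
    """Return the cheapest strategy that meets both limits, or None if nothing fits."""
    candidates = [
        s for s in strategies
        if s.recovery_hours <= max_recovery_hours and s.annual_cost <= budget
    ]
    return min(candidates, key=lambda s: s.annual_cost, default=None)


print(choose_strategy(max_recovery_hours=48, budget=300_000))
print(choose_strategy(max_recovery_hours=1, budget=50_000))
```

A result of `None` is the useful case in practice: it tells you the client’s recovery target and budget are incompatible, and one of them has to move.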
Step 4: Spell It Out
No plan is of any use until it is finally put down on paper in a usable format. As simple as it may seem, this is the point at which most disaster-recovery plans go bad. “The average disaster-recovery plan looks like an encyclopedia set, and ends up used for doorstops,” warns Lewis.
“We’ve walked into too many companies where we ask if somebody’s got a disaster-recovery capability, and they turn around and point to 14 binders behind their desk,” adds Walch. “No one’s ever going to read that, and if they did, they wouldn’t remember it. They certainly aren’t going to pull it out for reference in the middle of an incident.”
According to Walch, the key characteristics of a good final plan are specificity, simplicity and brevity. The plan should identify specific individuals and spell out their assigned roles, authority and responsibilities in detail. If possible, each key employee should have a disaster “script” or checklist, spelling out exactly what to do and in what order.
After the Planning
Even after the disaster plan is written and signed off, there’s almost no end to the number of things you can do for your client to prevent little disasters from affecting operations. The same holds true for mitigating the effects of larger, more catastrophic events before they happen. Many of these tasks are driven by the disaster plan itself, but there are always a number of niggling details that the plan fails to address. For example, simple things, from having a call notification tree established to having working flashlights (or some other type of emergency lighting) in the server room, are sometimes overlooked. If the phones are out, are there any cell phones handy? Does the client have the ability to use a regular analog phone at the receptionist’s station? We’re focusing on the data aspect here, but the telecommunications infrastructure is just as important. Depending upon the business, some would say it is more important.
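A call notification tree is simple enough to write down as data. The sketch below, in Python, walks a hypothetical tree level by level and lists who calls whom; the roles and the tree’s shape are invented for illustration and would be replaced by the names in the client’s actual plan.

```python
from collections import deque

# Hypothetical tree: each person calls the people listed under them.
CALL_TREE = {
    "incident lead": ["IT manager", "facilities manager"],
    "IT manager": ["server admin", "network admin"],
    "facilities manager": ["receptionist"],
}


def call_order(tree, root):
    """Return (caller, callee) pairs, level by level, starting from the root."""
    order = []
    queue = deque([root])
    seen = {root}
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee not in seen:  # never call the same person twice
                seen.add(callee)
                order.append((caller, callee))
                queue.append(callee)
    return order


for caller, callee in call_order(CALL_TREE, "incident lead"):
    print(f"{caller} calls {callee}")
```

Printed out and kept in a wallet, a list like this still works when the network is down, which is the point of having it.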
At the Desktop
In this day and age, storing data on a networked desktop machine is just plain crazy. If your client insists upon this practice, or the machine is a standalone unit, recommend a UPS for the desktop (and the monitor) and some sort of high-capacity removable storage device. This can be tape, recordable CD/DVD or a Zip drive. Backup utilities are essential, as is the knowledge of how to use them. Are users befuddled about how to make all of this work? There’s a training opportunity for you. Most operating systems come with some sort of backup utility these days, and there are plenty of third-party applications available. Some are designed for local-only use and some are Internet-based. There are also a number of Internet-based services that will rent storage space to you for precisely this purpose. We’d shy away from the “free” services for critical company data, but for less important but still useful information, they may do the trick.
Workgroup Worries
This probably isn’t the first time you’ve read this, but we’ll repeat the advice: The servers powering workgroups and small enterprises need to be designed and built for fault tolerance. That means redundancy in disk and power subsystems within the server, and no single point of failure. To wit, of course you’ve verified that the servers have some form of RAID storage, redundant power supplies and a UPS. The workgroup may be sophisticated enough to have a SAN, as well. Are there at least two network connections to each server? Network interface cards have been known to fail. The server’s up, but you can’t get to it. Use the same logic for the storage network if one is installed.
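The checklist above lends itself to a quick audit. Here is a minimal Python sketch that flags single points of failure in a server description; the field names and the sample server are assumptions for illustration, and the values would have to be gathered by hand or from whatever inventory tool the client already uses.

```python
# Components that need at least two of each, and features that must be present.
MINIMUM_COUNTS = {"power_supplies": 2, "network_interfaces": 2}
REQUIRED_FEATURES = ("raid", "ups")


def single_points_of_failure(server):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for component, minimum in MINIMUM_COUNTS.items():
        count = server.get(component, 0)
        if count < minimum:
            problems.append(f"{component}: have {count}, want at least {minimum}")
    for feature in REQUIRED_FEATURES:
        if not server.get(feature, False):
            problems.append(f"{feature}: missing")
    return problems


# Hypothetical server: well protected except for its single network card.
file_server = {"raid": True, "ups": True, "power_supplies": 2, "network_interfaces": 1}

for problem in single_points_of_failure(file_server):
    print(problem)
```

The same function could be pointed at the storage network by adding its components to the two tables at the top.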
We’d guess that by this time, almost everyone knows about server backup. That’s all well and good; there are a number of products available from Computer Associates, Legato and Veritas, to name a few, for this purpose.
But what happens to the tapes after they’ve had data written to them? Do they stay right next to the server, or are they rotated off-site?
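One way to answer that question for a client is to keep a simple tape log and check how old the newest off-site copy is. The Python sketch below assumes a hypothetical log format (label, current location, date written) and hypothetical dates; it is not tied to any particular backup product.

```python
from datetime import date

# Hypothetical tape log: label, where the tape is now, and when it was written.
TAPES = [
    {"label": "MON-01", "location": "server room", "written": date(2001, 3, 5)},
    {"label": "FRI-04", "location": "off-site vault", "written": date(2001, 3, 2)},
    {"label": "FRI-03", "location": "off-site vault", "written": date(2001, 2, 23)},
]


def newest_offsite_age_days(tapes, today, onsite_location="server room"):
    """Age in days of the most recent tape stored away from the server, or None if there is none."""
    offsite_dates = [t["written"] for t in tapes if t["location"] != onsite_location]
    if not offsite_dates:
        return None
    return (today - max(offsite_dates)).days


age = newest_offsite_age_days(TAPES, today=date(2001, 3, 6))
print("No off-site backup at all" if age is None else f"Newest off-site backup is {age} days old")
```

That number is roughly how much work the client stands to lose if the server room, and every tape in it, is destroyed.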
Time to Move
Here’s where it gets interesting. What happens when the location is rendered inoperative? Do employees just move elsewhere and continue to work? Does the architecture of the system support that? If it’s not Web-based and location-independent, then you need to define the location fail-over plan. Do you suggest a cold, warm or hot site? What can your client afford? Can (or should) they do it themselves, or should they partner with a site provider? What are the essential business processes that occur at this site that must be replicated elsewhere?
One of the most overlooked scenarios is the loss of Internet connectivity. Is there a backup in place? Your job as the business-continuance expert is to have the answers and an implementation plan ready.
Frozen in Their Tracks
It’s easy to see how a natural disaster can render a site unusable, but there are other factors to consider, as well.
Last week, for example, commuters between the communities of Santa Cruz and San Jose in Northern California were stopped dead in their tracks by snow. This is a once-in-a-decade type of event, but it illustrates a point. What’s the plan when the site is running, but people can’t get to it? Does your client support remote computing options for their employees?
Risk and Reward
Disaster-recovery planning, like business-continuance planning, is all about risk management. “Act of God” scenarios are rare, but they do happen. Water mains break in the dead of winter. Power interruptions occur. Earthquakes level cities. Backhoes sever fiber cables.
While you cannot prevent those events from happening, you can mitigate the effects that they have upon your client’s operations by planning and implementing solutions for the worst-case scenario. That’s your best bet.