As a result of the importance—and complexity—of disaster recovery and business continuity planning, even large companies with talented in-house IT staff and financial wherewithal are bringing in consultants to make sure they can bounce back from any problem.
eWEEK Labs recently spoke with disaster recovery experts from GlassHouse Technologies Inc. and EMC Corp.: Dick Benton, manager of storage business practices, and Stephen Higgins, director of business continuity, respectively. The consultants walked us through a typical enterprise disaster recovery planning consultation, and many of the best practices and recommendations they offer can be used by any company that wants to make sure its disaster ducks are in a row.
A disaster recovery consultation can take from a few months to more than a year. The length of consultation depends on a number of factors, including the complexity of a given business and its current level of preparedness. The cost of consultation services will vary just as widely.
During a disaster recovery consultation, the vast majority of the time is spent not on the IT department but on the business units of a company.
In fact, the first major step a consultant will take is to assess the various business processes within a company. During this stage, consultants will help a company determine the value of applications and data—important information that will be used to prioritize the order in which the applications and data are restored. This analysis will also determine which applications need to be recovered in tandem to function properly.
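The prioritization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `business_value` score, the app names and the `tandem_pairs` list are hypothetical stand-ins for whatever the business assessment actually produces, and a real consultation would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    business_value: int  # hypothetical score; higher means restore sooner


def recovery_groups(apps, tandem_pairs):
    """Group apps that must be recovered together, then order the groups
    so that the group containing the most valuable app comes first."""
    # Union-find over app names: each tandem pair merges two groups.
    parent = {app.name: app.name for app in apps}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in tandem_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for app in apps:
        groups.setdefault(find(app.name), []).append(app)

    # A group inherits the priority of its highest-value member.
    ordered = sorted(
        groups.values(),
        key=lambda group: max(app.business_value for app in group),
        reverse=True,
    )
    return [sorted(app.name for app in group) for group in ordered]


# Hypothetical inventory: order entry depends on inventory data.
apps = [
    App("order_entry", 9),
    App("inventory", 4),
    App("email", 6),
    App("intranet", 2),
]
print(recovery_groups(apps, [("order_entry", "inventory")]))
# [['inventory', 'order_entry'], ['email'], ['intranet']]
```

Note that the low-scoring inventory app moves to the front of the queue because order entry cannot function without it, which is the reason the tandem analysis matters.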
The next steps in the process are application performance testing and monitoring, both of which are key to determining how the entire infrastructure is functioning during normal business times. IT managers can then set expectations as to the level of performance that business managers can expect after a disaster has occurred.
After this kind of audit, disaster recovery consultants often recommend server consolidation because an environment that is homogeneous is the easiest to restore and manage. For every different operating system and hardware platform running in a data center, an identical operating system and hardware platform must run at your recovery site.
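To see why consolidation pays off, it can help to count how many distinct platform combinations the recovery site would have to mirror. The sketch below assumes a hypothetical server inventory expressed as dictionaries with `os` and `hardware` keys; the field names and sample values are illustrative only.

```python
from collections import Counter


def platforms_to_mirror(servers):
    """Count servers per distinct (operating system, hardware) combination.
    Each distinct combination must also exist at the recovery site."""
    return Counter((server["os"], server["hardware"]) for server in servers)


# Hypothetical inventory with made-up platform labels.
inventory = [
    {"name": "db01", "os": "os_a", "hardware": "vendor_x"},
    {"name": "db02", "os": "os_a", "hardware": "vendor_x"},
    {"name": "web01", "os": "os_b", "hardware": "vendor_y"},
]
for platform, count in platforms_to_mirror(inventory).items():
    print(platform, count)
```

Every combination that appears only once or twice in the output is a candidate for consolidation, since it adds a whole platform to the recovery site for very little production capacity.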
After analyzing and documenting business processes and performance, consultants look to see how data is protected. For example, companies need to make sure that the storage technologies they are using match the service levels the IT department has to deliver. For companies looking to implement information lifecycle management technology, this phase should be helpful in determining how to best use resources.
Consultants will also check on the health and optimization of a company's backup infrastructure to determine, for example, that backup systems are not wasting tapes by backing up data that hasn't changed in years. They will also note whether backups are being verified and tested regularly. After the design and implementation of backup and data replication tools are completed, tests should be run to ensure that recovery sites can come up and deliver the performance promised in SLAs (service-level agreements).
After the disaster recovery consultation is finished, it's almost time for the process to begin again. Because IT environments are highly dynamic, companies' disaster recovery processes must be adjusted to deal with new business issues, regulatory mandates, and software and hardware.
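One rough way to spot data that has not changed in years is to scan a file tree for old modification times. The following is a minimal sketch, not a feature of any particular backup product: the function name, the directory path and the two-year threshold are all assumptions chosen for illustration.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def stale_files(root, max_age_days, now=None):
    """Return files under root whose last modification is older than
    max_age_days. Such files are candidates for archiving rather than
    repeated inclusion in full backups."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Hypothetical usage: list files untouched for roughly two years.
    for path in stale_files(".", max_age_days=730):
        print(path)
```

Modification time is only a first approximation; a file that is old but still read every day may belong on fast storage even if it does not need to be backed up repeatedly.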
Disaster Recovery Recommendations
- Maintain multiple live sites, and split production traffic as evenly as possible
- Utilize remote sites if possible, especially remote sites where employees already work (sales offices, R&D facilities, etc.), to initiate recovery
- Keep the recovery site updated; manuals, diagrams, software and everything else you use in your production site should also be available at the recovery site
- Recovery tests should be run a few times a year and must be revised to compensate for new apps and other major changes
Data center optimization
- A homogeneous data center is easiest to manage and recover. Eliminate extraneous operating system platforms, servers and storage hardware
- Run performance tests to find bottlenecks and develop SLAs for business managers
- Consolidate servers and storage as much as possible
- Make sure dependent apps (apps that feed data to one another) have a synchronized recovery point. For transaction-intensive environments, mismatched recovery points could lead to data corruption
- Streamline the backup infrastructure. It is good to have multiple copies of data, but you don't want to have too many redundant copies taking up valuable tape and disk resources
- Document the configurations and settings for all the apps and hardware in the data center, keeping close tabs on changes made
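The recommendation above about synchronized recovery points for dependent apps lends itself to a simple automated check. The sketch below assumes each app's most recent recovery point is available as a timestamp and that dependencies are listed as pairs; the app names, the `tolerance` parameter and the data layout are hypothetical.

```python
from datetime import datetime, timedelta


def mismatched_recovery_points(recovery_points, dependent_pairs,
                               tolerance=timedelta(0)):
    """Return (app_a, app_b, gap) for each dependent pair whose recovery
    points differ by more than the allowed tolerance."""
    mismatches = []
    for a, b in dependent_pairs:
        gap = abs(recovery_points[a] - recovery_points[b])
        if gap > tolerance:
            mismatches.append((a, b, gap))
    return mismatches


# Hypothetical recovery points for three apps.
points = {
    "order_entry": datetime(2024, 1, 1, 2, 0),
    "inventory": datetime(2024, 1, 1, 2, 0),
    "billing": datetime(2024, 1, 1, 1, 30),
}
pairs = [("order_entry", "inventory"), ("order_entry", "billing")]
for a, b, gap in mismatched_recovery_points(points, pairs):
    print(f"{a} and {b} are out of sync by {gap}")
```

With these sample values only the order entry and billing pair is reported, with a gap of 30 minutes. How much tolerance is acceptable depends on the transaction volume between the apps, which is a business decision rather than something this check can determine.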
Senior Analyst Henry Baltazar can be reached at [email protected].