How to Ensure Simpler Microsoft Exchange High Availability

Microsoft Exchange high availability has become increasingly important to businesses of all sizes. Knowledge Center contributor Jerry Melnick offers five tips on how to ensure Microsoft Exchange high availability in both physical and virtual environments.

For many companies, e-mail has become a more important communication tool than the telephone. Internal employee communication, vendor and partner communication, e-mail integration with business applications, collaboration using shared documents and schedules, and the ability to capture and archive key business interactions all contribute to the increasing reliance on e-mail.

As a result, Microsoft Exchange high availability has become increasingly important to businesses of all sizes. The following five tips are designed to help your company get started protecting Microsoft Exchange in physical and virtual environments.

Tip #1: Protect against server failures with quality hardware and component redundancy

Server core components include power supplies, fans, memory, CPUs and main logic boards. Purchasing robust, name-brand servers, performing recommended preventative maintenance, and monitoring server errors for signs of future problems can all help reduce the chances of Exchange downtime due to catastrophic server failure.

Downtime caused by server component failures can be significantly reduced by adding redundancy at the component level. Examples include redundant power and cooling, error-correcting code memory (ECC memory) with the ability to correct single-bit memory errors, redundant Ethernet cards, and disks protected with RAID.
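To make the benefit of component redundancy concrete, here is a minimal arithmetic sketch. It assumes that redundant components fail independently and that any one working copy is enough to keep the server running. The 99 percent availability figure is a hypothetical value chosen for illustration, not a number from any vendor.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one suffices.

    Assumes failures are independent, which real hardware only approximates.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one component is required")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one power supply
    dual = parallel_availability(single, copies=2)
    print(f"single: {expected_downtime_hours(single):.1f} hours/year")
    print(f"dual:   {expected_downtime_hours(dual):.2f} hours/year")
```

Under these assumptions a second copy cuts expected downtime by a factor of 100. Components that share a failure cause, such as a common power feed, will see less benefit.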

Tip #2: Protect against storage failures with storage device redundancy and RAID

Storage protection relies on device redundancy, combined with RAID storage algorithms to protect data access and data integrity from hardware failures. There are distinct issues for both local disk storage and for shared, network storage. For local storage, it is quite easy to add extra disks configured with RAID protection. A second disk controller is also required if you want to protect against controller failures.
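As an illustration of how a RAID algorithm can protect data from a disk failure, the following sketch shows XOR parity, the idea behind parity-based RAID levels such as RAID 5. It is a simplified model operating on in-memory byte strings, not a description of how any particular controller is implemented. Mirroring (RAID 1) is another common approach and is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must be the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Reconstruct the one missing data block from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"mail", b"box1", b"logs"]  # one block per data disk
    parity = xor_blocks(data)           # stored on the parity disk
    # Simulate losing the second disk and rebuilding its contents.
    rebuilt = rebuild_missing([data[0], data[2]], parity)
    print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the parity block with the surviving data blocks yields the lost block. A single parity block tolerates only one failed disk at a time.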

Access to shared storage relies on either a Fibre Channel or Ethernet storage network. To assure uninterrupted access to shared storage, these networks must be designed to eliminate all single points of failure. This requires redundancy of network paths, network switches and network connections to each storage array.
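One way to check a storage network design for single points of failure is to remove each intermediate device in turn and see whether the server can still reach the storage array. The sketch below does this with a breadth-first search. The device names (`hba1`, `switch1`, and so on) are hypothetical, and the model covers only device failures, not individual cable failures.

```python
from collections import deque


def reachable(links, start, goal, removed=None):
    """Return True if `goal` can be reached from `start`, skipping `removed`."""
    adjacency = {}
    for a, b in links:
        if removed in (a, b):
            continue  # a failed device takes all of its links with it
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(links, server, array):
    """List devices whose failure alone cuts the server off from the array."""
    devices = {node for link in links for node in link} - {server, array}
    return sorted(
        device for device in devices
        if not reachable(links, server, array, removed=device)
    )


if __name__ == "__main__":
    single_path = [("server", "hba1"), ("hba1", "switch1"), ("switch1", "array")]
    dual_path = single_path + [
        ("server", "hba2"), ("hba2", "switch2"), ("switch2", "array"),
    ]
    print(single_points_of_failure(single_path, "server", "array"))
    print(single_points_of_failure(dual_path, "server", "array"))
```

In the single-path layout every device is a single point of failure. Adding a second, fully separate path removes them all.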

Tip #3: Protect against network failures with redundant network paths, switches and routers

The network infrastructure itself must be fault-tolerant, consisting of redundant network paths, switches, routers and other network elements. Server connections can also be duplicated to eliminate failovers caused by the failure of a single server or network component. Take care to ensure that the physical network hardware does not share common components. For example, dual-ported network cards share common hardware logic and a single card failure can disable both ports. Full redundancy requires either two separate adapters or the combination of a built-in network port along with a separate network adapter.

Tip #4: Protect against site failures with data replication to another site

Site failures can range from an air conditioning failure or a leaking roof that affects a single building, to a power failure that affects a limited local area, to a major hurricane that affects a large geographic area. Site disruptions can last anywhere from a few hours to days or even weeks.

There are two methods for dealing with site disasters. One method is to tightly couple redundant servers across high speed/low latency links to provide zero data loss and zero downtime. The other method is to loosely couple redundant servers over medium speed/higher latency/greater distance lines to provide a Disaster Recovery (DR) capability, where a remote server can be restarted with a copy of the application database that is missing only the last few updates. In the latter case, asynchronous data replication is used to keep a backup copy of the data.
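The difference between the two methods can be shown with a small simulation. In this sketch, a synchronous store copies each update to the replica before the write completes, while an asynchronous store queues updates and ships them later. The class and method names are hypothetical and stand in for whatever replication product is actually used.

```python
from collections import deque


class ReplicatedStore:
    """Toy model of a primary site with one remote replica."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []
        self.replica = []
        self.pending = deque()  # updates written locally but not yet shipped

    def write(self, update):
        self.primary.append(update)
        if self.synchronous:
            # Tightly coupled: the replica has the update before we return.
            self.replica.append(update)
        else:
            # Loosely coupled: the update is shipped some time later.
            self.pending.append(update)

    def ship(self, count: int):
        """Send up to `count` queued updates to the replica, oldest first."""
        for _ in range(min(count, len(self.pending))):
            self.replica.append(self.pending.popleft())

    def lost_on_site_failure(self):
        """Updates the replica would be missing if the primary site died now."""
        return list(self.pending)


if __name__ == "__main__":
    store = ReplicatedStore(synchronous=False)
    for number in range(1, 6):
        store.write(f"update-{number}")
    store.ship(3)  # the link has only kept up with the first three updates
    print(store.lost_on_site_failure())
```

With asynchronous replication the replica lags by whatever is still queued, which is why a restarted DR server misses the last few updates. The synchronous store never has anything queued, at the cost of waiting on the link for every write.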

Data replication is combined with error detection and failover tools to help get a DR site up and running in minutes or hours, rather than days.
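Error detection for failover is often built on heartbeats: the primary server sends a periodic signal, and the DR site starts recovery after several consecutive signals are missed. The following is a minimal sketch of that idea. The interval and missed-heartbeat limit are hypothetical settings, and timestamps are passed in explicitly so the logic can be exercised without waiting in real time.

```python
class HeartbeatMonitor:
    """Decide when a primary server should be considered failed."""

    def __init__(self, interval_seconds: float, missed_limit: int):
        self.interval_seconds = interval_seconds
        self.missed_limit = missed_limit
        self.last_seen = None  # time of the most recent heartbeat

    def record_heartbeat(self, now: float):
        self.last_seen = now

    def missed_heartbeats(self, now: float) -> int:
        """Number of whole heartbeat intervals elapsed since the last signal."""
        if self.last_seen is None:
            return 0  # nothing received yet, so nothing to compare against
        return int((now - self.last_seen) // self.interval_seconds)

    def should_fail_over(self, now: float) -> bool:
        return self.missed_heartbeats(now) >= self.missed_limit


if __name__ == "__main__":
    monitor = HeartbeatMonitor(interval_seconds=5.0, missed_limit=3)
    monitor.record_heartbeat(now=100.0)
    print(monitor.should_fail_over(now=110.0))  # two intervals missed
    print(monitor.should_fail_over(now=115.0))  # three intervals missed
```

Requiring several missed heartbeats rather than one trades a slightly slower reaction for fewer false failovers caused by brief network delays.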

Tip #5: Consider virtualizing Exchange for better high availability

The latest server virtualization technologies, while not required for protecting Exchange, do offer some unique benefits that can make Exchange protection both easier and more effective. Virtualization makes it very easy to set up evaluation, test and development environments without the need for additional dedicated hardware. Virtualization also allows resources to be adjusted dynamically to accommodate growth or peak loads.

Jerry Melnick is Chief Technology Officer at Marathon Technologies Corporation, the leading provider of automated, fault tolerant-class availability solutions for virtual and physical environments. He can be reached at