In the days when companies operated only during standard business hours, nightly backup windows were not impediments, as backups were performed off-hours. But now, business hours have expanded. Many global organizations, especially those that operate Web-based sales and services, operate on a 24/7 basis.
Furthermore, backup jobs often take much longer to run because databases are much larger than they were in the past. Faster tape drives reduce the amount of time it takes to perform the backup, but they do nothing to eliminate the problem of interrupting 24/7 operations.
Even when companies aren’t in 24/7 environments, competitive pressures often require them to keep their facilities running longer to better leverage fixed assets. Again, this reduces the backup window. Although the window may not shrink to zero as it does for 24/7 companies, it may still be too short to complete backup operations.
The trouble with tape-based backups
Beyond shrinking backup windows, tape suffers from other problems. For one, despite today’s high-speed drives, restoring a data center from tape can take several hours or even days, particularly if the tapes have to be retrieved from a remote location. For most of today’s businesses, this downtime can be catastrophic.
Another problem with tape-based backups is that they are created only once a day, usually at night. If a disaster destroys a data center (including any onsite logs), data updates applied after the backup tapes were created will be lost. Further, companies that rely solely on tape backups actually put more than a day’s worth of data at risk. Here is why.
While disasters are rare, data losses frequently result from human error, malevolent actions or simultaneous disk crashes that overcome the protection offered by RAID. In these instances, data must be recovered locally, and rapid recovery depends upon having the backup tapes on hand. Consequently, many companies hold the most recent backup tapes on site; they ship yesterday’s tapes to a remote backup site only when new backup tapes are created the following night. Thus, these organizations put up to two days’ worth of data at risk.
It doesn’t end there. Tape is fallible. It is estimated that as many as 25 percent of attempts to recover data from tape are less than completely successful. Thus, if a disaster strikes while the most recent backup tape is still onsite and the most recent offsite tape turns out to be unreadable, an organization may lose three or more days’ worth of data.
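The one-, two- and three-day figures above follow from simple arithmetic. The sketch below is illustrative only: the function name and its parameters are assumptions introduced here, and the model assumes one backup per interval and a disaster that strikes just before the next backup runs.

```python
def worst_case_loss_days(interval_days=1.0, tapes_kept_onsite=0, unreadable_offsite_tapes=0):
    """Worst-case age, in days, of the newest usable tape after a site disaster.

    Hypothetical model: the disaster strikes just before the next backup,
    destroying the data center and every tape still held on site.
    """
    # The newest tape is already one full interval old at that moment.
    # Every tape lost on site or unreadable off site costs one more interval.
    generations_back = 1 + tapes_kept_onsite + unreadable_offsite_tapes
    return interval_days * generations_back


# Nightly backup, tape shipped off site immediately.
print(worst_case_loss_days())
# Newest tape held on site for local recovery.
print(worst_case_loss_days(tapes_kept_onsite=1))
# Newest tape on site and the newest offsite tape unreadable.
print(worst_case_loss_days(tapes_kept_onsite=1, unreadable_offsite_tapes=1))
```

The three calls print 1.0, 2.0 and 3.0, matching the one, two and three days of exposure described in the text.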
Security is another issue. Being a physical medium, tape is vulnerable to theft. If the data on it is not encrypted, it could fall into the wrong hands when in transit to (or located at) a recovery site. For these reasons, tape-based backups no longer offer adequate disaster recovery protection for many of today’s organizations.
Recovery time objectives and recovery point objectives
If tape is not sufficient, what is? Recovery goals fall into two classes: Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). As the name implies, RTO identifies an organization’s goal for the maximum time it should take to recover data and applications after a disaster. In other words, how long can operations be down if something goes wrong?
The meaning of the term RPO isn’t quite as obvious, but the concept is no more difficult. RPO identifies an organization’s goal for the maximum amount of data that may be lost as a result of a disaster. It is called an RPO because it refers to a point in the ongoing stream of data (specifically, the oldest data recovery point that would be considered tolerable). In simple terms, the question is, how much data can the company afford to lose?
All other things being equal, the closer the organization’s RTO and RPO are to zero (zero recovery time and no lost data), the more it will have to invest in a disaster recovery solution that meets those objectives.
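To make the two objectives concrete, the following sketch measures a single incident against an RPO and an RTO. The function name, the dictionary keys, the dates and the one-hour and four-hour targets are all assumptions chosen for illustration, not figures from any product.

```python
from datetime import datetime, timedelta


def meets_objectives(last_recovery_point, failure, service_restored, rpo, rto):
    """Compare one incident against RPO and RTO targets (all names are illustrative)."""
    data_loss_window = failure - last_recovery_point  # updates made in this window are lost
    downtime = service_restored - failure             # how long operations were down
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: nightly tape written at 01:00, failure at 15:30,
# service restored from tape at 09:30 the next morning.
report = meets_objectives(
    last_recovery_point=datetime(2024, 1, 10, 1, 0),
    failure=datetime(2024, 1, 10, 15, 30),
    service_restored=datetime(2024, 1, 11, 9, 30),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)
print(report["data_loss_window"], report["rpo_met"])
print(report["downtime"], report["rto_met"])
```

In this made-up case the nightly tape leaves 14.5 hours of updates unrecoverable and 18 hours of downtime, so both targets are missed.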
Continuous data protection
The Continuous Data Protection (CDP) products that are now available on the AIX platform diminish or eliminate many of the deficiencies of tape-based backups. In addition to providing standard replication, CDP runs on a production AIX system, capturing any updates to files and databases. These updates are then transmitted electronically to a backup system.
When replication is real-time, RPO values very close to zero can be achieved. Combined with CDP, this provides the best of both worlds; think of CDP as TiVo for the AIX server. Recovery is possible not only to the present moment: with “true CDP,” data can be recovered to any point in time within the recovery window. In other words, the recovery point can be anywhere from near-zero to hours or days in the past.
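One way to picture any-point-in-time recovery is as a journal of timestamped updates that can be replayed up to a chosen moment. The sketch below is a toy model under that assumption; the class, its methods and the sample data are hypothetical and do not describe how any particular CDP product stores its changes.

```python
class ChangeJournal:
    """Toy model of a CDP journal: every update is kept, in arrival order."""

    def __init__(self):
        self._entries = []  # (timestamp, key, value); value None marks a deletion

    def record(self, timestamp, key, value):
        # Assumes updates arrive in timestamp order, as they would from a live system.
        self._entries.append((timestamp, key, value))

    def restore(self, as_of):
        """Rebuild the data as it stood at time `as_of` by replaying the journal."""
        state = {}
        for timestamp, key, value in self._entries:
            if timestamp > as_of:
                break
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


journal = ChangeJournal()
journal.record(900, "order-17", "pending")
journal.record(915, "order-17", "paid")
journal.record(940, "order-17", None)  # accidental deletion

print(journal.restore(as_of=939))  # state just before the deletion
print(journal.restore(as_of=910))  # an earlier point in the recovery window
```

Because every change is retained, the record can be restored as "paid" (the moment before the deletion) or as "pending" (an earlier point), rather than only to its latest state.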
The alternative to true CDP is near CDP. Here, data is saved at predetermined points in time called checkpoints. How these intervals are defined depends on the CDP product. Some CDP software copies data when a file is saved or closed, as this is a known, clean recovery point. Other products may copy data when processor and/or network loads are low.
In most cases, the interval between checkpoints in a near-CDP product is an hour or more. Organizations with high transaction volumes may find this inadequate because an individual data item can change several times within a single interval. If corruption or deletion happens in the middle of that interval, it is impossible to recover the data item to its state immediately before the problem occurred.
In a world where organizations are saddled with increasingly stringent data protection regulations, an incomplete recovery facility such as this may be intolerable. Near-CDP products are further limited by the usual need for large amounts of disk space to maintain the checkpoint copies of the data.
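The checkpoint limitation can be seen in a few lines. The sketch below assumes hourly checkpoints held as full snapshots; the function name, the timestamps (minutes since midnight) and the balances are hypothetical.

```python
from bisect import bisect_right


def restore_from_checkpoint(checkpoints, as_of):
    """Return the newest checkpoint taken at or before `as_of`, or None.

    `checkpoints` is a list of (timestamp, snapshot) pairs sorted by timestamp.
    """
    times = [timestamp for timestamp, _ in checkpoints]
    index = bisect_right(times, as_of)
    return checkpoints[index - 1] if index else None


# Hypothetical hourly checkpoints (times in minutes since midnight).
checkpoints = [
    (540, {"balance": 100}),  # 09:00
    (600, {"balance": 0}),    # 10:00, taken after the corruption
]

# Between them the balance changed to 180 at 09:20 and was corrupted at 09:45.
# The best available recovery for 09:44 is still the 09:00 copy; the 180 is gone.
print(restore_from_checkpoint(checkpoints, as_of=584))
```

The intermediate value never appears in any checkpoint, so no choice of recovery time can bring it back.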
Beyond supporting more stringent RPOs, CDP delivers a capability that tape-based backups can’t provide. Disasters are rare; far more often, data must be recovered because it was corrupted or accidentally deleted. Unlike tape, CDP stores incremental data changes as they occur and, therefore, can be used to recover data to a variety of points during the day. Furthermore, with true CDP, data can be recovered to its state at any time (such as immediately before it was corrupted or deleted).
High availability: recovery without recovering
The best way to recover from a disaster is not to have to recover at all. High Availability products offered on AIX maintain near real-time replicas of all data and applications on a hot-standby backup server that can assume a production role at a moment’s notice. An additional advantage is that, in most cases, the replica server does not have to be located on the same site as the production server. Instead, it may be located across town, across the country or on the other side of the globe.
When a considerable distance separates the two servers, the backup is unlikely to be affected by a disaster that strikes the production facility. Consequently, when using a High Availability solution, it is not necessary to recover data and applications after a disaster. Instead, users can simply be switched to the remote backup server, with little interruption of business operations and virtually no lost data. Thus, a High Availability system can support RPOs and RTOs that are close to zero.
Combining the best of both worlds
High Availability solutions suffer from a problem that mirrors the one affecting tape-based backups. Both store data only as of a single point in time. With tape-based backups, that is typically some time during the previous night. With High Availability technology, the recovery point is always right now. As a result, neither approach can recover data, whether a single data item or a complete data center, to its state at any time other than that single recovery point. The answer is to combine High Availability and CDP.
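A small model shows why the combination matters. In the sketch below, a replica mirrors every write (the High Availability side) while a journal keeps the history (the CDP side). The class, its method names and the sample values are assumptions made for illustration and are not drawn from any AIX product.

```python
class ProtectedStore:
    """Toy model: a hot-standby replica (HA) plus a change journal (CDP)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}  # mirrors every write, so it always reflects "right now"
        self.journal = []  # (timestamp, key, value) history for point-in-time recovery

    def write(self, timestamp, key, value):
        self.primary[key] = value
        self.replica[key] = value  # replication copies mistakes too
        self.journal.append((timestamp, key, value))

    def fail_over(self):
        """HA path: switch users to the replica with no restore step."""
        return self.replica

    def state_at(self, as_of):
        """CDP path: replay the journal up to `as_of`."""
        state = {}
        for timestamp, key, value in self.journal:
            if timestamp <= as_of:
                state[key] = value
        return state


store = ProtectedStore()
store.write(100, "price", 19.99)
store.write(200, "price", 0.0)  # bad update, replicated instantly

print(store.fail_over()["price"])    # replica alone can only offer the bad value
print(store.state_at(199)["price"])  # journal recovers the value before the mistake
```

The replica keeps the business running after a site failure, while the journal supplies the earlier recovery points that the replica, by design, never holds.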
For any AIX shop that has not reviewed its recovery strategies and tools recently, it’s time to take another look. The demands for business resiliency have never been louder or stronger. Fortunately, the recovery products available on AIX have more than kept pace.
John Gay is Director of Sales Engineering at Vision Solutions, Inc. Prior to this, John served as the product strategist for Lakeview Technology (now merged with Vision Solutions). John also spent eight years in development with IBM and Sterling Commerce. John has eight years of experience in technical sales, with an emphasis on the business intelligence market.