Organizations often believe they can adequately safeguard their data solely using disaster recovery (DR) tactics such as nightly tape backups combined with high availability (HA) solutions that maintain local or, preferably, remote replicas of production servers. But this is a misconception. HA and DR solutions are necessary but not sufficient components in a complete data protection architecture. Even when used in tandem, HA and DR do not protect against a common vulnerability.
This common vulnerability is the problem of missing recovery points. This problem arises because, for the most part, HA and DR provide only single point-in-time data recovery. In the case of HA, the point in time is the instant before a failure occurs. With tape-based DR solutions, the recovery point is the time the backup tape was created, which is typically sometime during the previous night. Most organizations keep multiple generations of backup tapes. While this allows for multiple recovery points, they are spaced 24 hours apart.
HA and DR do offer critical data protection functions; however, there is a large class of data integrity and availability conditions that they cannot resolve. What’s more, the issues within this class typically occur much more frequently than the problems that HA and DR are designed to solve.
These neglected data integrity and availability issues cover any incident that corrupts or deletes data without immediately stopping operations. Examples include accidental file or object deletions, as well as data corruption caused by computer viruses or other malicious activity.
When an event of this type occurs, an organization that depends solely on tape-based backups for data protection can recover data to its state as recorded on the backup tape created the previous night. However, the data might have been updated legitimately several times after that backup but before the corruption or deletion occurred. The only way to recover those later updates is to try to restore the data manually.
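To make the gap concrete, the following minimal Python sketch computes how many hours of updates are exposed when nightly tapes are the only recovery points. The 01:00 backup schedule, the dates, and the function name `latest_recovery_point` are illustrative assumptions, not details of any particular product.

```python
from datetime import datetime, timedelta

def latest_recovery_point(backup_times, incident_time):
    """Return the most recent backup taken strictly before the incident."""
    candidates = [t for t in backup_times if t < incident_time]
    if not candidates:
        raise ValueError("no backup predates the incident")
    return max(candidates)

# Hypothetical schedule: one tape backup at 01:00 each night for a week.
first_backup = datetime(2024, 3, 1, 1, 0)
backups = [first_backup + timedelta(days=d) for d in range(7)]

# Hypothetical incident: a file is corrupted in the middle of the afternoon.
incident = datetime(2024, 3, 5, 16, 30)

point = latest_recovery_point(backups, incident)
exposure = incident - point   # updates made in this window are not on tape

print("Recover to:", point)
print("Unprotected window:", exposure)
```

In this example, every legitimate update made between the 01:00 backup and the 16:30 incident would have to be recreated by hand.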
HA technology doesn’t solve this problem either. The job of HA software is to faithfully maintain up-to-date replicas of production servers. Because the HA replicator does not know whether a deletion was accidental or intentional, it diligently duplicates the deletion on the backup server, as it was designed to do. Likewise, if data is modified by a computer virus or a malicious individual, the HA replicator immediately does its job of copying that change to the replica server.
Once a corruption or accidental deletion is replicated to the backup server, which typically happens within seconds, nightly backup tapes once again become the only electronic recovery option.
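The replication behavior described above can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of how any real HA product is implemented: `MirroredStore` is a hypothetical class that applies every change to both a primary and a replica dictionary.

```python
class MirroredStore:
    """Toy model of HA replication: every change is copied to the replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, name, content):
        self.primary[name] = content
        self.replica[name] = content      # replicate the update

    def delete(self, name):
        self.primary.pop(name, None)
        self.replica.pop(name, None)      # replicate the deletion as well

store = MirroredStore()
store.write("contract.doc", "signed version")
store.delete("contract.doc")              # accidental deletion by a user

# The replica cannot help: it has faithfully mirrored the mistake.
print("contract.doc" in store.replica)
```

The replicator has no way to tell a mistake from an intended change, so the replica ends up in the same damaged state as the primary.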
Continuous Data Protection
A recent innovation, continuous data protection (CDP), fills this data protection gap. CDP technology automatically captures transaction and object changes that occur between tape saves. It then saves that data on disk-based storage devices.
The CDP software also makes it easy to use these saved updates to quickly recover data to any point in time. Another advantage of CDP technology is that, unlike tape backups, it does not require the interruption of applications to perform backups.
With CDP in place, if an important document is accidentally deleted or data is corrupted, it can be restored to its state immediately prior to when the problem occurred. A click of a button initiates the CDP recovery function and individual data items are then typically restored within seconds.
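The idea of capturing changes and recovering to any point in time can be sketched as a simple change journal. This is an illustrative assumption about the general technique, not the design of any specific CDP product; `Change`, `ChangeJournal`, `record`, and `state_at` are hypothetical names, and integer ticks stand in for real timestamps.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Change:
    tick: int                 # logical timestamp (hypothetical)
    name: str                 # file or object that changed
    content: Optional[str]    # None records a deletion

class ChangeJournal:
    """Keeps every change in time order so any earlier state can be rebuilt."""

    def __init__(self):
        self._changes = []

    def record(self, tick, name, content):
        if self._changes and tick < self._changes[-1].tick:
            raise ValueError("changes must be recorded in time order")
        self._changes.append(Change(tick, name, content))

    def state_at(self, tick, base=None):
        """Rebuild the data as it stood at the given tick."""
        state = dict(base or {})
        for change in self._changes:
            if change.tick > tick:
                break
            if change.content is None:
                state.pop(change.name, None)
            else:
                state[change.name] = change.content
        return state

journal = ChangeJournal()
journal.record(1, "report.doc", "draft")
journal.record(2, "report.doc", "final")
journal.record(3, "report.doc", None)     # accidental deletion

print(journal.state_at(2))   # state immediately before the deletion
print(journal.state_at(3))   # state after the deletion
```

Recovering to tick 2 returns the final version of the document, which neither the replica nor the previous night's tape would contain.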
CDP also reduces recovery times when restoring large quantities of data because, unlike other data recovery options, there is no need to rebuild and resynchronize volumes and then apply journal or other archive logs to bring the data forward to the desired point.
Compared to other options, the reduction in recovery times can be dramatic. For example, CDP can typically recover one terabyte of data in about 20 minutes. It would take about three hours to recover the same amount of data from a local disk-based copy, nine hours from a remote disk-based copy and 17 hours from a tape-based backup. CDP is available as standalone software or may be integrated into a complete HA solution.
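The relative differences in the figures quoted above are easy to work out. The short sketch below only restates the article's one-terabyte recovery times in minutes and divides each by the CDP time; the dictionary and function names are illustrative.

```python
# Recovery times for one terabyte, in minutes, as quoted in the article.
RECOVERY_MINUTES = {
    "CDP": 20,
    "local disk-based copy": 3 * 60,
    "remote disk-based copy": 9 * 60,
    "tape-based backup": 17 * 60,
}

def slowdown_versus_cdp(minutes_by_method):
    """Return how many times longer each method takes than CDP."""
    baseline = minutes_by_method["CDP"]
    return {method: minutes / baseline
            for method, minutes in minutes_by_method.items()}

ratios = slowdown_versus_cdp(RECOVERY_MINUTES)
for method, ratio in ratios.items():
    print(f"{method}: {ratio:.0f}x the CDP recovery time")
```

On these figures, recovery from a local disk-based copy takes 9 times as long as CDP, a remote disk-based copy 27 times as long, and tape 51 times as long.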
True Versus Near CDP
CDP technologies fall into one of two architectures: true CDP and near CDP. The “true” in “true CDP” signifies that this architecture does indeed deliver continuous data protection.
To do so, true CDP captures every production data write and transfers it to a secondary disk. It thus enables any update to be undone by recovering the data to a point in time immediately before the update was applied.
Near CDP is not continuous but, as the “near” implies, it is generally much closer to continuous than nightly backups. Near CDP batches updates and transmits them to the backup data store only at discrete points in time, such as when a file is saved or closed.
Depending on how the software selects these save points, this can provide the benefit of creating recovery points that are “clean” (that is, points that were not taken while transactions were incomplete).
The disadvantage of near CDP is that, in some cases, the recovery points may be spaced several hours or more apart. In environments with high transaction volumes or rigid compliance or governance regulations, this may not be sufficient.
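The practical difference between the two architectures is the spacing of recovery points. The sketch below compares them under simplifying assumptions: times are minutes since the start of the day, true CDP is modeled as having a recovery point at every write, and near CDP as having recovery points only at two hypothetical save points. The function name `writes_preserved` and all the sample values are illustrative.

```python
from bisect import bisect_left

def writes_preserved(write_times, recovery_points, bad_write_time):
    """Return the writes that survive recovery to the last point before the bad write."""
    points = sorted(recovery_points)
    earlier = bisect_left(points, bad_write_time)   # points strictly before the bad write
    if earlier == 0:
        return []                                   # no usable recovery point
    cutoff = points[earlier - 1]
    return [t for t in write_times if t <= cutoff]

writes = [5, 12, 47, 58, 95]    # hypothetical write times, in minutes
bad_write = 58                  # the write that corrupted the data

true_cdp_points = writes        # every write is a recovery point
near_cdp_points = [30, 90]      # hypothetical save points only

kept_true = writes_preserved(writes, true_cdp_points, bad_write)
kept_near = writes_preserved(writes, near_cdp_points, bad_write)

print(kept_true)   # [5, 12, 47]
print(kept_near)   # [5, 12]
```

In this example, true CDP keeps every legitimate write made before the corruption, while near CDP loses the write at minute 47 because it fell between save points.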
Components of Integrated Data Protection Toolkits
Whether implemented as standalone software or integrated into HA software, CDP provides the easiest and most effective protection against the loss of critical business data. However, CDP alone is insufficient to meet organizations’ data protection needs. This is because the CDP data store typically does not hold all of the organization’s data but only contains data updates.
Furthermore, organizations often reduce storage requirements by purging updates older than a certain age. If they did not do so, storage needs would grow rapidly and endlessly because the CDP data store contains all updates rather than simply a copy of current data.
Consequently, the CDP data store alone is insufficient to recover all of an organization’s data after a disaster. This makes it necessary to first load databases and files from the most recent backup tape so they can be brought up-to-date using the CDP data store.
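The two-step recovery described above, loading the latest tape backup and then rolling it forward from the CDP data store, can be sketched as follows. The data layout is an assumption made for illustration: updates are time-ordered `(tick, name, version)` tuples, and `replay` and `purge_older_than` are hypothetical helper names.

```python
def replay(base_snapshot, updates, until):
    """Apply time-ordered (tick, name, version) updates to a copy of the base."""
    state = dict(base_snapshot)
    for tick, name, version in updates:
        if tick > until:
            break
        state[name] = version
    return state

def purge_older_than(updates, cutoff_tick):
    """Drop updates older than the cutoff to limit storage growth."""
    return [u for u in updates if u[0] >= cutoff_tick]

# Hypothetical tape backup taken at tick 100.
tape_snapshot = {"orders.db": "v3", "customers.db": "v7"}

# Hypothetical CDP data store; older entries are purged after the backup.
journal = [
    (40, "customers.db", "v7"),
    (90, "orders.db", "v3"),
    (105, "orders.db", "v4"),
    (110, "orders.db", "v5"),
]
journal = purge_older_than(journal, cutoff_tick=100)

journal_only = replay({}, journal, until=110)
full_restore = replay(tape_snapshot, journal, until=110)

print(journal_only)   # customers.db is missing
print(full_restore)   # both databases, with orders.db brought up to date
```

Replaying the purged journal on its own recovers only the data that changed recently, which is why the tape backup is still needed as the starting point.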
Another shortcoming of CDP alone is that it cannot be used to keep a business running when a production server needs to be taken offline for maintenance. This is because CDP maintains a store of data updates, not a replica server. In fact, the CDP server may run on a different platform (for example, a low-cost Windows or Linux-based server) than the production server, which might be an IBM i-based system. Thus, when the primary server is offline, CDP does not provide a ready-to-run backup server. On the other hand, HA technology does provide this facility.
Therefore, while CDP fills the data protection gap left by HA and DR tools and tactics, all three technologies are vital components in any data protection strategy. They complement each other to provide the complete data protection solution organizations need today.
Bill Hammond is Director of Product Marketing at Vision Solutions. Bill joined Vision Solutions in 2003 with more than 15 years of experience in product marketing, product management and product development roles in the technology industry. Bill is responsible for product positioning and messaging, product launches and marketplace intelligence for the company’s various solutions. He can be reached at [email protected].