How Deduplication Has Evolved to Handle the Deluge of Data

 
 
By Chris Preimesberger  |  Posted 2015-11-13

    Deduplication comes in many forms, which means a variety of solutions exist to help small and midsize organizations meet their backup needs.

    Inline Deduplication

    This is an "always on" approach that works in real time as data is being written to the system. Because it deduplicates all incoming data indiscriminately, it ensures comprehensive coverage, but it isn't always efficient: spending time deduplicating data sets that contain few duplicates wastes processing time and resources such as random access memory (RAM).
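    To make the idea concrete, here is a minimal Python sketch of inline deduplication, not any vendor's actual implementation: each fixed-size block is hashed as it is written, and only blocks whose hash has not been seen before are stored. The class name InlineDedupStore, the 4KB block size and the sample data are invented for illustration.

```python
import hashlib

class InlineDedupStore:
    """Toy inline deduplication: data is deduplicated as it is written."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # hash -> block bytes (unique blocks only)
        self.files = {}    # file name -> list of block hashes

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:   # store each unique block once
                self.blocks[digest] = block
            refs.append(digest)
        self.files[name] = refs

    def read(self, name):
        return b"".join(self.blocks[h] for h in self.files[name])

store = InlineDedupStore()
store.write("monday.bak", b"report " * 10000)
store.write("tuesday.bak", b"report " * 10000)   # identical content
print(len(store.blocks))   # far fewer unique blocks stored than blocks written
```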

    Post-Process Deduplication

    This method analyzes and eliminates redundant data after a full backup completes. It yields space savings, but the disk must have enough capacity to hold the entire backup until it is deduplicated. Because it requires room to store the full backup in the first place, it is counterintuitive for organizations that adopt deduplication precisely to reduce their storage footprint.
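    For contrast with the inline sketch above, here is a rough Python illustration of a post-process pass, assuming the full, undeduplicated backups are already sitting on disk; the function name and sample data are invented for the example.

```python
import hashlib

def post_process_dedup(stored_files, block_size=4096):
    """Toy post-process pass: scan backups already written to disk and
    keep one copy of each unique block, plus per-file block references."""
    unique_blocks = {}   # hash -> block bytes
    references = {}      # file name -> list of block hashes
    for name, data in stored_files.items():
        refs = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            unique_blocks.setdefault(digest, block)
            refs.append(digest)
        references[name] = refs
    return unique_blocks, references

# The full backups must fit on disk before the deduplication pass runs.
full_backups = {"mon.bak": b"invoice " * 5000, "tue.bak": b"invoice " * 5000}
blocks, refs = post_process_dedup(full_backups)
raw = sum(len(d) for d in full_backups.values())
deduped = sum(len(b) for b in blocks.values())
print(f"raw: {raw} bytes, after post-process: {deduped} bytes")
```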

    Agent-Based Deduplication

    This method deploys a separate deduplication agent on each system that needs protecting. It can be effective, but buyer beware: it is also expensive, complex and time-consuming. Some vendors position it as a complete solution, but the multiplication of expensive systems, software licenses and bandwidth requirements can diminish its overall value.

    Target Deduplication

    Target deduplication can run inline or post-process: backup data is deduplicated as it is stored to disk on the backup target. The existing backup software acts as the data mover, so users don't have to change backup configurations or policies; the only change required is the destination of the backup streams. That can be an attractive feature, but the data is not deduplicated until it reaches the backup appliance, which adds an extra layer of software that more recent advancements in deduplication have made unnecessary. Post-processing is also often combined with this technology, making these systems less storage-efficient.
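    The sketch below, again illustrative Python rather than any product's code, shows why target deduplication saves storage but not network traffic: every block travels to the hypothetical TargetAppliance in full and is deduplicated only on arrival.

```python
import hashlib

class TargetAppliance:
    """Toy target-side deduplication: blocks are deduplicated on arrival."""

    def __init__(self):
        self.blocks = {}          # hash -> block (unique blocks only)
        self.bytes_received = 0

    def receive(self, block):
        self.bytes_received += len(block)   # the full block crossed the wire
        self.blocks.setdefault(hashlib.sha256(block).hexdigest(), block)

def backup_to_target(data, target, block_size=4096):
    # The backup software just moves data; every block is sent as-is.
    for i in range(0, len(data), block_size):
        target.receive(data[i:i + block_size])

target = TargetAppliance()
backup_to_target(b"payroll " * 8000, target)
backup_to_target(b"payroll " * 8000, target)   # same data, sent in full again
stored = sum(len(b) for b in target.blocks.values())
print(f"sent over network: {target.bytes_received} bytes, stored: {stored} bytes")
```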

    Source-Side Deduplication

    Considered the next generation of deduplication technology, this method backs up only new and unique data at the source. After an initial full snapshot backup is taken and saved to a recovery point server, future backups capture only incremental changes to the data, which yields dramatic efficiencies in bandwidth, storage requirements, and data protection and recovery across multiple sites. The key advantage of source-side deduplication is the reduction of data sent across the network and the resulting performance gain.
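    Here is a minimal Python sketch of the source-side idea, with an invented RecoveryPointServer class standing in for the real thing: the source hashes its blocks, asks the server which hashes are missing, and transfers only those.

```python
import hashlib

class RecoveryPointServer:
    """Toy recovery point server that stores unique blocks by hash."""

    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        # Tell the source which hashes are not yet stored.
        return [d for d in digests if d not in self.blocks]

    def store(self, digest, block):
        self.blocks[digest] = block

def source_side_backup(data, server, block_size=4096):
    blocks = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        blocks[hashlib.sha256(block).hexdigest()] = block
    needed = server.missing(list(blocks))
    for digest in needed:
        server.store(digest, blocks[digest])    # only new blocks cross the network
    return sum(len(blocks[d]) for d in needed)  # bytes actually transferred

rps = RecoveryPointServer()
print(source_side_backup(b"ledger " * 9000, rps))   # first backup sends nearly everything
print(source_side_backup(b"ledger " * 9000, rps))   # unchanged data sends 0 bytes
```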

    Global, Source-Side Deduplication

    Global deduplication is optimized source-side deduplication. With this method, every computer, virtual machine or server across local, remote and virtual sites communicates with a recovery point server (RPS) that manages a global database index of all associated files and automatically determines what needs to be backed up. The RPS then pulls only new data as required, eliminates duplicate copies and shares the deduplicated intelligence across all source systems. Since backup data is globally deduplicated before it is transferred to the target RPS, only changes are sent over the network, which improves performance and reduces bandwidth usage.
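    Extending the previous sketch, the toy example below keeps one shared index for every source system, so a block already backed up from one machine is never sent again by another. The names, the shared data and the backup flow are illustrative assumptions, not Arcserve's implementation.

```python
import hashlib

def chunk_hashes(data, block_size=4096):
    """Split data into fixed-size blocks keyed by their SHA-256 hash."""
    return {hashlib.sha256(data[i:i + block_size]).hexdigest(): data[i:i + block_size]
            for i in range(0, len(data), block_size)}

global_index = {}   # one index shared by every source system, kept on the RPS

def global_source_backup(host, data):
    blocks = chunk_hashes(data)
    new = {d: b for d, b in blocks.items() if d not in global_index}
    global_index.update(new)   # duplicates already seen from ANY host are skipped
    print(f"{host}: sent {sum(len(b) for b in new.values())} of {len(data)} bytes")

shared_image = b"standard OS image " * 4000
global_source_backup("server-a", shared_image)
global_source_backup("server-b", shared_image)   # blocks already indexed: nothing sent
```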

    Common Misconceptions About Deduplication

    These are the most common: 1) all deduplication is the same and comes standard in every backup and recovery solution; 2) inline deduplication slows down performance; 3) source-side deduplication consumes too much processing power on the client. All three are wrong. These variations may not appear to create big differences on the surface, but they can have a significant impact on the amount of data you can back up, how much usable capacity is required, how quickly you can recover from unplanned system disruptions and your budget.

    Misconception No. 1: All Deduplication Is the Same

    Deduplication can mean very different things, and the efficiency of this technology greatly varies from product to product. Some perform target-side, while others perform source-side deduplication; some perform deduplication per backup job, while others perform deduplication across all storage systems. Further, many vendors offer stand-alone deduplication software, which is important to account for when developing your backup and recovery requirements.

    Misconception No. 2: Inline Deduplication Slows Performance

    The larger the data segments a product deduplicates (e.g., 250KB, 512KB or 1,024KB), the less efficient deduplication becomes. Likewise, the more data you process, the more computational resources are required. To achieve inline deduplication that doesn't slow down for lack of compute resources, vendors must design their own highly sophisticated data management structure; this technology is not simply available off the shelf. You can quickly gauge a vendor's level of data management sophistication by looking at the segment sizes it supports. If its inline deduplication supports only large segments (e.g., 512KB or 1,024KB), that's a good indication it is limited to a single backup job or storage volume.
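    One way to see the segment-size effect is the small experiment below, which deduplicates two nearly identical data sets of roughly 1MB at several block sizes. The sizes and sample data are made up and real results depend on the workload, but larger blocks consistently leave more bytes to store, because a single changed byte makes an entire large block unique.

```python
import hashlib

def unique_bytes(datasets, block_size):
    """Bytes remaining after deduplicating the datasets at a given block size."""
    seen = {}
    for data in datasets:
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            seen.setdefault(hashlib.sha256(block).hexdigest(), len(block))
    return sum(seen.values())

original = bytes(range(256)) * 4000     # about 1MB of data
edited = bytearray(original)
edited[500_000] ^= 0xFF                 # a single-byte change in the middle

for size in (4 * 1024, 256 * 1024, 1024 * 1024):
    kept = unique_bytes([original, bytes(edited)], size)
    print(f"block size {size // 1024:>5} KB -> {kept:,} bytes kept after dedup")
```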

    Misconception No. 3: Global, Source-Side Deduplication Is Only for VMware

    Global deduplication refers to the process of multiple backup devices federating the data management structure for maximum deduplication efficiency. This means every computer, virtual machine or server that is backed up communicates with a backup server that manages a global database index of files on all machines, everywhere. This type requires a sophisticated workflow to optimize replication between the source client and the backup device. This is hard technology to develop, and one that not every vendor has. Knowing this, it makes sense that many people think that global, source-side deduplication is only meant for VMware—not for physical machines or other virtual systems. However, this technology does exist and can yield tremendous operational efficiencies.

    Key Trend No. 1: Inline Deduplication

    How well deduplication performs depends largely on whether it is post-process or inline. As its name suggests, post-process deduplication stores incoming data to disk first and processes it for deduplication later. When data is processed for deduplication before being written to disk, that is inline deduplication. Inline deduplication has the advantage of writing data to disk only once, and it is the preferred method compared with post-process deduplication, which requires extra storage space and more disk writes.

    Key Trend No. 2: Global, Source-Side Deduplication

    Source-side deduplication pairs backup servers with agents installed on the clients (the "data source"). The client software communicates with the backup servers to compare new blocks of data and removes redundancies before the data is transferred over the network. Because duplicate data never has to cross the network, this form of deduplication yields dramatic savings in bandwidth, required storage and corresponding costs. Global, source-side deduplication takes the process a step further by sharing all of an organization's deduplicated data intelligence across all source systems. It is quickly replacing target deduplication as the preferred method because it backs up only new and unique data at the source against a global database index of files.
 

The alarming influx of business data is expensive to store, secure and make readily accessible to the right people so they can do business well. Small to midsize businesses are particularly vulnerable to this challenge: they must preserve the integrity of their data while facing potential budget constraints on acquiring more storage space. To be effectively managed, adequately protected and completely recovered, data stores must be lean, with backups kept to a minimum. Yet research has shown that, on average, 13 copies of each business document exist within a storage system. This is why deduplication is a highly effective method for addressing ubiquitous data growth: it identifies and eliminates redundant information in a data set, allowing organizations to reduce their storage space and network bandwidth needs. However, deduplication is often viewed as a standard capability within every backup and recovery solution, a one-size-fits-all feature. The truth is that deduplication processes vary significantly, and it can be hard to know which is best. In this eWEEK slide show, we offer insight into various deduplication practices, with information from unified data protection provider Arcserve.