How Deduplication Has Evolved to Handle the Deluge of Data | eWeek

Nov 13, 2015



Deduplication comes in many unique forms, meaning that a variety of solutions exist to aid small and midsize organizations with their backup needs.


Inline Deduplication

This is an “always on” approach that works in real time as data is written to the system. Because it deduplicates all incoming data indiscriminately, it captures everything, but it is not selective: deduplicating data sets that contain few duplicates wastes time and resources such as random access memory (RAM).
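As a rough illustration of the idea, the sketch below deduplicates every block at write time. The class name `InlineDedupStore`, the fixed block size and the use of SHA-256 fingerprints are assumptions made for this example, not details of any particular product.

```python
import hashlib


class InlineDedupStore:
    """Toy store that deduplicates every block as it is written (hypothetical)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed fixed block size
        self.blocks = {}              # fingerprint -> block bytes (stands in for disk)
        self.lookups = 0              # index lookups, paid on every write

    def write(self, data: bytes) -> list:
        """Split data into blocks, store unseen ones, return the list of fingerprints."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.lookups += 1
            if fp not in self.blocks:
                self.blocks[fp] = block
            recipe.append(fp)
        return recipe

    def read(self, recipe) -> bytes:
        """Rebuild the original data from its fingerprints."""
        return b"".join(self.blocks[fp] for fp in recipe)


store = InlineDedupStore(block_size=4)
r1 = store.write(b"AAAABBBBAAAA")  # three blocks, two unique
r2 = store.write(b"CCCCDDDD")      # no duplicates: the lookups are pure overhead
print(len(store.blocks), store.lookups)
```

The second write contains no duplicates, yet it still pays for a fingerprint and an index lookup per block, which is the overhead the paragraph above describes.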


Post-Process Deduplication

This method analyzes and eliminates redundant data after a full backup has completed. It yields space savings, but it also requires disk space to hold the data until it is deduplicated. Because the full backup must be stored first, the approach is counterproductive for organizations that turn to deduplication to reduce their storage needs.


Deduplication

This method involves a separate deduplication agent for each system that needs protecting, which can be effective, but buyer beware: This method is expensive, complex and time-consuming. Some vendors leverage it as an effective solution, but the multiplication of expensive systems, software licenses and bandwidth requirements can diminish this method’s overall value.


Target Deduplication

Target deduplication can run in real time or post-process; in either case, backup data is deduplicated and stored to disk at the backup target. The backup software acts as the data mover, so users do not need to change backup configurations or policies; the only change required is the destination of the backup streams. This can be attractive, but the data is not deduplicated until it reaches the backup appliance. It also requires an extra layer of software that more recent advances in deduplication have made unnecessary. Target deduplication is often combined with post-processing as well, which makes these systems less storage-efficient.


Source-Side Deduplication

Considered the next generation of deduplication technology, this method backs up only new and unique data at the source. After an initial full snapshot backup is taken and saved to a recovery point server, future backups capture only new, incremental changes to the data. This sharply reduces the bandwidth and storage required and makes data protection and recovery across multiple sites more efficient. The main advantage of source-side deduplication is that less data is sent across the network, which improves performance.
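A minimal sketch of this exchange, under stated assumptions: the client fingerprints its blocks, asks the server which fingerprints it lacks, and sends only those blocks. The class `RecoveryPointServer`, its `missing` and `store` methods, and the 4-byte block size are hypothetical names and values chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for illustration only


def split_blocks(data: bytes):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class RecoveryPointServer:
    """Hypothetical server that keeps one copy of each block it has seen."""

    def __init__(self):
        self.blocks = {}

    def missing(self, fingerprints):
        """Return the fingerprints the server does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.blocks}

    def store(self, new_blocks: dict):
        self.blocks.update(new_blocks)


def backup(data: bytes, server: RecoveryPointServer) -> int:
    """Back up data; return the block bytes sent (fingerprint traffic is ignored)."""
    by_fp = {fingerprint(b): b for b in split_blocks(data)}
    needed = server.missing(by_fp.keys())
    payload = {fp: by_fp[fp] for fp in needed}
    server.store(payload)
    return sum(len(b) for b in payload.values())


server = RecoveryPointServer()
first = backup(b"AAAABBBBCCCC", server)   # initial full backup
second = backup(b"AAAABBBBDDDD", server)  # only the last block changed
print(first, second)
```

Here the initial backup sends all 12 bytes, while the second sends only the 4 bytes of the changed block.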


Global, Source-Side Deduplication

Global deduplication is optimized source-side deduplication. With this method, every computer, virtual machine or server across local, remote and virtual sites communicates with a recovery point server (RPS) that manages a global database index of all associated files and determines what needs to be backed up. The RPS pulls only new data as required and eliminates duplicate copies, then shares this deduplication information across all source systems. Because backup data is globally deduplicated before it is transferred to the target RPS, only changes are sent over the network, which improves performance and reduces bandwidth usage.
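To make the benefit of a shared index concrete, the sketch below counts how many blocks end up stored when each source keeps its own index versus when all sources share one. The function names, the machine names and the sample data are invented for this example.

```python
import hashlib


def fingerprints(data: bytes, block_size: int = 4) -> dict:
    """Map each block's SHA-256 fingerprint to the block itself."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest(): data[i:i + block_size]
        for i in range(0, len(data), block_size)
    }


def blocks_stored(sources: dict, global_index: bool) -> int:
    """Count blocks stored with one shared index (global) or one index per source."""
    shared = {}
    total = 0
    for data in sources.values():
        index = shared if global_index else {}
        for fp, block in fingerprints(data).items():
            if fp not in index:
                index[fp] = block
                total += 1
    return total


# Hypothetical machines that share operating system and application blocks.
sources = {
    "laptop": b"OSOSAPPSDOC1",
    "vm-01": b"OSOSAPPSDOC2",
    "server": b"OSOSDATADOC1",
}

per_source = blocks_stored(sources, global_index=False)
global_count = blocks_stored(sources, global_index=True)
print(per_source, global_count)
```

With per-source indexes, the common blocks are stored once per machine (9 blocks in total); with a global index, each distinct block is stored once (5 blocks).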


Common Misconceptions About Deduplication

These are the most common: 1) All deduplication is the same and comes standard in every backup and recovery solution; 2) inline deduplication will slow down performance; 3) source-side deduplication consumes too much processing power on the client. All are wrong. These variations may not appear to create big differences on the surface, but they can have a significant impact on the amount of data you can back up, how much usable capacity is required, how quickly you can recover from unplanned system disruptions and your budget.


Misconception No. 1: All Deduplication Is the Same

Deduplication can mean very different things, and the efficiency of this technology varies greatly from product to product. Some products perform target-side deduplication, while others perform source-side deduplication; some deduplicate per backup job, while others deduplicate across all storage systems. Further, many vendors offer stand-alone deduplication software, which is important to account for when developing your backup and recovery requirements.


Misconception No. 2: Inline Deduplication Slows Performance

As the size of the data being compared increases (e.g., 250KB, 512KB, 1024KB), deduplication becomes less efficient. Likewise, the more data you process, the more computational resources are required. To achieve inline deduplication that doesn’t slow down for lack of compute resources, vendors must design their own highly sophisticated data management structure. Unbeknown to many, this technology is not simply available off the shelf. However, you can quickly gauge a vendor’s level of data management sophistication by looking at how it supports large data sets. If the inline deduplication supports only large data sets (e.g., 512KB or 1024KB), it’s a good indication that it’s limited to a single backup job or storage volume.
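The effect of size on efficiency can be sketched with fixed-size blocks, reading the sizes above as block sizes (an assumption). The example stores two versions of the same 64KB data set, where the second differs in four scattered bytes, and measures how much must be kept at each block size. The block sizes here are far smaller than those quoted above so the example runs instantly; the helper names and data are made up for illustration.

```python
import hashlib


def make_data(n_bytes: int) -> bytes:
    """Deterministic pseudo-random bytes (SHA-256 of a counter), so runs are repeatable."""
    return b"".join(
        hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(n_bytes // 32)
    )


def stored_bytes(data: bytes, block_size: int) -> int:
    """Bytes kept after fixed-size block deduplication."""
    unique = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


v1 = make_data(64 * 1024)
v2 = bytearray(v1)
for pos in (1000, 20000, 40000, 60000):  # four scattered one-byte changes
    v2[pos] ^= 0xFF
stream = v1 + bytes(v2)                   # two backups of nearly identical data

for kb in (1, 4, 16):
    kept = stored_bytes(stream, kb * 1024)
    print(f"{kb:>2} KB blocks: {kept} of {len(stream)} bytes stored")
```

With 1KB blocks, only the four changed blocks of the second version are stored again. With 16KB blocks, every block of the second version contains a change, so nothing is saved.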


Misconception No. 3: Global, Source-Side Deduplication Is Only for VMware

Global deduplication refers to the process of multiple backup devices federating the data management structure for maximum deduplication efficiency. This means every computer, virtual machine or server that is backed up communicates with a backup server that manages a global database index of files on all machines, everywhere. This type requires a sophisticated workflow to optimize replication between the source client and the backup device. It is hard technology to develop, and not every vendor has it. Knowing this, it makes sense that many people think global, source-side deduplication is meant only for VMware and not for physical machines or other virtual systems. However, the technology does exist for those environments and can yield tremendous operational efficiencies.


Key Trend No. 1: Inline Deduplication

How well deduplication performs depends largely on whether it is post-process or inline. As the name suggests, post-process deduplication means that incoming data is first stored to disk and processed for deduplication at a later time. When data is instead processed for deduplication before being written to disk, this is called inline deduplication. Inline deduplication has the advantage of writing data to disk only once and is preferred over post-process deduplication, which requires extra storage space and more disk writes.
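The difference in disk activity can be sketched as follows. This is a simplified model, not a description of any product: it assumes the post-process path lands the full backup in a staging area and then copies unique blocks into a separate deduplicated store, so both coexist at the peak. Function names and the 4-byte block size are illustrative.

```python
import hashlib


def blocks_of(data: bytes, size: int = 4):
    return [data[i:i + size] for i in range(0, len(data), size)]


def inline_backup(data: bytes):
    """Deduplicate before writing: each unique block is written once."""
    disk = {}
    written = 0
    for block in blocks_of(data):
        fp = hashlib.sha256(block).digest()
        if fp not in disk:
            disk[fp] = block
            written += len(block)
    peak = written  # disk usage never exceeds the deduplicated size
    return written, peak


def post_process_backup(data: bytes):
    """Land the full backup first, then deduplicate it in a second pass."""
    staging = data          # full copy written to disk
    written = len(staging)
    disk = {}
    for block in blocks_of(staging):
        fp = hashlib.sha256(block).digest()
        if fp not in disk:
            disk[fp] = block
            written += len(block)  # unique blocks written again into the store
    # Assumed: staging is freed only after the pass, so both copies exist at the peak.
    peak = len(staging) + sum(len(b) for b in disk.values())
    return written, peak


data = b"AAAABBBBAAAAAAAA"  # 16 bytes, two unique 4-byte blocks
print("inline (written, peak):      ", inline_backup(data))
print("post-process (written, peak):", post_process_backup(data))
```

Under these assumptions, the inline path writes and holds 8 bytes, while the post-process path writes 24 bytes and needs 24 bytes of disk at its peak.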


Key Trend No. 2: Global, Source-Side Deduplication

Source-side deduplication relies on backup servers that work in conjunction with agents installed on the clients (the “data source”). The client software communicates with the backup servers to compare new blocks of data and removes redundancies before the data is transferred over the network. Because duplicate data is never sent, this form of deduplication yields dramatic savings in bandwidth, required storage and corresponding costs. Global, source-side deduplication takes this process a step further by sharing all of an organization’s deduplication information across all source systems. It is quickly replacing target deduplication as the preferred method because it backs up only new and unique data at the source, checked against a global database index of files.
