How to Implement a Successful Data Deduplication Strategy

By Eric Schou  |  Posted 2010-08-30

The IT organizations of today cannot rely on the data protection model of yesteryear, which can be characterized as tape-based, decentralized and populated primarily with physical servers. Virtualization and the large amounts of data to protect mandate a new approach to protecting and managing information.

These days, with 50 percent annual data growth, how can organizations protect all of their data within an ever-shrinking backup window? How quickly can virtual machines or complex applications such as SharePoint actually be restored? And how much data can businesses really afford to lose in the event of an outage?

Just as next-generation tools such as disk-based backup are revolutionizing data protection, data deduplication is enabling a new era of information management. Now, with the ability to deduplicate data everywhere and manage it centrally, organizations are able to not only improve data protection operations and lower costs but move towards a more systematic approach for managing information growth.

Why data deduplication?

Simply stated, data deduplication is the process of eliminating redundant data. Deduplication backs up only unique data at the sub-file level. Needless to say, in environments where storage needs continue to intensify and holding down costs remains a key issue, deduplication offers welcome relief for today's IT organizations.
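To make "unique data at the sub-file level" concrete, here is a minimal sketch in Python. It assumes fixed-size 4 KB chunks and SHA-256 fingerprints, neither of which comes from this article; commercial products often use variable-size segments and their own fingerprinting schemes. The names `deduplicate`, `restore` and `CHUNK_SIZE` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into chunks and add only previously unseen chunks to store.

    Returns the ordered list of chunk fingerprints (a "recipe") that is
    enough to rebuild the original data from the store.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = chunk  # unique chunk: keep one copy
        recipe.append(fingerprint)      # duplicate or not, record its position
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data by concatenating chunks in recipe order."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    first_backup = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
    recipe = deduplicate(first_backup, store)
    stored = sum(len(chunk) for chunk in store.values())
    print(f"logical bytes: {len(first_backup)}, stored bytes: {stored}")
    assert restore(recipe, store) == first_backup
```

A second backup that shares most of its chunks with the first adds only its new chunks to the store, which is where the storage savings come from.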

Once you are familiar with data deduplication, it should come as no surprise that eliminating redundant data enables companies to reduce storage costs. What many do not know, however, is that deduplication has other useful benefits, such as bandwidth savings, faster backups, backup consolidation and easier disaster recovery, depending on where and how it is used.

Measurable benefits from data deduplication

By looking at all the ways and places one can benefit from data deduplication, an IT organization can make the right decision on where to begin using this powerful technology. IT organizations have seen a range of measurable benefits from deduplication that includes the ability to:

1. Move up to 90 percent less VM data

2. Reduce backup storage by as much as 95 percent

3. Minimize backup windows and reduce network utilization by up to 90 percent

4. Eliminate 80 percent of tape costs and obviate the need to invest in virtual tape libraries
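Percentage reductions like these are often quoted as deduplication ratios instead. The short Python sketch below shows the conversion; the function names and the 10 TB example are hypothetical and are not measurements from any product.

```python
def dedup_ratio(reduction_percent: float) -> float:
    """Convert a storage reduction percentage into an N:1 deduplication ratio."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in the range [0, 100)")
    return 100.0 / (100.0 - reduction_percent)


def stored_size(logical_size: float, reduction_percent: float) -> float:
    """Physical storage needed for a given logical backup size."""
    return logical_size * (100.0 - reduction_percent) / 100.0


if __name__ == "__main__":
    # Hypothetical example: 10 TB of backup data at the 95 percent upper bound.
    print(f"ratio: {dedup_ratio(95):.0f}:1")
    print(f"stored: {stored_size(10, 95)} TB")
```

Under these assumptions, a 95 percent reduction corresponds to a 20:1 ratio, and a 90 percent reduction to 10:1. The step from 90 to 95 percent halves the remaining storage, which is why small differences at the high end matter.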

Data deduplication can be performed in two places: at the source or at the target. Deduplication as close to the information source as possible delivers the most value and can enhance a large part of many environments. Every environment is different, however, and these decisions should be based on the type of data, the volume and the recovery service-level agreements (SLAs) of that environment.

Data deduplication at the source

With source (often referred to as client-side) data deduplication, data is deduplicated before it is transmitted across the network and stored. By eliminating redundant data before it is sent across the network, deduplication at the source improves the efficient utilization of bandwidth, storage and VM resources across the entire infrastructure.
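The exchange described above can be sketched as follows. This is an illustrative model only: the `BackupServer` class, its `missing` and `upload` methods, the 4 KB chunk size and the use of SHA-256 are all assumptions, not the protocol of any particular backup product.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def chunk_fingerprints(data: bytes) -> list[tuple[str, bytes]]:
    """Return (fingerprint, chunk) pairs for each fixed-size chunk of data."""
    pairs = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        pairs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pairs


class BackupServer:
    """Hypothetical backup server that stores chunks keyed by fingerprint."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        """Tell the client which fingerprints the server does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.chunks}

    def upload(self, new_chunks: dict[str, bytes]) -> None:
        self.chunks.update(new_chunks)


def source_side_backup(data: bytes, server: BackupServer) -> int:
    """Back up data from the client; return the chunk bytes actually sent.

    Only small fingerprints travel for chunks the server already has, so the
    fingerprint traffic itself is ignored in the returned byte count.
    """
    chunked = chunk_fingerprints(data)
    needed = server.missing([fp for fp, _ in chunked])
    to_send = {fp: chunk for fp, chunk in chunked if fp in needed}
    server.upload(to_send)
    return sum(len(chunk) for chunk in to_send.values())
```

In this model, backing up the same data a second time sends no chunk data at all, and a backup with one changed chunk sends only that chunk. That is the bandwidth saving the source-side approach is meant to deliver.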

It is likely that many organizations could use client-side data deduplication for as much as 60 to 80 percent of their data. This would result in faster backups, dramatically less network usage and reduced storage consumption.

Some client-side data deduplication solutions work the same across virtual and physical environments. As a result, regardless of whether it is a VM or a physical machine, less data is stored. This not only reduces storage costs in the data center, it also makes it easier to move data to a disaster recovery site using replication.

Data deduplication at the target

Data deduplication can also occur at the target, such as a media server or a storage appliance. With media server deduplication, backup data moves from a client (the system protected) to the backup software's server (the media server). The media server performs the deduplication and sends only the unique data segments to the back-end storage. This leads to savings in back-end storage as well as a reduction in the infrastructure needed to store backup data.
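For contrast with the source-side approach, the media server flow can be modeled roughly as below. The `MediaServer` class, the 4 KB chunk size and SHA-256 fingerprints are assumptions made for this sketch. The point it illustrates is that the full backup stream still crosses the network from client to media server, while only unique segments reach back-end storage.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed segment size, for illustration only


class MediaServer:
    """Hypothetical media server that deduplicates before writing to storage."""

    def __init__(self) -> None:
        self.backend: dict[str, bytes] = {}  # stands in for back-end storage
        self.bytes_received = 0              # traffic from clients

    def receive_backup(self, data: bytes) -> int:
        """Accept a full backup stream; return bytes newly written to storage."""
        self.bytes_received += len(data)  # the client sent everything
        written = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            segment = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.backend:
                self.backend[fingerprint] = segment
                written += len(segment)
        return written


if __name__ == "__main__":
    server = MediaServer()
    backup = b"A" * (2 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
    server.receive_backup(backup)
    server.receive_backup(backup)  # an unchanged second backup
    stored = sum(len(segment) for segment in server.backend.values())
    print(f"received: {server.bytes_received} bytes, stored: {stored} bytes")
```

The client needs no deduplication logic in this arrangement, which is consistent with the later point that target-side deduplication may not require updates to existing backup clients.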

Media server data deduplication is well suited to use cases such as off-host VM backups, Network Data Management Protocol (NDMP) backups and data center workloads such as high-transaction databases that tend to have high data change rates.

Like data deduplication at the media server, deduplication by an appliance is also considered target-side data deduplication. With a disk-based deduplication appliance, backup data moves across a network from a client to a backup server and then to the appliance. The appliance performs the deduplication and writes only the unique data to its own storage, resulting in an overall reduction in backup storage.

While most backup software products see these appliances as native disk, some vendors have begun to offer solutions with tighter integration between the software and the storage appliance. The additional integration allows organizations to further improve the performance and savings they derive from these appliances. For example, tighter integration can enhance the use of replication, improve the speed of recovery or enhance disaster recovery operations by better integrating with tape devices.

Next steps to a data deduplication strategy

Clearly, data deduplication is a cost-effective information management tool that organizations can use virtually anywhere in their enterprise to address pressing IT challenges. From remote offices to VMs to data center workloads, deduplication can play a role in controlling storage costs, increasing reliability and simplifying operations. Here are four questions to ask to help prioritize the approach:

1. What percentage of data is backed up across the network?

2. Are VM backup or recovery times satisfactory?

3. Is there storage that could be redeployed for backup data deduplication?

4. How much savings would be realized if 50 percent of tapes were eliminated?

Broadly speaking, data deduplication helps organizations meet increasingly strict SLAs associated with backup windows, recovery time objectives (RTOs) and recovery point objectives (RPOs). But remember that organizations can benefit from deduplication in more than one place. Client-side deduplication can improve backup times for both physical machines and VMs and reduce bandwidth requirements. Target-side deduplication offers similar storage benefits and may not require updates to existing backup clients.

Finally, there are solutions on the market that offer a combination of both source and target data deduplication to achieve even greater storage savings and ROI. Find the approach that works best for you. You'll soon realize that deduplication is no longer a "nice to have." It is a requirement in the data center.

Eric Schou is a Senior Manager with Symantec Corporation. Before joining Symantec, Eric spent over 10 years in the storage industry, working for both Maxtor Corporation and Quantum Corporation in a marketing capacity. Prior to that, Eric worked for Arrow Electronics for five years as a senior sales representative, managing Tier 1 distribution customers. He can be reached at
