What is Data Deduplication? | eWeek

What is Data Deduplication?

Written By
eWEEK EDITORS
Jul 23, 2007

Q: How does data deduplication work?
A: Data deduplication is based on the fact that in any enterprise where you are storing and backing up data, a tremendous amount of content occurs more than once. It's more efficient to eliminate, or deduplicate, those repeat occurrences than to store them in multiple places. Deduplication vendors use a variety of algorithms: some use hash algorithms like SHA-1, others do bit-by-bit comparison. But it boils down to examining the blocks of data in a backup stream and replacing duplicate instances with pointers to a single unique instance.
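A minimal sketch of the block-level approach described above, in Python. It assumes fixed-size 4 KB blocks; the block size and the names `deduplicate` and `restore` are illustrative choices, not any vendor's design. It uses SHA-1 to find candidate duplicates and then does a byte-for-byte comparison before trusting a match, combining the two techniques mentioned.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # hypothetical fixed block size; real products vary


def deduplicate(stream: bytes, block_size: int = BLOCK_SIZE) -> Tuple[Dict[str, bytes], List[str]]:
    """Split a backup stream into blocks, keep one copy of each unique block,
    and record a pointer (the block's SHA-1 digest) for every block position."""
    store: Dict[str, bytes] = {}   # digest -> the single stored instance
    pointers: List[str] = []       # stream order, one digest per block
    for offset in range(0, len(stream), block_size):
        block = stream[offset:offset + block_size]
        digest = hashlib.sha1(block).hexdigest()
        if digest in store:
            # Hash match: confirm with a byte-for-byte comparison before
            # replacing this block with a pointer.
            if store[digest] != block:
                raise ValueError("SHA-1 collision detected")
        else:
            store[digest] = block
        pointers.append(digest)
    return store, pointers


def restore(store: Dict[str, bytes], pointers: List[str]) -> bytes:
    """Rebuild the original stream by following the pointers."""
    return b"".join(store[d] for d in pointers)


backup = b"x" * 8192 + b"y" * 4096 + b"x" * 4096  # four blocks, two distinct
store, pointers = deduplicate(backup)
print(len(pointers), "blocks referenced,", len(store), "stored")  # prints: 4 blocks referenced, 2 stored
```

Only two 4 KB blocks are kept for the 16 KB stream, and `restore(store, pointers)` returns the original bytes.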

Q: What do data deduplication products look like?
A: Typically it's an appliance that can sit either in-band or out-of-band. If it's in-band, it analyzes and deduplicates the backup stream while it's being sent to backup storage (for example, to a virtual tape library, or VTL). If it's out-of-band, it analyzes and rewrites the data after it's been written to the backup device. In either case, the goal is to remove duplicate data while changing as little as possible in your existing infrastructure; all you do is deploy the appliance.
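To make the in-band/out-of-band distinction concrete, here is a simplified Python model. The function names and the idea of tracking a peak byte count are illustrative assumptions. Each chunk stands for one block already cut from the backup stream; the model shows that an out-of-band design must first land the full, un-deduplicated backup before its rewrite pass, while an in-band design only ever writes unique chunks.

```python
import hashlib
from typing import Dict, List


def in_band(chunks: List[bytes]) -> Dict[str, int]:
    """Model of in-band deduplication: chunks are checked while in flight,
    so only unique chunks are ever written to the backup device."""
    store: Dict[str, bytes] = {}
    for chunk in chunks:
        store.setdefault(hashlib.sha1(chunk).hexdigest(), chunk)
    written = sum(len(c) for c in store.values())
    return {"peak_bytes": written, "final_bytes": written}


def out_of_band(chunks: List[bytes]) -> Dict[str, int]:
    """Model of out-of-band deduplication: write the raw stream first,
    then deduplicate it in a later pass."""
    landing_zone = list(chunks)                    # full backup lands on disk
    peak = sum(len(c) for c in landing_zone)       # capacity needed up front
    store: Dict[str, bytes] = {}
    for chunk in landing_zone:                     # post-process rewrite pass
        store.setdefault(hashlib.sha1(chunk).hexdigest(), chunk)
    landing_zone.clear()                           # raw copy is released
    final = sum(len(c) for c in store.values())
    return {"peak_bytes": peak, "final_bytes": final}


chunks = [b"a" * 4096] * 3 + [b"b" * 4096]
print(in_band(chunks))      # {'peak_bytes': 8192, 'final_bytes': 8192}
print(out_of_band(chunks))  # {'peak_bytes': 16384, 'final_bytes': 8192}
```

Both approaches end with the same deduplicated data; in this model the difference is how much capacity must be available while the backup is running.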

Q: What kind of applications does deduplication work best with?
A: It can work with either file-oriented or block-oriented applications; it really depends on which applications a particular vendor's product is targeting. But keep in mind that it isn't suited to data that's already been compressed or encrypted, because that reduces the number of pattern matches the deduplication algorithm can detect. Typically you would do encryption after deduplication, not before.
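The sketch below illustrates the encryption half of that advice in Python. `toy_encrypt` is a hypothetical stand-in (SHA-256 run in counter mode to produce a keystream), chosen only so the example needs nothing beyond the standard library; it is not a vetted cipher. The assumption that each backup is encrypted with its own nonce is what turns identical plaintext blocks into different ciphertext blocks, so a deduplicator looking at the encrypted streams finds no matches.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical deduplication block size


def toy_encrypt(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Illustration only: XOR the data with a SHA-256 counter-mode keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def shared_blocks(first: bytes, second: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Count blocks of `second` that already appear somewhere in `first`."""
    seen = {first[i:i + block_size] for i in range(0, len(first), block_size)}
    return sum(
        1 for i in range(0, len(second), block_size)
        if second[i:i + block_size] in seen
    )


# Two nightly backups of the same 16 KB of pseudo-random data.
monday = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(512))
tuesday = monday
key = b"hypothetical-key"

plain = shared_blocks(monday, tuesday)
encrypted = shared_blocks(
    toy_encrypt(monday, key, nonce=b"monday"),
    toy_encrypt(tuesday, key, nonce=b"tuesday"),
)
print(plain, encrypted)  # 4 0
```

Deduplicating the plaintext first and then encrypting only the unique blocks that are kept preserves all four matches, which is the reason for putting encryption after deduplication.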

Q: What are the main benefits of deduplication?
A: Well, contrary to what you might think, the most important benefit isn't really saving storage space but the fact that you need to send less data to backup in the first place. That can save you a lot of time and bandwidth.
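A back-of-the-envelope Python model of that bandwidth saving. It assumes the sender can check which block digests the backup target already holds (the set passed as `target_digests`), so that only blocks the target has never seen are transmitted; the 4 KB block size and the function name are illustrative.

```python
import hashlib
from typing import Set

BLOCK_SIZE = 4096  # hypothetical block size


def bytes_to_send(data: bytes, target_digests: Set[str], block_size: int = BLOCK_SIZE) -> int:
    """Return how many bytes must be transmitted, adding the digest of
    every block the target receives to target_digests."""
    sent = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in target_digests:
            target_digests.add(digest)
            sent += len(block)
    return sent


blocks = [bytes([n]) * BLOCK_SIZE for n in range(10)]  # ten distinct blocks
monday = b"".join(blocks)
blocks[3] = b"\xff" * BLOCK_SIZE                      # one block changes overnight
tuesday = b"".join(blocks)

on_target: Set[str] = set()
print(bytes_to_send(monday, on_target))   # 40960: the first backup sends everything
print(bytes_to_send(tuesday, on_target))  # 4096: only the changed block is sent
```

In this model the second night's 40 KB backup costs only 4 KB of transfer, which is where the time and bandwidth savings come from.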

Q: Just how much data redundancy can be eliminated with deduplication?
A: It varies tremendously, of course. In the best case, you can get a data reduction ratio of 20-to-1. In other words, a 20-terabyte backup would be reduced to just one terabyte. About 10% of the data deduplication users we talk to get this kind of ratio. But this is definitely something you need to test for yourself, with your own data, before you buy a deduplication appliance.
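Since results depend so heavily on the data, one rough way to estimate the ratio for a sample of your own backups is to compare logical bytes against the bytes of unique blocks. This Python sketch assumes fixed 4 KB blocks, which may understate what products with more sophisticated chunking achieve, so treat its output as an indication only. A 20-to-1 ratio corresponds to a space saving of 1 - 1/20, or 95%.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


def dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Logical bytes divided by the bytes needed to store each unique block once."""
    if not data:
        raise ValueError("need some data to measure")
    unique = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique.setdefault(hashlib.sha1(block).digest(), len(block))
    return len(data) / sum(unique.values())


def space_saving(ratio: float) -> float:
    """Fraction of storage avoided at a given deduplication ratio."""
    return 1 - 1 / ratio


# Twenty copies of the same two blocks: 160 KB logical, 8 KB unique.
sample = (b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE) * 20
ratio = dedup_ratio(sample)
print(ratio, round(space_saving(ratio), 2))  # 20.0 0.95
```

To try it on real data, read a representative backup image into memory (for example with `pathlib.Path.read_bytes()`) and pass the bytes in; very large images would need a streaming version.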

Q: What are some of the vendors of data deduplication gear?
A: Data Domain and Diligent Technologies are two of the leading private independent vendors. EMC acquired a well-known company called Avamar. Network Appliance, Symantec and FalconStor also have solutions.


Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved
