    What Is the Difference Between Data Deduplication, File Deduplication, and Data Compression?

    Written by

    eWEEK EDITORS
    Published August 15, 2007

      Q: Can you explain the differences between compression, file deduplication and data deduplication?
      A: All of these products fit into an overall market and technical concept, which is capacity optimization or data reduction. This refers to a broad group of products that seek to reduce the amount of data that has to be stored. Roughly speaking, you can rank these techniques by the amount of data reduction they yield. Compression might typically get you a 2-to-1 reduction. File deduplication, which is commonly known as content addressable storage or CAS, might yield a 3-to-1 or 4-to-1 reduction. But data deduplication, which is deduplication at the level of individual disk blocks or “chunks” rather than entire files, can often give you a 20-to-1 reduction or better, depending on the type of data. Remember, we're talking about the aggregate reduction in the total amount of data stored on your backup storage device, not necessarily the reduction in any particular file or block, which can vary considerably.
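
      To make the aggregate-versus-per-file distinction concrete, here is a minimal Python sketch. The function name and the per-file figures are invented for illustration and do not come from any product; the point is only that the aggregate ratio is total data written divided by total space consumed, while individual files can land far from that number.

```python
def reduction_ratio(logical_sizes, stored_sizes):
    """Aggregate reduction: total data written / total space actually consumed."""
    return sum(logical_sizes) / sum(stored_sizes)

# Hypothetical figures in MB, made up for this example.
logical = [100, 100, 100, 100]   # four files sent to the backup device
stored = [100, 5, 5, 10]         # space each one ended up consuming

aggregate = reduction_ratio(logical, stored)          # 400 / 120, about 3.3-to-1
per_file = [l / s for l, s in zip(logical, stored)]   # ranges from 1-to-1 up to 20-to-1

print(f"aggregate: {aggregate:.1f}-to-1")
print("per file:", per_file)
```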

      Q: Why is data deduplication so much more effective in reducing data than file deduplication?
      A: Data deduplication examines all your data on the block level and eliminates redundant blocks. So obviously it will take care of entire files that are redundant, but unlike file deduplication it will also eliminate the redundant pieces that occur when many slightly different versions of the same file are created by users or by applications like Microsoft Exchange. If users have been e-mailing back and forth a PowerPoint file while making minor changes, you can end up storing 10 or 20 files whose content is 95 percent identical. Data deduplication will catch that.
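
      The block-level idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular vendor implements it: the `BlockStore` class, the 4 KB fixed block size and the use of SHA-256 as the block fingerprint are all choices made for this example. Each file is split into blocks, each unique block is kept once, and a file becomes an ordered list of block fingerprints.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed for illustration; real products choose their own sizes


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Hypothetical block-level deduplicating store."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, each unique block kept once
        self.files = {}   # file name -> ordered list of fingerprints

    def put(self, name, data):
        recipe = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if not seen before
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name):
        """Rebuild a file from its list of fingerprints."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


# A 20-block "presentation", an exact copy, and a version with one small in-place edit.
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(20))
edited = bytearray(base)
edited[5 * BLOCK_SIZE:5 * BLOCK_SIZE + 6] = b"EDITED"

store = BlockStore()
store.put("deck_v1.ppt", base)
store.put("deck_v1_copy.ppt", base)
store.put("deck_v2.ppt", bytes(edited))

# 60 blocks were written, but only 21 unique blocks are kept.
print(len(store.blocks), "unique blocks,", store.stored_bytes(), "bytes stored")
```

      Note that the edit above overwrites bytes in place, so the blocks stay aligned. Inserting bytes would shift every later fixed-size block, which is one reason some systems use variable-size chunks instead.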

      Q: When should you use data deduplication and when should you use file deduplication?
      A: A very short answer would be that file deduplication is often used for backup solutions in so-called ROBO environments (remote office, branch office). Data deduplication can be used either in the data center itself, as a software function installed on the intelligent disk target, or on the backup client side in a ROBO environment.

      Q: Who are some of the more commonly used data deduplication vendors?
      A: There are plenty of vendors, because data deduplication is a very hot area these days, especially now that the VTL (virtual tape library) vendors are getting involved. There is Avamar (acquired by EMC), Symantec PureDisk, Asigra, Data Domain, Diligent Technologies, FalconStor, Sepaton and Quantum. Network Appliance has a product in beta.

      Q: Who are some of the more commonly used file deduplication or content addressable storage vendors?
      A: EMC has the Centera product line. Then there is Archivas (recently acquired by Hitachi Data Systems) and Caringo.

      Q: What accounts for the difference in yield between compression and file deduplication?
      A: With compression you are using some algorithm or other to reduce the size of a particular file by eliminating redundant bits. But if your users or applications have stored the same file multiple times, then no matter how good your compression method is your backup storage will end up with multiple copies of the compressed files. File deduplication goes a step further and eliminates these redundant copies, storing only one. So it gives you more reduction than just compression alone.
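
      A small Python sketch of that difference, using the standard `zlib` and `hashlib` modules. The function names and the sample files are hypothetical, and hashing the whole file with SHA-256 is just one plausible way to spot identical copies.

```python
import hashlib
import zlib


def compress_only(files):
    """Compress every file independently; identical files are each stored."""
    return sum(len(zlib.compress(data)) for data in files.values())


def dedup_then_compress(files):
    """Keep one compressed copy per distinct whole-file content hash."""
    unique = {}
    for data in files.values():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in unique:
            unique[digest] = zlib.compress(data)
    return sum(len(blob) for blob in unique.values())


# Three users saved the same report (sample data invented for the example).
report = b"Quarterly results. " * 500
files = {
    "alice/report.doc": report,
    "bob/report.doc": report,
    "carol/report.doc": report,
}

print("compression only:     ", compress_only(files), "bytes")
print("file dedup + compress:", dedup_then_compress(files), "bytes")
```

      With three identical copies, compression alone stores three compressed files, while the deduplicating version stores one.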

      Q: Where does delta block optimization fit in?
      A: This is another capacity optimization technique. It's used by incremental remote backup products like Connected (acquired by Iron Mountain) and EVault (acquired by Seagate). When you go to back up the most recent version of a file that has already been backed up, the software looks at it and tries to figure out which blocks are new. Then it writes only these blocks to backup and ignores the blocks in the file that haven't changed. But again, compared with file deduplication, this technique has the same shortcoming as compression. If two users sitting in the same office have identical copies of the same file, then delta block optimization will create two identical backups instead of storing just one like file deduplication.
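
      As a rough Python illustration of delta block optimization: the sketch below compares block fingerprints of the new version against those recorded at the previous backup of the same file and returns only the blocks that differ. The function names, the 4 KB block size and the SHA-256 fingerprints are assumptions for this example, not a description of the products named above.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed for illustration


def block_digests(data, size=BLOCK_SIZE):
    """Fingerprint each fixed-size block of a file."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def delta_backup(previous_digests, new_data, size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that are new or changed.

    Simplification: a file that became shorter is not handled here.
    """
    changed = {}
    for index, digest in enumerate(block_digests(new_data, size)):
        if index >= len(previous_digests) or previous_digests[index] != digest:
            changed[index] = new_data[index * size:(index + 1) * size]
    return changed


# One user backs up a 10-block file, edits block 3, and backs it up again.
v1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
v2 = bytearray(v1)
v2[3 * BLOCK_SIZE:3 * BLOCK_SIZE + 6] = b"EDITED"

first = delta_backup([], v1)                          # nothing backed up yet: all 10 blocks
second = delta_backup(block_digests(v1), bytes(v2))   # only block 3 is written

# A second user with an identical copy has no shared history,
# so all 10 blocks are written again.
other_user = delta_backup([], v1)

print(len(first), len(second), len(other_user))
```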
