eWEEK.com

    What Is the Difference Between Data Deduplication, File Deduplication, and Data Compression?

    By EWEEK EDITORS - August 15, 2007

      Q: Can you explain the differences between compression, file deduplication and data deduplication?
      A: All of these products fit into one overall market and technical concept: capacity optimization, or data reduction. This refers to a broad group of products that seek to reduce the amount of data that has to be stored. Roughly speaking, you can rank these techniques by the amount of data reduction they yield. Compression might typically get you a 2-to-1 reduction. File deduplication, which is commonly known as content addressable storage or CAS, might yield a 3-to-1 or 4-to-1 reduction. But data deduplication, which is deduplication at the level of individual disk blocks or "chunks" rather than entire files, can often give you a 20-to-1 reduction or better, depending on the type of data. Remember, we're talking about the aggregate reduction in the total amount of data stored on your backup storage device, not necessarily the reduction in any particular file or block, which can vary considerably.
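To make the distinction between aggregate and per-item reduction concrete, here is a small arithmetic sketch. The function names and the sample sizes are illustrative assumptions, not figures from any product: a first full backup that barely reduces, followed by two later backups that each reduce 20-to-1.

```python
def stored_bytes(logical_bytes, ratio):
    """Bytes that land on disk for a given reduction ratio (e.g. 20 for 20-to-1)."""
    return logical_bytes / ratio


def aggregate_ratio(logical_sizes, stored_sizes):
    """Overall reduction ratio across everything on the backup device."""
    return sum(logical_sizes) / sum(stored_sizes)


if __name__ == "__main__":
    # Hypothetical sizes (arbitrary units) for three successive backups.
    logical = [100, 100, 100]
    stored = [100, 5, 5]  # the first backup is stored in full

    per_backup = [l / s for l, s in zip(logical, stored)]
    print("per-backup ratios:", per_backup)
    print("aggregate ratio:", round(aggregate_ratio(logical, stored), 2))
```

In this made-up case the individual ratios range from 1-to-1 to 20-to-1, while the aggregate is well below 20-to-1; the aggregate only approaches the higher figure as more redundant backups accumulate.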

      Q: Why is data deduplication so much more effective in reducing data than file deduplication?
      A: Data deduplication examines all your data at the block level and eliminates redundant blocks. So it takes care of entire files that are redundant, but unlike file deduplication it also eliminates the redundant pieces that occur when many slightly different versions of the same file are created by users or by applications like Microsoft Exchange. If users have been e-mailing a PowerPoint file back and forth while making minor changes, you can end up storing 10 or 20 files whose content is 95 percent identical. Data deduplication will catch that.
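The block-level idea can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: it assumes fixed-size 4 KB blocks and SHA-256 fingerprints, and the names `dedup_blocks` and `restore` are hypothetical.

```python
import hashlib


def dedup_blocks(files, block_size=4096):
    """Store each distinct block once.

    files: dict mapping file name -> bytes.
    Returns (store, recipes): store maps a block fingerprint to its bytes,
    recipes maps each file name to the ordered fingerprints that rebuild it.
    """
    store = {}
    recipes = {}
    for name, data in files.items():
        fingerprints = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            fp = hashlib.sha256(block).hexdigest()
            store.setdefault(fp, block)  # keep only the first copy of a block
            fingerprints.append(fp)
        recipes[name] = fingerprints
    return store, recipes


def restore(store, recipes, name):
    """Rebuild a file from its recipe."""
    return b"".join(store[fp] for fp in recipes[name])


if __name__ == "__main__":
    # Two versions of a 20-block file that differ only in the last block.
    v1 = b"".join(bytes([i]) * 4096 for i in range(20))
    v2 = v1[:-4096] + b"\xff" * 4096
    store, recipes = dedup_blocks({"deck_v1.ppt": v1, "deck_v2.ppt": v2})
    print("blocks written by users:", len(recipes["deck_v1.ppt"]) + len(recipes["deck_v2.ppt"]))
    print("blocks actually stored:", len(store))
    assert restore(store, recipes, "deck_v2.ppt") == v2
```

Fixed-size blocks keep the sketch short, but an insertion near the start of a file shifts every later block boundary and defeats the matching; content-defined (variable-size) chunking is a common way to handle that case.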

      Q: When should you use data deduplication and when should you use file deduplication?
      A: A very short answer would be that file deduplication is often used for backup solutions in so-called ROBO environments (remote office, branch office). Data deduplication can be used either in the data center itself, as a software function installed on the intelligent disk target, or on the backup client side in a ROBO environment.

      Q: Who are some of the more commonly used data deduplication vendors?
      A: There are plenty of vendors, because data deduplication is a very hot area these days, especially now that the VTL (virtual tape library) vendors are getting involved. There are Avamar (acquired by EMC), Symantec PureDisk, Asigra, Data Domain, Diligent Technologies, FalconStor, Sepaton and Quantum. Network Appliance has a product in beta.

      Q: Who are some of the more commonly used file deduplication or content addressable storage vendors?
      A: EMC has the Centera product line. Then there is Archivas (recently acquired by Hitachi Data Systems) and Caringo.

      Q: What accounts for the difference in yield between compression and file deduplication?
      A: With compression you are using an algorithm to reduce the size of a particular file by eliminating redundant bits. But if your users or applications have stored the same file multiple times, then no matter how good your compression method is, your backup storage will end up with multiple copies of the compressed file. File deduplication goes a step further and eliminates these redundant copies, storing only one. So it gives you more reduction than compression alone.
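A rough sketch of that difference, assuming zlib as the compressor and a SHA-256 hash of the whole file as the duplicate test (both are choices made for this illustration; the function names are hypothetical):

```python
import hashlib
import zlib


def compress_only(files):
    """Total stored bytes when every file is compressed and kept separately."""
    return sum(len(zlib.compress(data)) for data in files.values())


def dedup_then_compress(files):
    """Total stored bytes when identical files are stored once, then compressed."""
    unique = {}
    for data in files.values():
        # Files with the same content hash are treated as the same file.
        unique.setdefault(hashlib.sha256(data).hexdigest(), data)
    return sum(len(zlib.compress(data)) for data in unique.values())


if __name__ == "__main__":
    report = b"quarterly numbers, slide after slide\n" * 500
    files = {
        "alice/report.doc": report,
        "bob/report.doc": report,  # identical copy saved by a second user
        "carol/notes.txt": b"unrelated notes\n" * 200,
    }
    print("compression only:", compress_only(files), "bytes")
    print("file dedup + compression:", dedup_then_compress(files), "bytes")
```

Compression alone still pays for the second copy of the report; deduplicating whole files first stores it once.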

      Q: Where does delta block optimization fit in?
      A: This is another capacity optimization technique. It's used by incremental remote backup products like Connected (acquired by Iron Mountain) and EVault (acquired by Seagate). When you go to back up the most recent version of a file that has already been backed up, the software looks at it and tries to figure out which blocks are new. Then it writes only these blocks to backup and ignores the blocks in the file that haven't changed. But again, this technique has the same shortcoming as compression when compared with file deduplication. If two users sitting in the same office have identical copies of the same file, then delta block optimization will create two identical backups instead of storing just one, as file deduplication would.
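The per-file comparison described here might look like the following sketch. It assumes fixed-size blocks and SHA-256 block hashes kept from the previous backup of the same file; the names `block_hashes` and `changed_blocks` are hypothetical and not taken from the products mentioned.

```python
import hashlib


def block_hashes(data, block_size=4096):
    """Fingerprint every block of a file, in order."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous_hashes, data, block_size=4096):
    """Return {block_index: block_bytes} for blocks that are new or modified
    relative to the previous backup of this same file."""
    changed = {}
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if index >= len(previous_hashes) or previous_hashes[index] != fingerprint:
            changed[index] = block
    return changed


if __name__ == "__main__":
    v1 = b"".join(bytes([i]) * 4096 for i in range(10))
    v2 = v1[:3 * 4096] + b"\xaa" * 4096 + v1[4 * 4096:]  # only block 3 edited

    # Incremental backup of the same file: only the edited block is written.
    print("blocks written for v2:", sorted(changed_blocks(block_hashes(v1), v2)))

    # A second user with an identical file has no backup history of their own,
    # so every block is written again.
    print("blocks written for second user:", len(changed_blocks([], v1)))
```

Because the comparison only looks at one file's own history, the second user's identical copy is backed up in full, which is the shortcoming noted above.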
