
    Which Applications Save The Most Disk Space By Using Data Deduplication During Backup?

    Written by

    eWEEK EDITORS
    Published July 23, 2007

      Q: What are some of the applications that generate a lot of redundant data that can be eliminated by deduplication?
      A: There are two parts to the answer. First, certain applications by their very design create redundant data on primary storage. Then backup makes it even more redundant.

      Q: Let's start with the backup part first.
      A: Suppose you're using typical backup software like Symantec NetBackup [formerly Veritas] or EMC NetWorker [formerly Legato] to do your backups. Let's say you keep 12 weeks of full backups and do daily incremental backups. This alone is going to create a lot of redundant data by definition. A good rule of thumb is that 1 GB of data on primary storage yields 10 GB on backup.
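
As a rough illustration of that rule of thumb, the sketch below adds up what a 12-week retention policy keeps for each gigabyte of primary data. The six incrementals per week and the 2 percent daily change rate are assumptions chosen for illustration, not figures from the interview, and the function name `backup_volume_gb` is hypothetical.

```python
def backup_volume_gb(primary_gb, weeks_retained=12, incrementals_per_week=6,
                     daily_change_rate=0.02):
    """Estimate total backup data retained under a full-plus-incremental policy."""
    # Each retained week holds one full copy of the primary data.
    fulls = weeks_retained * primary_gb
    # Each incremental holds only the data changed that day (assumed rate).
    incrementals = (weeks_retained * incrementals_per_week
                    * primary_gb * daily_change_rate)
    return fulls + incrementals


total = backup_volume_gb(1.0)
print(f"{total:.2f} GB retained per 1 GB of primary data")
```

Under these assumed values the retained full copies dominate the total, which is why the result lands in the same range as the 10-to-1 rule of thumb. A different retention period or change rate will shift the figure.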

      Q: What about the redundancy created by applications?
      A: Certain applications create tremendous amounts of redundant data even before the backup software goes to work. One major example is Microsoft Exchange. If you are sending lots of file attachments with your messages, then Exchange is most likely storing multiple copies of these files. A common rule of thumb is that 90 percent of e-mail volume is in the attachments. If several people are mailing a spreadsheet or a PowerPoint back and forth and each time making a few changes, you can easily end up with 20 or 30 nearly identical copies of the file in Exchange. In extreme cases you can have hundreds of copies. And these extreme cases aren't that uncommon. I've seen customers where we installed deduplication get a 100-to-1 reduction in the volume of data stored by Exchange. Another example comes from the way many organizations provision disk space for their databases. When the DBA asks the database owners how much data they expect to generate, the answer sometimes represents a dream rather than reality. I recently saw a database provisioned for 4 terabytes that held only 400 MB of actual data. But without deduplication the entire 4 terabytes were being regularly backed up.

      Q: So part of the problem is the way certain applications work, and part of it is bad policy?
      A: There is also a behavioral element to this. If you think about how people use their file systems, you can often observe that they don't trust their backup systems. So what they do is save multiple versions of the same document in different places, or perhaps lots of similar versions that are only slightly different. A deduplication system will notice this and keep only the new blocks.
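
The idea of keeping only the new blocks can be sketched as a toy content-addressed store. This is a minimal illustration, not how any particular product works: it assumes fixed-size 4 KB blocks and SHA-256 fingerprints, and the `BlockStore` class and its methods are hypothetical names.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real products vary


class BlockStore:
    """Toy content-addressed store: each unique block is kept once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def put(self, data: bytes) -> list:
        """Store data and return the ordered list of block digests."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only a block not seen before consumes new space.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def get(self, recipe) -> bytes:
        """Rebuild the original data from its list of digests."""
        return b"".join(self.blocks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = BlockStore()
# Ten distinct blocks stand in for a document.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
# A second version that differs only in its last block.
edited = original[:-BLOCK_SIZE] + b"Z" * BLOCK_SIZE

r1 = store.put(original)
r2 = store.put(edited)
logical = len(original) + len(edited)
print(f"logical bytes: {logical}, stored bytes: {store.stored_bytes()}")
```

Here the second version adds only one new block to the store, even though it was saved in full. Note that this sketch relies on the edit leaving block boundaries aligned; an insertion in the middle of a file would shift every later fixed-size block, which is one reason some deduplication systems use variable-size chunking instead.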

      Q: Given the amount of redundant data that commonly gets created both before and during backup, how much overall reduction can a typical organization expect to get from deduplication?
      A: Overall a 20-to-1 reduction in the amount of data backed up is very common. It obviously depends on the exact mix of your applications and your data. It also depends on your policies and on the backup software you are using. For example, Symantec NetBackup and EMC NetWorker do full as well as incremental backups, so if you deduplicate a full weekly backup to an intelligent disk target you will save a lot of space. But IBM's TSM [Tivoli Storage Manager] uses an “incremental forever” approach and doesn't do the “full and incremental” routine that most backup products do. So with TSM and an intelligent disk target you won't see the same 20-to-1 reduction. It's more likely to be 5-to-1 or 10-to-1, which of course is still significant.

      Q: Aside from “forever incremental” backups, what kinds of applications get the least benefit from data deduplication?
      A: Certain specialized types of data have inherently small amounts of redundancy. Interestingly, these are very often data types that describe natural phenomena rather than the result of human activities or business processes. One example is medical imaging, where there may not be much redundancy to begin with, and where the file formats already use specialized compression techniques. Another example is seismic data from the oil and gas exploration industry.
