
    Which Applications Save The Most Disk Space By Using Data Deduplication During Backup?

    By eWEEK EDITORS - July 23, 2007

      Q: What are some of the applications that generate a lot of redundant data that can be eliminated by deduplication?
      A: There are two parts to the answer. First, certain applications by their very design create redundant data on primary storage. Then backup makes it even more redundant.

      Q: Let's start with the backup part first.
      A: Suppose you're using typical backup software like Symantec NetBackup [formerly Veritas] or EMC NetWorker [formerly Legato] to do your backups. Let's say you keep 12 weeks of full backups and do daily incremental backups. This alone is going to create a lot of redundant data by definition. A good rule of thumb is that 1 GB of data on primary storage yields 10 GB on backup.
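
The arithmetic behind that rule of thumb can be sketched roughly as follows. The retention of 12 weekly fulls comes from the answer above; the six incrementals per week and the 2 percent daily change rate are hypothetical values chosen for illustration, not figures from the interview.

```python
def backup_footprint_gb(primary_gb, weeks_retained=12,
                        incrementals_per_week=6, daily_change_rate=0.02):
    """Estimate backup volume retained without deduplication.

    incrementals_per_week and daily_change_rate are assumed values.
    """
    # Every retained weekly full holds a complete copy of the primary data.
    fulls = weeks_retained * primary_gb
    # Each incremental holds only the data that changed that day.
    incrementals = (weeks_retained * incrementals_per_week
                    * daily_change_rate * primary_gb)
    return fulls + incrementals


footprint = backup_footprint_gb(1.0)
print(f"{footprint:.2f} GB on backup per 1 GB of primary data")
```

Under these assumptions the retained fulls alone account for 12 copies of the primary data, which is roughly in line with the 10-to-1 rule of thumb.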

      Q: What about the redundancy created by applications?
      A: Certain applications create tremendous amounts of redundant data even before the backup software goes to work. One major example is Microsoft Exchange. If you are sending lots of file attachments with your messages, then Exchange is most likely storing multiple copies of these files. A common rule of thumb is that 90 percent of e-mail volume is in the attachments. If several people are mailing a spreadsheet or a PowerPoint presentation back and forth and making a few changes each time, you can easily end up with 20 or 30 nearly identical copies of the file in Exchange. In extreme cases you can have hundreds of copies. And these extreme cases aren't that uncommon. I've seen customers where we installed deduplication get a 100-to-1 reduction in the volume of data stored by Exchange. Another example comes from the way many organizations provision disk space for their databases. When the DBAs ask the database owners how much data they expect to generate, the answer sometimes represents a dream rather than reality. I recently saw a database provisioned for 4 terabytes that held only 400 megabytes of actual data. But without deduplication the entire 4 terabytes were being regularly backed up.

      Q: So part of the problem is the way certain applications work, and part of it is bad policy?
      A: There is also a behavioral element to this. If you think about how people use their file systems, you can often observe that they don't trust their backup systems. So what they do is save multiple versions of the same document in different places, or perhaps lots of similar versions that are only slightly different. A deduplication system will notice this and keep only the new blocks.
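
The idea of keeping only the new blocks can be illustrated with a minimal sketch. It assumes fixed-size 4 KB blocks identified by SHA-256 digests; the `DedupStore` class and its methods are hypothetical names, and commercial products may chunk and index data quite differently.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration


class DedupStore:
    """Toy block-level deduplicating store (hypothetical, in-memory)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}        # digest -> block bytes, stored once
        self.files = {}         # file name -> ordered list of digests
        self.logical_bytes = 0  # total bytes written by clients

    def put(self, name, data):
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only blocks not seen before consume new space.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        self.files[name] = recipe
        self.logical_bytes += len(data)

    def get(self, name):
        # Rebuild the file from its list of block digests.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


# Ten distinct blocks, an exact copy, and a version with one changed block.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
revised = original[:-BLOCK_SIZE] + bytes([99]) * BLOCK_SIZE

store = DedupStore()
store.put("budget.xls", original)
store.put("copy of budget.xls", original)
store.put("budget_v2.xls", revised)

print(store.logical_bytes, "bytes written,", store.stored_bytes(), "bytes stored")
```

In this example the exact copy adds no new blocks and the revised version adds only one, so three files occupy the space of eleven blocks rather than thirty.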

      Q: Given the amount of redundant data that commonly gets created both before and during backup, how much overall reduction can a typical organization expect to get from deduplication?
      A: Overall, a 20-to-1 reduction in the amount of data backed up is very common. It obviously depends on the exact mix of your applications and your data. It also depends on your policies and on the backup software you are using. For example, Symantec NetBackup and EMC NetWorker do full as well as incremental backups, so if you deduplicate a full weekly backup to an intelligent disk target you will save a lot of space. But IBM's TSM [Tivoli Storage Manager] uses an “incremental forever” approach and doesn't do the “full and incremental” routine that most backup products do. So with TSM and an intelligent disk target you won't see the same 20-to-1 reduction; it's more likely to be 5-to-1 or 10-to-1, which of course is still significant.
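
Why the backup schedule changes the ratio can be seen in a toy model. It assumes that incrementals resend whole changed files while only a fraction of the blocks in those files are actually new. All parameter values and function names below are hypothetical, and the resulting numbers are illustrative rather than a reproduction of the ratios quoted above.

```python
def full_plus_incremental(primary_gb, weeks, changed_files, new_blocks):
    """Weekly fulls plus six daily incrementals per week (assumed schedule)."""
    days = weeks * 7
    # Data sent to the disk target: one full per week plus changed files.
    logical = weeks * primary_gb + weeks * 6 * changed_files * primary_gb
    # Data that is genuinely new: the first copy plus new blocks each day.
    unique = primary_gb + days * changed_files * new_blocks * primary_gb
    return logical, unique


def incremental_forever(primary_gb, weeks, changed_files, new_blocks):
    """One initial full, then daily incrementals only."""
    days = weeks * 7
    logical = primary_gb + days * changed_files * primary_gb
    unique = primary_gb + days * changed_files * new_blocks * primary_gb
    return logical, unique


def dedup_ratio(logical, unique):
    return logical / unique


# Assumed: 5% of data (by size) sits in files that change daily,
# and 10% of the blocks in those files are new.
params = dict(primary_gb=1.0, weeks=12, changed_files=0.05, new_blocks=0.10)

ratio_full = dedup_ratio(*full_plus_incremental(**params))
ratio_forever = dedup_ratio(*incremental_forever(**params))
print(f"full + incremental:  {ratio_full:.1f}-to-1")
print(f"incremental forever: {ratio_forever:.1f}-to-1")
```

Both schedules end up storing the same unique data in this model. The full-and-incremental routine simply sends far more redundant data to the target, so the measured reduction is larger.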

      Q: Aside from “incremental forever” backups, what kinds of applications get the least benefit from data deduplication?
      A: Certain specialized types of data have inherently small amounts of redundancy. Interestingly, these are very often data types that describe natural phenomena rather than the result of human activities or business processes. One example is medical imaging, where there may not be much redundancy to begin with, and where the file formats already use specialized compression techniques. Another example is seismic data from the oil and gas exploration industry.
