    How to Choose the Right Deduplication Technology

    Written by

    Janae Lee
    Published November 11, 2009


      Deduplication is a hot technology. Because of this, many vendors have responded with a proliferation of approaches and terminologies that seem more designed to confuse than to explain. Global deduplication. Content-aware. Target-based. Source-based. ISV-integrated. So, what does it all mean? And how can businesses know when and how to deploy this new offering?

      When it comes to deduplication, it helps to focus on the basics. For example, just what is deduplication and what benefits come from using it?

      Deduplication explained

      First, deduplication is a data discovery and indexing technology which decreases the volume of data in a storage or communication system while maintaining complete data access.

      By reducing data volume, deduplication decreases the hardware, software, communications and administration costs associated with maintaining and managing the data. Unlike tools such as data classification, which require human analysis and intervention, data deduplication happens automatically.

      A deduplication system finds strings of data that are exactly the same, saves the first instance of each unique string, and stores a pointer (index) for every successive copy. Generally, this process is sub-file. The definitive measure of any deduplication product's ROI is its deduplication ratio, that is, the degree to which it extracts common data, reducing volume. A 100TB data set with a 2:1 deduplication ratio will result in approximately 50TB of data needing to be stored. That same data set at a 20:1 ratio will result in storing 5TB, while still maintaining application access to all the same information.
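The store-once, point-many idea and the ratio arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the function names are hypothetical, the chunks are supplied pre-cut, SHA-256 is an assumed choice of fingerprint, and the space taken by the pointers themselves is ignored.

```python
import hashlib

def deduplicate(chunks):
    """Keep the first instance of each unique chunk; record a pointer per chunk."""
    store = {}      # fingerprint -> chunk bytes (first instance only)
    pointers = []   # one fingerprint per incoming chunk, in stream order
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()  # assumed fingerprint function
        store.setdefault(digest, chunk)
        pointers.append(digest)
    return store, pointers

def restore(store, pointers):
    """Rebuild the original stream by following the pointers."""
    return b"".join(store[d] for d in pointers)

def dedup_ratio(chunks, store):
    """Logical bytes divided by bytes actually stored (pointer overhead ignored)."""
    logical = sum(len(c) for c in chunks)
    stored = sum(len(c) for c in store.values())
    return logical / stored

def stored_volume_tb(logical_tb, ratio):
    """Volume left to store after deduplication at the given ratio."""
    return logical_tb / ratio

chunks = [b"AAAA", b"BBBB", b"AAAA", b"AAAA", b"CCCC", b"BBBB"]
store, pointers = deduplicate(chunks)
```

Here six 4-byte chunks reduce to three stored chunks, so `dedup_ratio(chunks, store)` is 2.0 and `restore(store, pointers)` returns the original bytes. `stored_volume_tb(100, 2)` and `stored_volume_tb(100, 20)` reproduce the 50TB and 5TB figures above.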

      Different deduplication products accomplish the string discovery and indexing process in different ways. Despite this, there are four basic rules driving which approach is the best fit for an organization.

      Rule No. 1: Higher deduplication ratios are good

      Higher deduplication ratios are good; these are delivered by data intelligence and system scalability. Different deduplication approaches and products deliver different deduplication ratios. The success of an approach rests on the solution’s effectiveness in finding common strings. Products that operate sub-file and account for variable length strings tend to discover and extract more duplicate data. Results vary by product and by application usage.
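One way to see why variable-length strings matter is to compare fixed-size chunking with a toy content-defined chunker after a single byte is inserted at the front of a file. The sketch below is an assumption-laden illustration: the function names, the 4-byte window, the CRC32 boundary test and the random sample data are all hypothetical choices, and the hash is recomputed per window for clarity rather than maintained as a rolling hash.

```python
import random
import zlib

def fixed_chunks(data, size=16):
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=4, mask=0x0F):
    """Cut wherever the hash of the last `window` bytes matches a pattern."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        # Boundary decision depends only on nearby content, not on position.
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks

def shared(old, new):
    """Number of distinct chunks the two versions have in common."""
    return len(set(old) & set(new))

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))  # stand-in for a file
edited = b"!" + original                                   # one byte inserted up front

fixed_shared = shared(fixed_chunks(original), fixed_chunks(edited))
variable_shared = shared(variable_chunks(original), variable_chunks(edited))
```

With fixed-size chunks the inserted byte shifts every later boundary, so the edited file shares few or no chunks with the original. With content-defined boundaries only the first chunk changes, and the rest can be deduplicated against what is already stored.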

      For example, backup natively creates many copies of data both across and within systems over time, but the resulting deduplication ratios can vary widely depending on data type, data change rate and even the customer’s backup model. An average deduplication ratio of 20:1 or higher is not unusual, but underneath this average may be virtual machine file backups at 40:1, e-mail backups at 15:1 and transactional database backups at 3:1. Solutions claiming “content awareness” often promise higher deduplication ratios. Organizations should ignore the lingo and assess the results. Most vendors offer a tool or consulting approach to help businesses size what results their product will deliver for an environment.
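A blended ratio is total logical data divided by total stored data, not a simple average of the per-workload ratios. The short calculation below uses the three ratios quoted above; the data volumes (80TB, 15TB and 5TB) are made-up assumptions chosen only to show how the mix drives the result.

```python
def blended_ratio(workloads):
    """workloads: list of (logical_tb, dedup_ratio) pairs."""
    logical = sum(tb for tb, _ in workloads)
    stored = sum(tb / ratio for tb, ratio in workloads)
    return logical / stored

# Hypothetical mix: volumes are illustrative assumptions, ratios are from the text.
workloads = [
    (80, 40),  # virtual machine file backups at 40:1
    (15, 15),  # e-mail backups at 15:1
    (5, 3),    # transactional database backups at 3:1
]

overall = blended_ratio(workloads)
```

For this mix `round(overall, 2)` is 21.43. The database backups are only 5 percent of the logical data but account for more than a third of what ends up on disk, which is why sizing against an organization's own data mix matters more than a headline ratio.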

      System scalability

      Beyond each approach’s discovery technology, organizations must also consider the system’s ability to scale. Different products support different levels of index scalability. This is important not only for system robustness (absolutely critical in any system storing one copy of data to serve numerous applications and users), but also because index scalability impacts the deduplication ratio.

      A system supporting a “block pool” of unique data only up to 5TB will need to store a duplicate string again every time it crosses the 5TB boundary, while a system with a 140TB index won’t store duplicate data until it hits 140TB. If these two systems had exactly the same deduplication effectiveness, the more scalable system could still have a deduplication ratio up to 28 times higher (140TB divided by 5TB) and store as little as 1/28th the volume of data. This is direct savings to the bottom line.
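The effect of index capacity can be illustrated with a toy model. The sketch below assumes, purely for illustration, that a system starts a fresh pool with an empty index once the current pool is full, and that 140 and 5 chunk slots stand in for the 140TB and 5TB pools; real products handle overflow in their own ways.

```python
def chunks_stored(stream, pool_capacity):
    """Count chunks written when the index can only track `pool_capacity` unique chunks."""
    index = set()   # fingerprints known to the current pool
    stored = 0
    for fingerprint in stream:
        if fingerprint in index:
            continue            # duplicate within the current pool: pointer only
        if len(index) >= pool_capacity:
            index = set()       # assumed behavior: pool full, start a new pool
        index.add(fingerprint)
        stored += 1
    return stored

# Hypothetical workload: the same 140 chunks backed up on 10 successive nights.
stream = list(range(140)) * 10

large_index = chunks_stored(stream, pool_capacity=140)
small_index = chunks_stored(stream, pool_capacity=5)
```

In this workload `large_index` is 140 and `small_index` is 1400: the small pool never sees a repeat before it fills, so it stores every chunk again each night. How large the gap is in practice depends on how often the data repeats and how it is spread across pools.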

      For a deduplication product to extract duplicate data, the duplicate data must be there to find. Primary application or even archive data rarely has the same level of native data duplication as backup. Hence, one deduplication approach does not fit all. A deduplication approach which is more “weighty” in resource usage may be valuable in backup, but it may make no sense to use it on primary or archive data sets where the duplicate data simply doesn’t exist.

      Rule No. 2: Price performance is important

      As with any data management technology, data transfer and compute speeds are important. This is particularly true when the deduplication technology is “in-line.” In this case, the deduplication product must be fast enough not to throttle the backup process. Even with deduplication offerings that run “deferred,” be sure the system delivers enough performance to ensure that yesterday’s backup data is stored, replicated off-site (if desired), extracted to tape (if desired) and deleted (by policy) before the next day’s backup window. The system should be able to provide sufficient performance without the need for unique, high-cost proprietary hardware.
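A rough sizing check for both cases is simple arithmetic. Every figure in the sketch below (nightly volume, throughput rates, window length) is a hypothetical assumption, and the deferred stages are assumed to run one after another; substitute measured numbers from the actual environment.

```python
def hours_needed(volume_tb, rate_tb_per_hour):
    """Time to push a volume of data through a stage at a given throughput."""
    return volume_tb / rate_tb_per_hour

def inline_fits_window(nightly_tb, dedup_rate_tb_per_hour, window_hours):
    """In-line: deduplication must keep pace inside the backup window itself."""
    return hours_needed(nightly_tb, dedup_rate_tb_per_hour) <= window_hours

def deferred_cycle_hours(stages):
    """Deferred: total time for all post-backup stages, assumed sequential."""
    return sum(hours_needed(volume, rate) for _, volume, rate in stages)

# (stage name, volume in TB, throughput in TB/hour) -- all values are assumptions.
stages = [
    ("ingest to disk", 12, 4.0),
    ("deduplicate", 12, 1.5),
    ("replicate off-site", 1, 0.25),   # only reduced data crosses the WAN
    ("copy to tape", 12, 2.0),
]

total_hours = deferred_cycle_hours(stages)
```

With these numbers `total_hours` is 21.0, which fits inside a 24-hour cycle with little margin. `inline_fits_window(12, 2.0, 8)` is True, while the same 12TB at 1.0 TB/hour would overrun an 8-hour window.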

      Rule No. 3: Deduplication decreases data volume where it runs

      Deduplication decreases data volume where it runs, but it also causes work where it operates. Deduplication products offer the ability to reduce data volumes at different locations in the architecture: on the production server, on a backup media or index server, or on a specialized appliance. The choice of location depends on the value organizations want to extract, as well as the resources they are willing to use to pay for it.

      If, for example, there are 500 remote branches with limited bandwidth, and an organization wants to centralize backup, a product optimized to run at the source (“source-based”) on the production servers will reduce data over the wire, creating large communication savings.

      By contrast, a 12TB data center needing to reduce data storage for rapid backup and off-site vaulting using a specialized appliance (“target-based”) will reap savings in storage as well as communications, as the data is electronically vaulted to the disaster recovery site.
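The branch-office case comes down to how much data crosses the wide-area link each night. The sketch below uses the 500 branches from the example; the 20GB per branch and the 20:1 ratio are hypothetical assumptions, and the function name is illustrative.

```python
def wan_gb_per_night(sites, gb_per_site, ratio, dedup_at_source):
    """Data sent over the WAN to the central backup site each night."""
    logical_gb = sites * gb_per_site
    # Source-based: data is reduced before it leaves the branch.
    # Otherwise the full logical volume crosses the link before being reduced.
    return logical_gb / ratio if dedup_at_source else logical_gb

source_based = wan_gb_per_night(500, 20, 20, dedup_at_source=True)
unreduced = wan_gb_per_night(500, 20, 20, dedup_at_source=False)
```

Under these assumptions `unreduced` is 10000 GB per night while `source_based` is 500.0 GB. In a single data center the same reduction applies to the replication link between the target appliance and the disaster recovery site instead.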

      Sadly, there is still no such thing as a free lunch: finding and indexing duplicate data consumes system resources wherever it runs. “Target-based” systems accommodate this reality by controlling their own resources in an appliance that plugs into the backup environment [such as network-attached storage (NAS) or a virtual tape library (VTL)]. This may initially appear expensive, but it allows organizations to transparently add deduplication value without the need to redesign their current backup architecture.

      By contrast, if production servers are already heavily utilized, a source-based deduplication process will impact production. In another common case, operating “ISV-embedded” deduplication as a feature on a traditional backup media or index server may appear transparent and inexpensive but can require a complete redesign of the backup system. The new workload caused by the deduplication process will push any already burdened backup system past its existing resources, driving the need for a new media server and a rebalancing of the backup environment.

      Rule No. 4: Integration with existing tools is valuable

      There is high value in the ability to integrate with existing backup processes, management interfaces and tools. The ease of integration derives more from the approach and sophistication of a deduplication product than from where the process operates. For example, the importance of strong tape integration for vaulting should not be overlooked. Integration with backup software also varies widely, particularly if organizations want to operate a disk-based (versus virtual tape) backup model.

      ISV-embedded deduplication clearly has strong value here. For Symantec NetBackup customers wanting to do disk-based backup, the availability of OpenStorage (OST) also offers strong management and integration possibilities across multiple complementary target appliances. Organizations can use OST with a certified target appliance to manage deduplication, replication and copy-to-tape functions, all through the NetBackup administration console. More information about management and integration options can be found at the SNIA Website.

      Janae Lee is Senior Vice President of Marketing at Quantum. Janae has over 30 years of experience in the storage market, including nine years of focus on deduplication. Janae can be reached at janae.lee@quantum.com.
