    How Deduplication Has Evolved to Handle the Deluge of Data

    By Chris Preimesberger, November 13, 2015

      1 - How Deduplication Has Evolved to Handle the Deluge of Data

      Deduplication comes in many forms, so small and midsize organizations can choose from a variety of solutions to meet their backup needs.

      2 - Inline Deduplication

      This is an "always on" approach that deduplicates data in real time as it is written to the system. Because it processes all incoming data indiscriminately, it ensures comprehensive coverage, but it is not selective: deduplicating data sets that contain few duplicates wastes time and resources such as random access memory (RAM).
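The following is a minimal sketch of the inline idea, under stated assumptions. Each incoming write is split into blocks and fingerprinted, and only blocks that have not been seen before reach storage. The article does not say how products chunk or fingerprint data. The fixed 4 KB block size, the SHA-256 fingerprints and the class name `InlineDedupStore` are all illustrative assumptions. The in-memory `index` dictionary represents the RAM cost the paragraph mentions: every block is hashed and looked up, even when the data contains few duplicates.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size 4 KB blocks, chosen only for illustration


class InlineDedupStore:
    """Hypothetical store that deduplicates each block before it is written."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        # Fingerprint -> block. Held in RAM here; it also stands in for the disk.
        self.index: dict[str, bytes] = {}
        self.bytes_written = 0

    def write(self, data: bytes) -> list[str]:
        """Split data into blocks, store only unseen blocks, return the block recipe."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()  # every block is hashed
            if fingerprint not in self.index:  # only new blocks are written
                self.index[fingerprint] = block
                self.bytes_written += len(block)
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble the original data from its list of fingerprints."""
        return b"".join(self.index[fp] for fp in recipe)
```

For example, writing `b"A" * 8192 + b"B" * 4096` produces a three-block recipe but stores only 8,192 bytes, because the two "A" blocks are identical. Writing the same data again stores nothing new.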

      3 - Post-Process Deduplication

      This method analyzes and eliminates redundant data after a full backup completes. It yields space savings, but it also requires enough disk space to hold the data until it is deduplicated. Because the full backup must be stored first, this approach works against organizations that adopt deduplication to reduce their storage requirements.
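The capacity problem can be shown with a small sketch. It is a simplified model, not any vendor's implementation. The whole backup lands on disk first, so staging space must equal the full backup size, even when the later deduplication pass shrinks it dramatically. The function name, the 4 KB fixed block size and the SHA-256 fingerprints are illustrative assumptions.

```python
import hashlib


def post_process_backup(data: bytes, block_size: int = 4096) -> tuple[int, int]:
    """Simulate post-process dedup (hypothetical helper).

    Returns (staging_bytes_required, bytes_kept_after_dedup).
    """
    # Step 1: the full, undeduplicated backup is written to disk first.
    staging_bytes = len(data)

    # Step 2: a later pass keeps only the first copy of each unique block.
    seen: set[bytes] = set()
    kept = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept += len(block)
    return staging_bytes, kept
```

For a backup made of ten identical 4 KB blocks, `post_process_backup(b"X" * 4096 * 10)` needs 40,960 bytes of staging space, although only 4,096 bytes remain after deduplication.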

      4 - Deduplication

      This method installs a separate deduplication agent on each system that needs protection. It can be effective, but buyer beware: it is expensive, complex and time-consuming. Some vendors promote it as an effective solution, but the multiplication of expensive systems, software licenses and bandwidth requirements can diminish its overall value.

      5 - Target Deduplication

      In this method, backup data is deduplicated at the target, either in real time or post-process, and then stored to disk. The backup software acts as the data mover, so users do not need to change backup configurations or policies; the only change required is the destination of the backup streams. This can be an attractive feature. However, the data is not deduplicated until it reaches the backup appliance, so the full backup stream still crosses the network. The method also requires an extra layer of software that more recent advances in deduplication have made unnecessary. Post-processing is also often combined with this technology, which makes these systems less storage-efficient.
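The following sketch illustrates why target deduplication saves disk space but not bandwidth. The backup software streams every block to the appliance, and duplicates are discarded only on arrival. `TargetAppliance` and `target_side_backup` are hypothetical names. The 4 KB block size and SHA-256 fingerprints are assumptions made for illustration.

```python
import hashlib


class TargetAppliance:
    """Hypothetical backup appliance that deduplicates whatever it receives."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def receive(self, block: bytes) -> None:
        # Deduplication happens only here, after the block has crossed the network.
        self.blocks.setdefault(hashlib.sha256(block).hexdigest(), block)


def target_side_backup(data: bytes, appliance: TargetAppliance, block_size: int = 4096) -> int:
    """Stream every block to the appliance; return the bytes sent over the network."""
    sent = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        appliance.receive(block)  # no client-side check: every block is transmitted
        sent += len(block)
    return sent
```

Backing up `b"Z" * 4096 * 5` sends 20,480 bytes every time, even though the appliance ends up storing a single 4 KB block.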

      6 - Source-Side Deduplication

      Considered the next generation of deduplication technology, this method backs up only new and unique data at the source. After an initial full snapshot backup is taken and saved to a recovery point server, subsequent backups capture only new, incremental changes to the data. This yields dramatic savings in bandwidth and storage and makes data protection and recovery across multiple sites more efficient. The key advantage of source-side deduplication is that less data is sent across the network, with a resulting gain in performance.
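The source-side exchange can be sketched as follows. The client fingerprints its blocks, asks the recovery point server which fingerprints it lacks, and sends only those blocks. The class and method names, the 4 KB blocks and the SHA-256 fingerprints are hypothetical choices for illustration, not any product's protocol. The small fingerprint exchange is not counted in the bytes-sent total.

```python
import hashlib


class RecoveryPointServer:
    """Hypothetical server that keeps a fingerprint index of stored blocks."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        """Report which fingerprints the server has never stored."""
        return {fp for fp in fingerprints if fp not in self.blocks}

    def store(self, fingerprint: str, block: bytes) -> None:
        self.blocks[fingerprint] = block


def source_side_backup(data: bytes, server: RecoveryPointServer, block_size: int = 4096) -> int:
    """Deduplicate on the client and send only blocks the server lacks.

    Returns the number of data bytes sent (fingerprint traffic is not counted).
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    fingerprints = [hashlib.sha256(b).hexdigest() for b in blocks]
    needed = server.missing(fingerprints)  # only small hashes are exchanged here
    sent = 0
    for fp, block in zip(fingerprints, blocks):
        if fp in needed:
            server.store(fp, block)
            needed.discard(fp)  # send each unique block at most once
            sent += len(block)
    return sent
```

The first backup of four distinct 4 KB blocks sends 16,384 bytes. A later backup that appends one new block sends only that block's 4,096 bytes. This mirrors the "initial full backup, then only new changes" pattern described above.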

      7 - Global, Source-Side Deduplication

      Global deduplication is an optimized form of source-side deduplication. With this method, every computer, virtual machine or server across local, remote and virtual sites communicates with a recovery point server (RPS). The RPS manages a global database index of all associated files and determines what needs to be backed up. It then pulls only new data as required, eliminating duplicate copies, and shares this deduplication intelligence across all source systems. Because backup data is globally deduplicated before it is transferred to the target RPS, only changes are sent over the network, which improves performance and reduces bandwidth usage.
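What distinguishes the global variant is that one index is shared by every client. A block already backed up from one machine is never sent again from another. The following is a simplified simulation of that effect. In it, client and server logic are collapsed into one method, and a real client would compute fingerprints locally. `GlobalRPS`, the client names, the 4 KB blocks and the SHA-256 fingerprints are all illustrative assumptions.

```python
import hashlib


class GlobalRPS:
    """Hypothetical recovery point server with one index shared by all clients."""

    def __init__(self) -> None:
        self.index: dict[str, bytes] = {}
        self.bytes_received: dict[str, int] = {}

    def backup(self, client: str, data: bytes, block_size: int = 4096) -> int:
        """Back up one client's data; return bytes that actually had to be sent."""
        sent = 0
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            fp = hashlib.sha256(block).hexdigest()  # computed on the client in practice
            if fp not in self.index:  # the lookup spans every client's data
                self.index[fp] = block
                sent += len(block)
        self.bytes_received[client] = self.bytes_received.get(client, 0) + sent
        return sent
```

Suppose three machines share the same 32 KB base image. The first machine ("laptop-01") sends all 32,768 bytes, and a VM with an identical image ("vm-17") sends nothing. A server with one extra block ("server-03") sends only 4,096 bytes.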

      8 - Common Misconceptions About Deduplication

      These are the most common misconceptions: 1) all deduplication is the same and comes standard in every backup and recovery solution; 2) inline deduplication slows down performance; and 3) source-side deduplication consumes too much processing power on the client. All three are false. Differences between deduplication implementations may look minor on the surface. However, they can significantly affect how much data you can back up, how much usable capacity you need, how quickly you can recover from unplanned system disruptions, and your budget.

      9 - Misconception No. 1: All Deduplication Is the Same

      Deduplication can mean very different things, and the efficiency of this technology varies greatly from product to product. Some products perform target-side deduplication, while others perform source-side deduplication. Some deduplicate within each backup job, while others deduplicate across all storage systems. Further, many vendors sell deduplication as stand-alone software, which is important to account for when you define your backup and recovery requirements.

      10 - Misconception No. 2: Inline Deduplication Slows Performance

      As the deduplication block size increases (e.g., 250KB, 512KB, 1,024KB), deduplication becomes less efficient, because a small change anywhere in a large block prevents the whole block from matching. Smaller blocks expose more duplicates, but they also produce more blocks to fingerprint and index, so the more data you process, the more computational resources are required. To achieve inline deduplication that does not slow down for lack of compute resources, vendors must design their own highly sophisticated data management structures. Few people realize that this technology is not simply available off the shelf. However, you can quickly gauge a vendor's level of data management sophistication by looking at the block sizes it supports. If its inline deduplication supports only large block sizes (e.g., 512KB or 1,024KB), that is a good indication that it is limited to a single backup job or storage volume.
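The block-size trade-off can be measured with a small experiment. The sketch deduplicates two synthetic 4 MiB "backups" that differ by a single byte, using several fixed block sizes. It reports the unique bytes stored and the number of index entries needed. The random data, the one-byte change, the block sizes and the helper name `dedup_stats` are illustrative assumptions, not figures from any product.

```python
import hashlib
import random


def dedup_stats(versions: list[bytes], block_size: int) -> tuple[int, int]:
    """Return (unique_bytes_stored, index_entries) for fixed-size block dedup."""
    index: dict[str, int] = {}
    for data in versions:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            index.setdefault(hashlib.sha256(block).hexdigest(), len(block))
    return sum(index.values()), len(index)


rng = random.Random(0)
day1 = rng.randbytes(4 * 1024 * 1024)  # 4 MiB of synthetic, non-repeating data
changed = bytearray(day1)
changed[1_000_000] ^= 0xFF  # a single one-byte change between the two backups
day2 = bytes(changed)

for kb in (4, 64, 256, 1024):
    stored, entries = dedup_stats([day1, day2], kb * 1024)
    print(f"{kb:>5} KB blocks: {stored / 1024:>7.0f} KB stored, {entries:>5} index entries")
```

Only one block changes, so the stored total equals the original 4,096 KB plus one block. That works out to 4,100 KB with 4 KB blocks but 5,120 KB with 1,024 KB blocks. Meanwhile, the index shrinks from 1,025 entries to 5. Smaller blocks save more space but cost more hashing and index memory, which is the compute burden the paragraph describes.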

      11 - Misconception No. 3: Global, Source-Side Deduplication Is Only for VMware

      Global deduplication refers to multiple backup devices federating a single data management structure for maximum deduplication efficiency. This means every computer, virtual machine or server that is backed up communicates with a backup server that manages a global database index of files on all machines, everywhere. This approach requires a sophisticated workflow to optimize replication between the source client and the backup device. Because this technology is hard to develop and not every vendor offers it, many people assume that global, source-side deduplication is meant only for VMware, not for physical machines or other virtual systems. However, this technology does exist and can yield tremendous operational efficiencies.

      12 - Key Trend No. 1: Inline Deduplication

      How well deduplication performs depends largely on whether it runs post-process or inline. As the name suggests, post-process deduplication first stores incoming data to disk and deduplicates it later. When data is deduplicated before being written to disk, the process is called inline deduplication. Inline deduplication has the advantage of writing data to disk only once. It is preferred over post-process deduplication, which requires extra storage space and more disk writes.

      13 - Key Trend No. 2: Global, Source-Side Deduplication

      Source-side deduplication relies on backup servers that work in conjunction with agents installed on the clients (the "data source"). The client software communicates with the backup servers to compare new blocks of data and removes redundancies before the data is transferred over the network. Because duplicate data never crosses the network, this form of deduplication yields dramatic savings in bandwidth, required storage and corresponding costs. Global, source-side deduplication takes this process a step further by sharing all of an organization's deduplication intelligence across all source systems. This approach is quickly replacing target deduplication as the preferred method because it backs up only new and unique data at the source, checked against a global database index of files.
