Big Data Backup: 10 Questions Enterprises Should Ask About Deduplication

By Chris Preimesberger  |  Posted 2012-02-14

What Impact Will Deduplication Have on Backup Performance?

High performance is essential to large enterprises that need to move exponentially growing, massive data volumes to the safety of a backup environment within a finite backup window. Understanding the performance distinctions between each category of deduplication technology, particularly as they change over time, is essential for choosing the most appropriate one for the specific environment.

Will Deduplication Degrade Restore Performance?

Understand the time required to restore files that were backed up within the last week (the most common category of restore request). Ask the vendor if their technology keeps the last backup available for instant restore and fast tape vaulting.

How Will Capacity and Performance Scale as the Environment Grows?

Calculate how much data you will be able to store on a single system with deduplication with your specific deduplication ratios, policies, data types and growth rate. Understand the implications of exceeding that capacity. For example, if exceeding capacity requires you to spread backups across additional systems, consider the costs of additional administrative complexity, capital expense and disruption to your environment.
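As a rough illustration of that calculation, the sketch below estimates how many months of weekly backups fit on a single system under a simple model. The capacity, dedupe ratio and growth figures are illustrative assumptions, not vendor numbers.

```python
# Hypothetical capacity-planning sketch; all figures are illustrative
# assumptions, not vendor specifications.

def months_until_full(raw_capacity_tb, weekly_backup_tb,
                      dedupe_ratio, monthly_growth_rate):
    """Estimate how many whole months of weekly backups fit on one system."""
    used = 0.0
    months = 0
    backup = weekly_backup_tb
    while True:
        # Four weekly backups per month, reduced by the dedupe ratio.
        monthly_ingest = 4 * backup / dedupe_ratio
        if used + monthly_ingest > raw_capacity_tb:
            return months
        used += monthly_ingest
        backup *= 1 + monthly_growth_rate
        months += 1

# Example: 100TB system, 20TB weekly fulls, 10:1 dedupe, 3% monthly growth.
print(months_until_full(100, 20, 10, 0.03))  # -> 10 (months)
```

Running the same model with a higher growth rate or a lower dedupe ratio shows how quickly the window to a forklift upgrade can shrink, which is exactly the disruption and capital-expense question raised above.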

How Efficient Is the Deduplication for Large Databases?

Ensure that your deduplication technology has been optimized to handle sub-8KB data comparisons while maintaining performance levels. Large, mission-critical databases, such as Oracle, SAP, SQL Server and DB2, typically see data change in segments of 8KB or less. However, many deduplication products cannot perform comparisons on data segments smaller than 16KB without dramatically slowing the backup process.
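A toy sketch (not any vendor's actual algorithm) makes the segment-size point concrete: with fixed-size chunk hashing, a single small change forces a full 16KB segment to be re-stored when comparisons happen at 16KB granularity, but only 8KB at the finer granularity.

```python
# Illustrative fixed-size chunk dedupe; the data and sizes are invented
# for the example, not taken from any product.
import hashlib
import random

def unique_bytes(backup, seen_hashes, chunk_size):
    """Return how many bytes must be stored after dedupe at chunk_size."""
    stored = 0
    for i in range(0, len(backup), chunk_size):
        chunk = backup[i:i + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            stored += len(chunk)
    return stored

random.seed(0)
data = random.randbytes(1024 * 1024)        # 1MB synthetic "database"
for size in (8 * 1024, 16 * 1024):
    seen = set()
    unique_bytes(data, seen, size)          # first full backup seeds the index
    changed = bytearray(data)
    changed[4096] ^= 1                      # one small change, as a database would make
    print(size, unique_bytes(bytes(changed), seen, size))
# prints: 8192 8192, then 16384 16384 -- the coarser comparison
# stores twice as much for the same one-byte change.
```

Multiplied across millions of small database-page changes per night, that factor-of-two (or worse) difference is what separates efficient database dedupe from inefficient.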

How Efficient Is the Deduplication in Progressive Incremental Backup Environments?

Some dedupe packages are inefficient at deduplicating TSM progressive incremental backups, as well as backups from applications that fragment their data, such as NetWorker and HP Data Protector. Ask the vendor whether its deduplication technology can use the metadata from these backup applications to identify the areas likely to contain duplicate data, so that it can perform a byte-level comparison of that data for optimal capacity reduction while maintaining high performance.

What Are Realistic Expectations for Capacity Reduction?

Rather than pushing for higher generic deduplication ratios, a more effective strategy for large enterprises is to choose a solution that guarantees the ability to move data to safety within backup windows while also providing efficient deduplication. Concurrent processing and deterministic rates for ingest, deduplication and replication are key enablers in an enterprise environment.

Can Administrators Monitor Backup, Dedupe, Replication and Restore Enterprisewide?

A holistic view of the data-protection environment enables backup administrators to manage more data per administrator, fine-tune the backup environment for optimal utilization and efficiency, and plan accurately for future performance and capacity requirements across the enterprise.

Can Deduplication Help Reduce Replication Bandwidth Requirements for Large Enterprise Data Volumes?

Some deduplication technologies enable companies to replicate data across a WAN more efficiently by replicating only byte-level changes, reducing WAN bandwidth requirements and improving time to safety.
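A minimal sketch of the idea, assuming a simple block-compare scheme (illustrative only, not any specific product's replication protocol): the replication side diffs the new version against what the disaster-recovery site already holds and ships only the changed regions.

```python
# Illustrative delta replication: send only the regions that differ
# from the copy already at the DR site. Block size is an assumption.

def delta(old, new, block=4096):
    """Yield (offset, bytes) for each block that changed between versions."""
    for i in range(0, len(new), block):
        if old[i:i + block] != new[i:i + block]:
            yield i, new[i:i + block]

def apply_delta(old, changes):
    """Reconstruct the new version at the DR site from the old copy."""
    out = bytearray(old)
    for offset, data in changes:
        out[offset:offset + len(data)] = data
    return bytes(out)

primary = bytearray(b"x" * 64 * 1024)
replica = bytes(primary)                 # DR site already has this version
primary[10000:10004] = b"WXYZ"           # small change on the primary
changes = list(delta(replica, bytes(primary)))
sent = sum(len(d) for _, d in changes)
print(sent)                              # 4096 bytes sent instead of 64KB
assert apply_delta(replica, changes) == bytes(primary)
```

Scaled up to terabyte volumes with low daily change rates, the same principle is what makes WAN replication of backup data affordable.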

Can IT "Tune" Deduplication to Meet Its Needs?

Enterprise data-protection environments may have data types that have special deduplication requirements. Look for solutions that enable IT to choose the datasets they want to deduplicate by backup policy and data type, or that automatically detect the type of data being backed up and apply the appropriate method. Opt for a technology that enables IT to choose the method of deduplication that is most efficient for each data type.
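As an illustration, a tunable setup might look like the hypothetical policy table below; the policy names, data types and method labels are invented for the example, not a product's actual configuration schema.

```python
# Hypothetical per-policy dedupe tuning (all names are illustrative).
POLICIES = {
    "oracle-prod":   {"data_type": "database",   "method": "byte-level"},
    "file-servers":  {"data_type": "filesystem", "method": "variable-block"},
    "media-archive": {"data_type": "multimedia", "method": "none"},  # already compressed
}

def method_for(policy_name, default="variable-block"):
    """Return the dedupe method configured for a backup policy."""
    return POLICIES.get(policy_name, {}).get("method", default)

print(method_for("oracle-prod"))    # byte-level
print(method_for("unknown-host"))   # falls back to the default
```

The point of the sketch is the shape of the control, not the specific values: IT decides, per policy and per data type, where expensive fine-grained dedupe is worth the cycles and where it is not.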

How Much Experience Does the Vendor Have With Large Enterprise Backup Environments?

Enterprise data centers with massive data volumes and complex policies need a data-protection vendor with demonstrated expertise with enterprise-class backup applications, such as NetBackup, NetBackup OST and Tivoli Storage Manager. They should be prepared to provide backup assessments and guidance on how to optimize the overall backup infrastructure for maximum backup, replication and data deduplication performance in these environments.
