What is data deduplication? What are its benefits? In simplified terms, data deduplication means comparing objects (usually files or blocks) and removing the redundant copies, so that only one instance of each object is stored. The basic benefits of data deduplication can be summarized as follows: reduced hardware costs, reduced data center footprint, reduced backup costs, reduced costs for disaster recovery, and more efficient use of storage.
If you look at the left side of the figure below, you will see several blocks being stored that are not unique. The data deduplication process keeps one copy of each block and removes the duplicates, resulting in the smaller group of blocks to the right.
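The process described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: it assumes fixed-size blocks identified by their SHA-256 digest, and the names `BLOCK_SIZE`, `deduplicate_blocks`, and `rebuild` are hypothetical, chosen only for this example.

```python
import hashlib

# Hypothetical, deliberately tiny block size so the example is easy to follow.
BLOCK_SIZE = 4


def deduplicate_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}   # digest -> block contents (each unique block stored once)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first copy is kept
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the unique blocks and the recipe."""
    return b"".join(store[digest] for digest in recipe)


data = b"AAAABBBBAAAACCCCBBBBAAAA"
store, recipe = deduplicate_blocks(data)
print(len(recipe), "blocks in,", len(store), "unique blocks stored")
# prints: 6 blocks in, 3 unique blocks stored
```

The recipe of digests is what lets the original data be reconstructed, so removing duplicates loses no information.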
You can apply data deduplication in multiple places. Wherever you apply it, data deduplication can affect costs not only for your Storage Area Network (SAN), but also for your entire IT infrastructure.
Based on an enterprise environment running typical applications, you could probably squeeze out between 10 and 20 percent more storage space just by getting rid of duplicate and unnecessary files. Files are commonly known as "unstructured data" and the data residing in databases is commonly known as "structured data." Simple unstructured data in files can therefore be deduplicated at the file system level, but the structured data residing in large databases is typically deduplicated underneath the actual operating system's file system at the block level.
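As a rough illustration of deduplication at the file system level, the sketch below groups files by a hash of their contents, so that files with identical contents end up in the same group. It is only a sketch under stated assumptions: the function name `find_duplicate_files` and the sample file names are hypothetical, and each file is read into memory in one piece, which would not suit very large files.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root: Path):
    """Return groups (lists) of paths under root whose contents are identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Whole-file read for brevity; large files should be hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Only groups with more than one member contain redundant copies.
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Small demonstration in a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.txt").write_text("quarterly numbers")
    (root / "report_copy.txt").write_text("quarterly numbers")
    (root / "notes.txt").write_text("meeting notes")
    for group in find_duplicate_files(root):
        print("identical:", [p.name for p in group])
```

A real cleanup tool would then keep one file per group and replace the others with links or references; that step is omitted here.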
Interestingly, though, since block-level deduplication does not need to understand the file system, it is sometimes even more efficient to deduplicate files at the block level. For example, two versions of a file that differ in only a few blocks are both kept in full by file-level deduplication, while block-level deduplication stores their shared blocks only once. Whether you choose a solution that works at the block level, file level or both, you will find that it can pay for itself quickly through savings on storage, media, power, cooling and floor space.
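To make the comparison between the two levels concrete, here is a small sketch that measures how many bytes each approach would have to store for two nearly identical files. It assumes fixed-size 8-byte blocks and SHA-256 digests; the function names and the sample data are hypothetical. Fixed-size blocks are also a simplification: an insertion that shifts the data changes every later block, which is why some products use variable-size chunking instead.

```python
import hashlib

BLOCK_SIZE = 8  # hypothetical block size, in bytes


def file_level_stored_bytes(files):
    """Bytes stored if only whole, byte-identical files are deduplicated."""
    unique = {hashlib.sha256(content).digest(): len(content) for content in files}
    return sum(unique.values())


def block_level_stored_bytes(files, block_size=BLOCK_SIZE):
    """Bytes stored if each unique fixed-size block is kept only once."""
    unique = {}
    for content in files:
        for offset in range(0, len(content), block_size):
            block = content[offset:offset + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# Two versions of a 64-byte file that differ only in their last 8-byte block.
body = b"".join(f"block{i:03d}".encode() for i in range(6))
version_1 = b"HEADER__" + body + b"FOOTER_1"
version_2 = b"HEADER__" + body + b"FOOTER_2"
files = [version_1, version_2]

print("file level: ", file_level_stored_bytes(files), "bytes")   # 128 bytes
print("block level:", block_level_stored_bytes(files), "bytes")  # 72 bytes
```

At the file level the two versions look entirely different, so both are stored in full. At the block level only the one changed block is added for the second version.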