Data deduplication promises enterprises more efficient use of storage, along with greater energy efficiency and greener IT. Vendors such as IBM, EMC and NetApp are getting into the competitive data deduplication market, but for the time being, Data Domain leads the pack. Data Domain's OpenStorage solution is easy to deploy but could use a better Web GUI.
Data deduplication promises to use enterprise storage more
efficiently, reducing the need to buy as much media, whether tape or
disk, and as a result saving space, power, and cooling in the data
center.
Unfortunately, it is also a term that can have almost as many different
meanings as there are specific technologies applied to achieve it.
Broadly, the term applies to technologies that analyze data files,
find and remove redundant blocks of information, and engage some sort
of compression algorithm, usually gzip or LZ. In general, files that
are edited frequently but change only slightly between versions are
excellent candidates for deduplication. For this reason, many
businesses are turning to
deduplication solutions to reduce storage space requirements for backup
and archiving of corporate databases, e-mail server message stores and
virtual machine images. If your WAN pipes are saturated with such
traffic, then you definitely want to keep reading.
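To make the concept concrete, here is a minimal sketch, in Python, of block-level deduplication followed by compression. The fixed 4KB chunk size, SHA-256 fingerprinting and zlib compression are illustrative assumptions for this example only, not a description of Data Domain's implementation.

import hashlib
import zlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size; real products vary

def dedupe_and_compress(data: bytes, store: dict) -> list:
    """Split data into chunks, store only unique chunks (compressed),
    and return the list of fingerprints needed to rebuild the data."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:                   # redundant blocks are dropped
            store[fingerprint] = zlib.compress(chunk)  # gzip/LZ-style compression
        recipe.append(fingerprint)
    return recipe

# Backing up the same data twice with only a small edit stores almost nothing new.
store = {}
original = b"payroll record " * 100_000
edited = original.replace(b"record", b"update", 1)
dedupe_and_compress(original, store)
dedupe_and_compress(edited, store)   # only the changed chunk is added
print(len(store), "unique compressed chunks stored")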
The data deduplication market is dominated by Data Domain,
so we're starting a series of reviews of products in the space with
that company. Other prominent players include NetApp, IBM, EMC
and Quantum. Traditionally, reviews have focused almost exclusively on
the degree of deduplication, that is, the percentage of raw disk space
saved by deduplication. Not only are other factors, such as throughput
performance and ease of installation, at least as important, but
measuring space savings accurately is extremely difficult in a
laboratory setting, that is, without live data subject to frequent
small changes made by many clients at once over a period of months or
years.
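The underlying arithmetic is straightforward; the short Python example below shows how a quoted deduplication ratio maps to the percentage of raw disk space saved (the figures are invented for illustration, not measured results).

# Hypothetical figures for illustration only, not measured results.
raw_gb_written = 10_000   # GB of backup data sent to the appliance over time
gb_stored = 500           # GB actually consumed on disk after deduplication

dedupe_ratio = raw_gb_written / gb_stored                  # 20.0, quoted as "20:1"
space_saved_pct = (1 - gb_stored / raw_gb_written) * 100   # 95% of raw disk space saved
print(f"{dedupe_ratio:.0f}:1 reduction, {space_saved_pct:.0f}% raw disk space saved")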
We wanted to approach reviews of data deduplication gear from a
different angle. We chose to focus on ease and potential disruptiveness
of implementation, throughput performance, manageability and features
while testing in our New York City storage lab, and then interview
several Data Domain customers about their real-world experience in
order to gain insight into actual deduplication rates. Our primary goal
was to evaluate the suitability of the Data Domain solution with
respect to multi-site business continuity.
Our testing was designed to simulate a three-location company with a
data center, a regional headquarters and a branch office. The branch
office backed up locally to a DD120 with 350 GB of internal storage,
the regional to a DD510 with 1.2 TB of internal storage, and both of
those units replicated to a DD690 with two external drive enclosures
housing 10 TB of storage at the data center. Each unit was designed for
maximum redundancy with redundant power supplies, NICs, and Fibre
Channel controllers, as well as drive arrays configured for RAID 6 plus
hot spares. We ran our backup and replication tests using two separate
methodologies: the first used Symantec Veritas NetBackup to back up
locally and then replicate between the various Data Domain units using
Data Domain's own replication technology; the second used Data Domain's
OST (OpenStorage) integration to let NetBackup control the whole backup
and replication process. It is worth noting that if your organization
already uses NBU, you can keep all of your existing jobs and policies
and merely redirect them from tape drives to Data Domain disk targets.
Deployment could not have been easier, although some aspects are
more focused on an enterprise storage skill set than on an IT
generalist skill set. Installation should be done from the CLI, either
over telnet or from an attached KVM. I was pleased to see that at first
login, I
was forced to change the default password. We applied licenses for
storage, replication and OST, then configured network, file system,
system and administrative settings. We confirmed our settings, rebooted
and then started setting up our CIFS and NFS shares.
Matthew D. Sarrel, CISSP, is a network security, product development, and technical marketing consultant based in New York City. He is also a game reviewer and technical writer. To read his opinions on games, please browse http://games.mattsarrel.com, and for more general information on Matt, please see http://www.mattsarrel.com.