Disk-based archiving can be the most logical step for IT professionals looking to add capacity and increase storage efficiency while avoiding the cost of purchasing more primary storage. It meets today's data retention and compliance needs while laying the foundation for future requirements. Knowledge Center contributor George Crump explains the benefits of using disk-based archiving in your enterprise.
In these cost-conscious times, CIOs are under increasing pressure to cut
their IT budgets or keep them flat. Budgets are tightening across all
industries: some are flat compared to last year, some are being reduced, and
those that are growing at all are growing by only a small percentage.
Yet more than likely, if you are an IT decision maker, you have planned for
more primary storage capacity (or maybe even a new primary storage system
altogether). In either scenario, before you make that purchase, consider an enterprise
disk archive as your solution.
With an enterprise disk archive, you can solve today's urgent need by
freeing up a significant amount of primary storage capacity, while laying the
foundation for future requirements such as data retention and data compliance.
More than 80 percent of data becomes inactive within 90 days of its creation
and is never accessed again. To put that percentage in concrete terms, in a
500TB data center only about 100TB of data is being actively accessed, and
roughly 400TB of data should not be on primary storage.
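The arithmetic behind that 500TB example can be sketched in a few lines. This is only an illustration: the function name is made up for this sketch, and the 80 percent figure is the article's own rule of thumb, not a measurement of any particular environment.

```python
def split_capacity(total_tb, inactive_percent=80):
    """Split total capacity into (active_tb, inactive_tb).

    inactive_percent defaults to the article's 80 percent rule of thumb;
    measure your own environment before relying on it.
    """
    inactive_tb = total_tb * inactive_percent / 100
    active_tb = total_tb - inactive_tb
    return active_tb, inactive_tb


active, inactive = split_capacity(500)
print(f"active: {active:.0f}TB, archive candidates: {inactive:.0f}TB")
# prints "active: 100TB, archive candidates: 400TB"
```

Running the same function against your own total capacity, with an inactive percentage taken from a file-age report, gives a first estimate of how much data is a candidate for the archive.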
Saving Money with Enterprise Disk Archives
An enterprise disk archive can store this data at a fraction of the price,
while improving long-term data reliability and data retention. The inactive
data can be moved easily and cost-effectively to an archive storage platform,
which saves you the cost of expanding your primary storage platform. The
archive provides easy access, high availability and substantially improved
data protection, and it enables you to quickly find that data when you need it
for legal or business reasons.
Also, an archive, especially a disk-based one, does not have to be limited
to just old files. It can also be used for extra copies of files that perhaps
should never have been on primary storage. For example, database environments
always seem to accumulate redundant copies of themselves: extra backups,
archives, dumps or just straight copies. Many times, these files were created
"just in case" and were never cleaned up. Now they can be safely archived to
a far more cost-effective and efficient platform.
Investing in Data Retention
Looking at this from a cost basis, primary disk still costs about $30 to $40
per GB once you factor in the controller, software and maintenance, despite
decreases in disk cost and increases in disk capacities from Tier 1 suppliers.
It is not uncommon for disk-based archiving solutions to cost dramatically
less than that, and some even approach the cost of tape. This is not simply an
investment in a cheaper platform; it is a platform that almost every data
center will need as increasing attention is paid to data retention.
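To see what those per-GB figures mean at scale, here is a rough cost sketch for the 400TB of inactive data from the earlier example. The $30 to $40 per GB range comes from the text above; the archive price is NOT given in this article, so the value below is a pure placeholder to be replaced with a real vendor quote. The helper names and the decimal TB-to-GB conversion are also assumptions of this sketch.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes (1TB = 1,000GB)

# Placeholder only: the article gives no archive price. Use a real quote.
HYPOTHETICAL_ARCHIVE_COST_PER_GB = 5


def storage_cost(capacity_tb, cost_per_gb):
    """Cost in dollars of holding capacity_tb at a given price per GB."""
    return capacity_tb * GB_PER_TB * cost_per_gb


def archive_savings(inactive_tb, primary_cost_per_gb, archive_cost_per_gb):
    """Dollars saved by holding inactive data on the archive tier instead."""
    on_primary = storage_cost(inactive_tb, primary_cost_per_gb)
    on_archive = storage_cost(inactive_tb, archive_cost_per_gb)
    return on_primary - on_archive


low = storage_cost(400, 30)   # 12,000,000
high = storage_cost(400, 40)  # 16,000,000
print(f"400TB on primary disk: ${low:,} to ${high:,}")

saved = archive_savings(400, 30, HYPOTHETICAL_ARCHIVE_COST_PER_GB)
print(f"savings with the placeholder archive price: ${saved:,}")
```

The first figure depends only on numbers quoted in the article; the savings figure is only as good as the archive price you plug in.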
This way of saving on primary storage also begins to establish a strategy
for what is a medium-term initiative for most organizations: data retention.
Most of the data that will be archived to free up primary storage is the very
same data that will need to be retained for legal and compliance reasons. Some
data may need to be held for more than 50 years. The challenge is that you
don't know when in those 50 years it will be needed again. But when it is
needed, it will be needed quickly (within a couple of days in the event of
legal action). This means it will need to be searchable and on disk in a
common data format.
Identifying the Archive Target
The first step is to select the archive target. While conventional wisdom
says to identify the data first, it makes more sense to identify the archive
target because that will indicate how aggressive you can be with migration. For
example, if you choose to archive to optical or tape, you cannot be as
aggressive with the data that you archive, because recovery requests will be
slow to fulfill.
If you choose to use a simple shelf of Serial ATA (SATA) drives as an
extension to your existing array, you typically will be limited by cost but,
more importantly, by scale. Most of these systems are limited by the capacity
of the shelf, and they don't have the archive-specific features that are
required for long-term retention of data (such as data integrity checking and
WORM file systems).
Using Purpose-Built Disk Archives
A purpose-built disk archive is ideal for this role. First, it provides the
ability to scale both from a capacity perspective as well as from a
generational perspective. Capacity scaling can be done by adding nodes and
growing the archive to multiple petabytes as needed. Generational scaling is
the ability to perform a rolling upgrade of the technology as it ages. Data on
old modules can be migrated to new ones, and the old modules can then be
retired seamlessly.
Disk also allows presentation via standard network mount points such as
Common Internet File System (CIFS) and Network File System (NFS). Although
some disk archives have a proprietary API set for access, standard network
access is by far the most advantageous. While there is no guarantee that NFS
and CIFS will still be around 50 years from now, the past suggests that the
odds favor them. Look back 10 years: you have a much better chance of
accessing a CIFS-based Windows 95 system over your network than you do of
finding a drive to mount a 10-year-old piece of media.
Benefiting from Disk-Based Archive Systems
Once the target is settled on, the type of data to store on the archive can
be evaluated, and how that data should be moved to that system can be examined.
This is another area where disk-based archive systems shine. If the archive is
presented as a network mount point, then any application that can mount a
network file system can take advantage of it. This means a simple file system
move command will work. While you may want something more sophisticated than
a manual move command, it does work. Move all the files that have not been
accessed in the last year or more. Then tell your users that if their data is
not on the home drive, it is on the "archive" drive.
That archive drive is simply the network mount point of the disk-based
archive system. While manual, and requiring user intervention, this process is
simple, adds no software to the cost of the archive, and is extremely
reliable. As one CIO I spoke with put it: “For the cost we save in software,
and the odds of a user actually needing one of these files, we could have our
help desk individually walk the user through the rare access from the
archive.”
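The manual approach described above (move everything not accessed in the last year to the archive mount point) could be scripted along these lines. This is a minimal sketch, not a production tool: the function names and the example paths are hypothetical, and it assumes the file system actually records access times. Many systems mount with options such as noatime or relatime, in which case the access time is not a reliable guide and modification time may be a better proxy.

```python
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_idle_days=365, now=None):
    """Yield regular files under root not accessed in max_idle_days."""
    current = time.time() if now is None else now
    cutoff = current - max_idle_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # leave links alone
            if path.stat().st_atime < cutoff:
                yield path


def archive_stale_files(home_root, archive_root, max_idle_days=365,
                        dry_run=True):
    """Move stale files to archive_root, keeping the directory layout.

    Returns a list of (source, destination) pairs. With dry_run=True
    nothing is moved; the list only reports what would be moved.
    """
    home_root, archive_root = Path(home_root), Path(archive_root)
    moved = []
    # Collect first so the tree is not modified while it is being walked.
    for src in list(find_stale_files(home_root, max_idle_days)):
        dest = archive_root / src.relative_to(home_root)
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
        moved.append((src, dest))
    return moved


# Example with hypothetical mount points; review the dry run first:
#   for src, dest in archive_stale_files("/mnt/home", "/mnt/archive"):
#       print(src, "->", dest)
```

Keeping the same directory layout on the archive drive means a user who cannot find a file under the home drive can look in the same relative location on the archive drive.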
If something more sophisticated is warranted, then disk-based archives also
make excellent endpoints in a tiered storage strategy. Archiving software has
caught up with the simplicity that enterprise disk archiving delivers. With a
disk archive, files just need to move from "A" to "B" and have a transparent
link set up between the two locations. The archive itself handles much of the
sophisticated retention work. That being said, if there is already a data
movement application in the environment, in most cases it can be redeployed
with great success, since even legacy applications tend to support disk as a
target.
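As a rough illustration of moving a file from "A" to "B" and leaving a transparent link behind, the sketch below uses a symbolic link. That is an assumption made for this example, not a description of how any particular archiving product works: commercial tools typically use their own stub or link mechanisms, creating symbolic links can require extra privileges on Windows, and not every CIFS or NFS client follows them. The function name and its parameters are hypothetical.

```python
import os
import shutil
from pathlib import Path


def archive_with_link(src, source_root, archive_root):
    """Move src into the archive and leave a symlink at its old path.

    The file keeps its path relative to source_root under archive_root.
    Returns the new location of the file.
    """
    src = Path(src)
    # Resolve the roots so the link target is an absolute path.
    source_root = Path(source_root).resolve()
    archive_root = Path(archive_root).resolve()
    dest = archive_root / src.resolve().relative_to(source_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    os.symlink(dest, src)  # the old path now points at the archived copy
    return dest
```

After the call, an application that opens the original path reads the archived copy through the link, which is the "transparent" behavior the text describes.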
George Crump is the founder of Storage Switzerland,
an analyst firm focused on the virtualization and storage marketplaces.
An industry veteran of over 25 years, he has held engineering and sales
positions at various IT industry manufacturers and integrators. Prior
to founding Storage Switzerland, George was CTO at one of the nation's
largest integrators. He can be reached at georgeacrump@mac.com.