How to Reduce IT Budget Costs Using Disk-Based Archiving

Disk-based archiving can be the most logical step for IT professionals looking to add capacity and increase storage efficiencies, while avoiding the additional cost of purchasing more primary storage. Disk-based archiving solves today's data retention and compliance needs, while laying the foundation for upcoming requirements for data retention. Knowledge Center contributor George Crump explains the benefits of using disk-based archiving in your enterprise.


In these cost-conscious times, CIOs are under increasing pressure to cut their IT budgets or hold them flat. Budgets are tightening across all industries: some are flat compared to last year, some are being reduced, and those that are growing are growing by only a small percentage.

Yet more than likely, if you are an IT decision maker, you have planned for more primary storage capacity (or maybe even a new primary storage system altogether). In either scenario, before you make that purchase, consider an enterprise disk archive as your solution.

With an enterprise disk archive, you can solve today's urgent need by freeing up a significant amount of primary storage capacity, while laying the foundation for future requirements such as data retention and data compliance. Over 80 percent of data becomes inactive within 90 days of creation and is never accessed again. To put that percentage in concrete terms, in a 500TB data center only about 100TB of data is being actively accessed, and roughly 400TB of data should not be on primary storage.
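The arithmetic behind that example can be sketched in a few lines of Python. The function name and parameters here are illustrative only; the 80 percent figure is the one quoted above, and you would substitute whatever inactive fraction your own environment shows.

```python
def split_capacity(total_tb: float, inactive_fraction: float) -> tuple[float, float]:
    """Return (active_tb, inactive_tb) for a data center of total_tb terabytes."""
    if not 0.0 <= inactive_fraction <= 1.0:
        raise ValueError("inactive_fraction must be between 0 and 1")
    inactive_tb = total_tb * inactive_fraction
    return total_tb - inactive_tb, inactive_tb


# The article's example: a 500TB data center with 80 percent inactive data.
active, inactive = split_capacity(500, 0.80)
print(f"active: {active:.0f} TB, archive candidates: {inactive:.0f} TB")
```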

Saving Money with Enterprise Disk Archives

An enterprise disk archive can store this data at a fraction of the price, while improving long-term data reliability and data retention. The inactive data can be moved easily and cost-effectively to an archive storage platform, which spares you the cost of expanding your primary storage platform and costs substantially less. The archive provides easy access, high availability and substantially improved data protection, and it enables you to find that data quickly when you need it for legal or business reasons.

Also, an archive, especially a disk-based one, does not have to be limited to just old files. It can also be used for extra copies of files that perhaps should never have been on primary storage. For example, database environments always seem to have redundant copies of themselves: extra backups, archives, dumps or just straight copies. Many times, these files were created "just in case," but never seem to be cleaned up. Now they can be safely archived to a far more cost-effective and efficient platform.

Investing in Data Retention

Looking at this from a cost basis, primary disk still costs about $30 to $40 per GB once you factor in the controller, software and maintenance, despite decreases in disk cost and increases in disk capacities from Tier 1 suppliers. It is not uncommon for disk-based archiving solutions to cost dramatically less than that, even approaching the cost of tape. This is not simply an investment in a cheaper platform; it is a platform that almost every data center will need as increasing attention is paid to data retention.
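To see what those per-GB figures mean at scale, here is a rough sketch that prices the 400TB of inactive data from the earlier 500TB example. The $30 and $40 per GB primary figures come from the paragraph above; the archive price is a made-up placeholder, not a quoted figure, and the conversion assumes decimal terabytes (1TB = 1,000GB). Plug in your own vendor quotes.

```python
GB_PER_TB = 1_000  # assumption: decimal terabytes; adjust if you size in TiB


def storage_cost(capacity_tb: float, cost_per_gb: float) -> float:
    """Fully loaded cost in dollars of holding capacity_tb at cost_per_gb."""
    return capacity_tb * GB_PER_TB * cost_per_gb


inactive_tb = 400  # inactive data from the 500TB example

# Primary storage range quoted in the article: $30 to $40 per GB.
primary_low = storage_cost(inactive_tb, 30)
primary_high = storage_cost(inactive_tb, 40)

# Hypothetical placeholder only: the article gives no archive price.
HYPOTHETICAL_ARCHIVE_COST_PER_GB = 5.0
archive = storage_cost(inactive_tb, HYPOTHETICAL_ARCHIVE_COST_PER_GB)

print(f"primary: ${primary_low:,.0f} to ${primary_high:,.0f}")
print(f"archive (placeholder price): ${archive:,.0f}")
```

Even with a different placeholder, the shape of the result is the point: the cost of keeping inactive data on primary storage scales with the full loaded per-GB price, not with the price of the raw disk.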

This primary storage cost-savings measure also begins to establish a strategy for what is a medium-term data initiative for most organizations: data retention. Most of the data that will be archived to free up primary storage is the very same data that will need to be retained for legal and compliance reasons. Some data may need to be held for over 50 years. The challenge is that you don't know when in those 50 years it will be needed again. But when it is needed, it will be needed quickly (within a couple of days in the event of legal action). This means it will need to be searchable and on disk in a common data format.

Identifying the Archive Target

The first step is to select the archive target. While conventional wisdom says to identify the data first, it makes more sense to identify the archive target because that will indicate how aggressive you can be with migration. For example, if you choose to archive to optical or tape, you cannot be as aggressive with the data that you archive, because recovery requests will be slow to fulfill.

If you choose to use a simple shelf of Serial ATA (SATA) drives as an extension to your existing array, you typically will be limited by cost and, more importantly, by scale. Most of these systems are limited by the capacity of the shelf, and they don't have the archive-specific features that are required for long-term retention of data (such as data integrity checking and WORM file systems).

Using Purpose-Built Disk Archives

A purpose-built disk archive is ideal for this role. It provides the ability to scale from both a capacity perspective and a generational perspective. Capacity scaling can be done by adding nodes and growing the archive to multiple petabytes as needed. Generational scaling is the ability to perform a rolling upgrade of the technology as it ages. Data on old modules can migrate to new ones, and the old modules can then be retired seamlessly.

Disk also allows the archive to be presented via standard network mount points such as Common Internet File System (CIFS) and Network File System (NFS). Although some disk archives have a proprietary API set for access, standard network access is by far the most advantageous. While there is no guarantee that NFS and CIFS will still be around 50 years from now, the past suggests that the odds favor it. Look back 10 years. You will have a much better chance of accessing a CIFS-based Windows 95 system over your network than you will of finding a drive to mount a 10-year-old piece of media.

Benefiting from Disk-Based Archive Systems

Once the target is settled on, you can evaluate the type of data to store on the archive and examine how that data should be moved to that system. This is another area where disk-based archive systems shine. If the archive is presented as a network mount point, then any application that can mount a network file system can take advantage of it. This means a simple file system move command will work. While you may want something more sophisticated than a manual move command, it does work. Move all the files that have not been accessed in the last year or more. Then tell your users that if their data is not on the home drive, it is on the "archive" drive.

That archive drive is simply the network mount point of the disk-based archive system. While manual, and requiring user intervention, this process is simple, adds no additional software to the cost of the archive, and is extremely reliable. As one CIO I spoke with put it: "For the cost we save in software, and the odds of a user actually needing one of these files, we could have our help desk individually walk the user through the rare access from the archive."
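As a minimal sketch of that manual process, the Python below walks a directory tree and moves every file whose last-access time is older than a cutoff to the same relative path under the archive mount. The function name, parameters and example paths are hypothetical. It assumes the archive is already mounted as an ordinary directory, that it sits outside the source tree, and that the file system actually records access times (volumes mounted with options such as noatime will not). It defaults to a dry run so you can review the list before anything moves.

```python
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def archive_stale_files(source_root: Path, archive_root: Path,
                        max_idle_days: int = 365,
                        dry_run: bool = True) -> list[Path]:
    """Move files not accessed for max_idle_days into archive_root.

    Returns the archive paths of the files that were moved, or that
    would be moved when dry_run is True.
    """
    cutoff = time.time() - max_idle_days * SECONDS_PER_DAY
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(source_root.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue  # skip directories, links and special files
        if path.stat().st_atime >= cutoff:
            continue  # accessed recently enough; leave it on primary
        target = archive_root / path.relative_to(source_root)
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
        moved.append(target)
    return moved


# Example with hypothetical mount points:
# for p in archive_stale_files(Path("/home"), Path("/mnt/archive/home")):
#     print("would move to", p)
```

Keeping the relative path intact is a deliberate choice: a user who cannot find a file on the home drive can look in the same folder on the "archive" drive.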

If something more sophisticated is warranted, then disk-based archives also make excellent endpoints in a tiered storage strategy. Archiving software has caught up with the simplicities that enterprise disk archiving delivers. With a disk archive, files just need to move from "A" to "B" and have a transparent link set up between those files. The archive itself handles much of the sophisticated retention. That being said, if there is a data movement application in the environment, in most cases it can be redeployed with great success, since even legacy applications tend to support disk as a target.
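The "move from A to B and leave a transparent link" idea can be illustrated with a short Python sketch. This is only one way to do it, not how any particular archiving product works: it assumes a POSIX-style file system where symbolic links are available and followed by the clients that read the files, and the function name and paths are hypothetical. Commercial data movers often use their own stub or link mechanisms instead.

```python
import os
import shutil
from pathlib import Path


def move_and_link(source: Path, archive_dir: Path) -> Path:
    """Move source into archive_dir and leave a symlink at the old location.

    Returns the file's new location in the archive.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Use an absolute target so the link resolves from any directory.
    target = (archive_dir / source.name).resolve()
    if target.exists():
        raise FileExistsError(f"{target} already exists in the archive")
    shutil.move(str(source), str(target))  # step 1: move "A" to "B"
    os.symlink(target, source)             # step 2: link "A" to "B"
    return target


# Example with hypothetical paths:
# move_and_link(Path("/home/finance/q3_report.xls"),
#               Path("/mnt/archive/finance"))
```

After the call, anything that opens the original path reads the archived copy through the link, so users and applications do not need to know the file has moved.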

George Crump is the founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. An industry veteran of over 25 years, he has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to founding Storage Switzerland, George was CTO at one of the nation's largest integrators. He can be reached at