Five 'Dirty Little Secrets' to Know When Buying a Data Archive

It turns out there are several so-called dirty little secrets that not every vendor will tell you ahead of time about archiving products. There are five categories of these "secrets": scalability, data protection, performance, data migration and energy efficiency. If you're in the market, you need to read this first.

Dirty Little Secret No. 1: Scalability. CAS (content-addressable storage) archives have a hard limit on the number of objects that can be stored.
This is a very different metric from the total amount of usable storage a system might have.
"What nobody tells you is that as you grow the number of your stored objects, you're going to run into a few challenges," said Bob Woolery, senior vice president of marketing at Nexsan, which makes SANs (storage area networks) and archiving packages. "Let's say you have 5 terabytes of space. You say, 'Great, when I run out of 5TB, I'll buy 5TB more.' And you purchase it based on that. But the other constraint is your object count.
"Why this is important is that you can grow your archive so large in terms of object count that the system will give you an 'all full up,' when you still may have plenty of capacity left," Woolery said.
So you call up your local vendor and tell him that your system thinks it is full when you still have, say, 2TB of capacity left. "That's when you find out that the object count is what really determines how much capacity you use," Woolery said.
An object limit can be reached long before the actual storage limit is reached, which means customers now have to invest in a second expensive database even though they technically still have space available.
A good example of this is e-mail. A company may archive all e-mail for compliance purposes. The vast majority of these e-mail objects may be small in size, but the sheer volume may max out the archive's object limit quickly, leaving gigabytes or terabytes of storage space unused. This is usually a big shock for companies.
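To make the e-mail scenario concrete, here is a minimal Python sketch of the arithmetic. Only the 5TB capacity comes from Woolery's example; the object limit (50 million) and the average message size (50KB) are hypothetical figures chosen for illustration, and the function name is invented here rather than taken from any vendor's product.

```python
def first_limit_hit(capacity_bytes, object_limit, avg_object_bytes):
    """Report which archive limit is reached first.

    Returns (limit_name, objects_stored, unused_bytes).
    All inputs are illustrative; real archives publish their own limits.
    """
    # How many objects would fit if raw capacity were the only constraint.
    max_objects_by_capacity = capacity_bytes // avg_object_bytes
    # The archive stops accepting objects at whichever ceiling is lower.
    stored = min(object_limit, max_objects_by_capacity)
    unused = capacity_bytes - stored * avg_object_bytes
    limit = "object_count" if object_limit < max_objects_by_capacity else "capacity"
    return limit, stored, unused


# Assumed figures: 5TB archive, 50 million object limit, 50KB average e-mail.
limit, stored, unused = first_limit_hit(
    capacity_bytes=5 * 10**12,
    object_limit=50_000_000,
    avg_object_bytes=50_000,
)
print(limit, stored, unused / 10**12)  # unused capacity shown in TB
```

Under these assumed numbers the archive reports full on object count while half of the 5TB is still empty. Raising the average object size enough (for example, archiving large media files instead of e-mail) flips the result so that capacity runs out first.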
Dirty Little Secret No. 2: Performance degradation. As objects pile up in an archive, the archive slows down tremendously.
"What they don't want to tell you is that all of a sudden when you get near your object limit, you get this 'crawl' effect," Woolery said. "When you look under the hood of an archive, you see a single database. With the exception of [Nexsan's] Assureon, which has a dual [database], all of those systems have a single database. It can be a small or a large one, but it is still a single database."
The database simply becomes overwhelmed by managing a large number of objects and all their corresponding metadata.
"Because it had to manage an ever-growing number of objects and process them, the processors within the archive end up spending so much time managing those objects that they're not able to take in as many files and push them out the door when you need them," Woolery said.
A dual-database setup alleviates this issue, he said.

Chris Preimesberger

Chris J. Preimesberger is Editor-in-Chief of eWEEK and responsible for all the publication's coverage. In his 13 years and more than 4,000 articles at eWEEK, he has distinguished himself in reporting...