Computers are all about information: creating it, manipulating it, retrieving it and, above all, storing it. That makes storage, and more particularly hard disks, the most critical element of any computer system, a statement that holds true whether the drives are on the corporate desktop or function as shared storage on a company network, alone or in an array. If part of your job is deciding on minimum requirements for hard disks, that’s a strong argument for paying close attention to hard disk specifications. Here’s a look at some key reliability specs to consider.
Life Expectancy
Drives, like people, have a life expectancy. For drives, it’s called the service life or design life (because that’s how long the drive was designed to remain in service). The service life is typically three to five years, but can be as high as 10 years. Knowing the service life is important, because failure rates rise rapidly at the end of service life. Assuming the drive lasts that long, you’ll want to replace it at that point, before it fails. Knowing the service life is also important for understanding the MTBF (mean time between failures) spec.
What Mean Time Between Failures Isn’t
MTBF is probably the single most widely misunderstood drive spec, even among people who are knowledgeable about computer hardware. It doesn’t tell you anything about how long a drive will last, which is what most people think it means. MTBFs for the current generation of hard disks are typically anywhere from 500,000 to 1.2 million hours for desktop drives, and as much as 1.6 million hours for enterprise drives. Divided by the 8,760 hours in a year, that works out to roughly 57 to 183 years of continuous operation. Drives obviously don’t last that long.
How Likely It Is to Break
The right way to read MTBF is as a statistical statement. It tells you how likely the drive is to fail, or, more precisely, how often a drive will fail, on average. Before we look at the details, though, it’s important to understand that not all MTBFs are on equal footing.
Read the Fine Print
Any given MTBF is based on specific testing conditions, including obvious issues like temperature, which can affect the life of the drive. A less obvious but nonetheless critical issue is whether the drive was tested on the assumption that it would run 24 hours a day, 7 days per week (8,760 hours per year), or run just 40 to 50 hours per week (a total of 2,400 hours per year for Seagate Barracuda desktop drives, for example).
The specs for enterprise drives destined for networks are generally based on the first scenario, which also assumes just a few hundred motor starts and stops per year. The specs for drives aimed at the desktop are generally based on the second scenario, and assume thousands of motor starts and stops per year.
Translating MTBF
Depending on the scenario and on how many drives you have, a given MTBF will translate to a different length of time on the calendar (or clock). If a given model drive has a 1.2 million hour MTBF, for example, and you have 1.2 million drives, you can expect an average of one drive to fail every hour. If you have 120 drives, a more reasonable number, you would expect one to fail every 10,000 hours on average. That works out to about one every 417 days for enterprise drives running 24 hours per day, or roughly one every 4.2 years for desktop drives running a total of 2,400 hours per year.
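For readers who want to plug in their own numbers, here is a minimal Python sketch of that arithmetic. The function names and constants are illustrative choices, not anything taken from a drive maker’s spec sheet, and the calculation assumes every drive in the fleet shares the same MTBF and the same powered-on hours per year.

```python
# Powered-on hours per year for the two testing scenarios described above.
HOURS_PER_YEAR_ENTERPRISE = 8760  # 24 hours a day, 7 days a week
HOURS_PER_YEAR_DESKTOP = 2400     # the desktop figure cited in the article


def hours_between_failures(mtbf_hours, drive_count):
    """Average powered-on hours between failures across a fleet of drives."""
    return mtbf_hours / drive_count


def years_between_failures(mtbf_hours, drive_count, power_on_hours_per_year):
    """Average calendar years between failures for a given duty cycle."""
    fleet_hours = hours_between_failures(mtbf_hours, drive_count)
    return fleet_hours / power_on_hours_per_year


# The article's example: 120 drives, each with a 1.2 million hour MTBF.
fleet_hours = hours_between_failures(1_200_000, 120)
enterprise_days = fleet_hours / 24  # drives that never power down
desktop_years = years_between_failures(1_200_000, 120, HOURS_PER_YEAR_DESKTOP)

print(f"One failure every {fleet_hours:,.0f} powered-on hours")
print(f"Enterprise: about one failure every {enterprise_days:.0f} days")
print(f"Desktop: about one failure every {desktop_years:.1f} years")
```

Changing the drive count or the hours per year shows how quickly the calendar interval moves, which is why the testing scenario behind the spec matters so much.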
Another Way to Look at MTBF
MTBF in hours isn’t as easy to understand on a gut level as a simple statement of failure rate. Some manufacturers are addressing that by adding the AFR (annualized failure rate) to their specs: the odds of a drive failing over the course of a single year.
From MTBF to AFR
If you can’t find the AFR for a given drive, you can easily calculate it from the MTBF. First divide one failure by the MTBF in hours to get failures per hour. To convert that to failures per year, multiply the result by 8,760 hours for an enterprise drive or by 2,400 hours (or whatever number of hours the MTBF is based on) for a desktop drive. To turn the result into a percentage, multiply by 100. For a 1.2 million hour MTBF for an enterprise drive, for example, the AFR comes out to 0.73 percent, which means (in theory at least) that you have a 0.73 percent chance of the drive dying in any given year.
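The same steps can be written as a short Python function. This is only a sketch of the simple conversion described here; the name afr_percent and its default of 8,760 hours are assumptions made for illustration.

```python
def afr_percent(mtbf_hours, power_on_hours_per_year=8760):
    """Convert an MTBF in hours to an annualized failure rate in percent.

    Follows the simple method in the text: failures per hour, times
    powered-on hours per year, times 100.
    """
    failures_per_hour = 1 / mtbf_hours
    failures_per_year = failures_per_hour * power_on_hours_per_year
    return failures_per_year * 100


# Enterprise drive, 1.2 million hour MTBF, running around the clock.
enterprise_afr = afr_percent(1_200_000)

# Desktop drive with the same MTBF, rated at 2,400 hours per year.
desktop_afr = afr_percent(1_200_000, power_on_hours_per_year=2400)

print(f"Enterprise AFR: {enterprise_afr:.2f} percent")
print(f"Desktop AFR: {desktop_afr:.2f} percent")
```

Note that the desktop result comes out lower only because the drive is assumed to be powered on for far fewer hours each year, not because it is the more reliable drive.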
MTBF and Service Life
One important point about MTBF is that it holds true only for the service life of the drive. As already mentioned, once a drive reaches the end of its service life, the failure rate goes way up, and the MTBF is no longer meaningful. The MTBF only applies if you keep replacing individual drives at (or before) the end of their service lives. By that point the technology should be much improved, so you’ll want to move on to a new drive in any case.
Finding the Service Life
The service life for a drive is typically missing from the spec sheet, but you can often find it in the drive’s manual. In most cases, you can search for the manual on the manufacturer’s Web site, where it’s usually available as a PDF file. You can then search for “service life” in the manual itself.
How Long Is the Warranty?
If you can’t find the service life for a particular model drive, and can’t get the information from the manufacturer, you might want to simply treat the length of the drive’s warranty as the service life. The cynical (some would say, conservative) view is that you should treat it as the service life in any case. After all, the length of the warranty is, by definition, how long the manufacturer is willing to bet the drive will last, regardless of how long it was designed to last.
Back of the Envelope, Please
One thing to keep in mind when comparing MTBFs between drives, even when they are based on the same scenarios, is that they are not solidly reliable numbers based on actual drive history. As a rule, they are based on some limited testing combined with actual results of similar, older models, with the numbers plugged into a mathematical model that calculates the MTBF. Given the same limited data, different mathematical models will spit out different results. It’s best to think of the spec as a back-of-the-envelope calculation: a useful indicator, as long as you don’t take it too seriously.
Small Differences Don’t Matter
The nature of the MTBF spec means that you have to take it with a large grain of salt. A two-to-one difference (600,000 hours versus 1,200,000) is probably meaningful. A 10 or 20 percent difference (800,000 hours versus 1,000,000) may not be.
And in the Real World…
Keep in mind too that even if the MTBF spec were precisely correct, it would only apply in the conditions defined by the testing scenario. As with fuel efficiency claims for cars, your mileage will probably vary. This alone may be enough to explain why some real-world studies have found much higher failure rates than MTBF specs predict. That’s an unpleasant reality, but it’s important to know.