12 years of HDD analysis brings insight to the bathtub curve’s reliability
Source: Ars Technica
Backup firm brings a unique, informed perspective to HDD failure rates.


Credit: Thomas Trutschel/Photothek via Getty Images

Backblaze is a backup and cloud storage company that has been tracking the annualized failure rates (AFRs) of the hard drives in its datacenter since 2013. As you can imagine, that’s netted the firm a lot of data. And that data has led the company to conclude that HDDs “are lasting longer” and showing fewer errors.
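For readers unfamiliar with the metric, here is a minimal sketch of how an annualized failure rate is commonly computed: failures divided by drive-years of service, expressed as a percentage. The function name and the input numbers are hypothetical and are not taken from Backblaze's data.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return AFR as a percentage: failures per drive-year of service."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical inputs (not Backblaze data): 1,000 drives that each ran
# for a full year, with 14 failures among them.
example_afr = annualized_failure_rate(failures=14, drive_days=1000 * 365)
print(f"{example_afr:.2f}%")
```

Counting drive days rather than drives means a drive that was in service for only part of the period contributes only the time it actually ran.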

That conclusion came from a blog post this week by Stephanie Doyle, Backblaze’s writer and blog operations specialist, and Pat Patterson, Backblaze’s chief technical evangelist. The authors compared the AFRs for the approximately 317,230 drives in Backblaze’s datacenter to the AFRs the company recorded when examining the 21,195 drives it had in 2013 and 206,928 drives in 2021. Doyle and Patterson said they identified “a pretty solid deviation in both age of drive failure and the high point of AFR from the last two times we’ve run the analyses.”


Credit: Backblaze

As Doyle and Patterson wrote, the peak failure rate among the tested drives this year was 4.25 percent at 10 years and three months of age. That compares to a peak of 13.73 percent at about three years and three months in 2013 and 14.24 percent at seven years and nine months in 2021.

“Not only is that a significant improvement in drive longevity, it’s also the first time we’ve seen the peak drive failure rate at the hairy end of the drive curve. And, it’s about a third of each of the other failure peaks,” Doyle and Patterson wrote.
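The "about a third" comparison can be checked against the peaks quoted above. The short sketch below uses only the figures reported in this article; the variable and function names are illustrative.

```python
# Peak AFR (percent) and the drive age at that peak, as quoted in the article.
peak_afr = {2013: 13.73, 2021: 14.24, 2025: 4.25}


def to_months(years: int, months: int) -> int:
    """Convert an age given in years and months to whole months."""
    return years * 12 + months


peak_age_months = {
    2013: to_months(3, 3),
    2021: to_months(7, 9),
    2025: to_months(10, 3),
}

# Ratio of the 2025 peak to each earlier peak.
ratios = {
    year: peak_afr[2025] / afr for year, afr in peak_afr.items() if year != 2025
}

for year, ratio in ratios.items():
    shift = peak_age_months[2025] - peak_age_months[year]
    print(f"2025 vs {year}: ratio {ratio:.2f}, peak arrives {shift} months later")
```

Both ratios come out near 0.3, which is in line with the authors' rounding to "about a third."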

You can check out Patterson and Doyle’s August blog post for more information about the drives they analyzed this year. The drives were from HGST, Seagate, Toshiba, and WDC, and the average age for each model ranged from 3.7 months to 103.9 months (about 8.7 years). The drives ranged from 4TB to 24TB. In 2021, Backblaze’s sample had drives from the same vendors, and the average age for each model ranged from 3.57 to 80.85 months (about 6.7 years). Those drives ranged from 4TB to 16TB.

As Backblaze has done in the past, Doyle and Patterson compared the behaviors of Backblaze’s datacenter HDDs with the bathtub curve, an engineering principle that says component failure rates tend to follow a U-shape over time, with more failures occurring early in life before the rate drops, settles, and then picks up again as the component ages.
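To make the U-shape concrete, here is a rough sketch of one common textbook way to model a bathtub curve: adding a decreasing early-failure hazard, a constant random-failure hazard, and an increasing wear-out hazard, with the first and last expressed as Weibull hazards. Every parameter below is an illustrative assumption; none is fitted to Backblaze's data, and this is not the method the blog's authors used.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years: float) -> float:
    """Sum of three failure modes; all parameters are illustrative assumptions."""
    early = weibull_hazard(t_years, shape=0.5, scale=20.0)    # shape < 1: rate falls with age
    random_failures = 0.01                                    # constant background rate
    wear_out = weibull_hazard(t_years, shape=5.0, scale=10.0) # shape > 1: rate rises with age
    return early + random_failures + wear_out


# Print the modeled failure rate at a few ages (in years).
for age in (0.25, 1, 3, 5, 8, 10):
    print(f"{age:>5} years: {bathtub_hazard(age):.3f}")
```

With these made-up parameters, the modeled rate starts high, bottoms out after a few years, and then climbs again, which is the U-shape the principle describes.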

But as seen in Backblaze’s graph above, the company’s HDDs aren’t adhering to that principle. The blog’s authors noted that in 2021 and 2025, Backblaze’s drives had a “pretty even failure rate through the significant majority of the drives’ lives, then a fairly steep spike once we get into drive failure territory.”

The blog continues:

What does that mean? Well, drives are getting better, and lasting longer. And, given that our trendlines are about the same shape from 2021 to 2025, we should likely check back in when 2029 rolls around to see if our failure peak has pushed out even further.

Speaking with Ars Technica, Doyle said that Backblaze’s analysis is good news for individuals shopping for larger hard drives because the devices are “going to last longer.”

She added:

In many ways, you can think of a datacenter’s use of hard drives as the ultimate test for a hard drive—you’re keeping a hard drive on and spinning for the max amount of hours, and often the amount of times you read/write files is well over what you’d ever see as a consumer. Industry trend-wise, drives are getting bigger, which means that oftentimes, folks are buying fewer of them. Reporting on how these drives perform in a data center environment, then, can give you more confidence that whatever drive you’re buying is a good investment.

The longevity of HDDs is another reason for shoppers to still consider HDDs over faster, more expensive SSDs.

“It’s a good idea to decide how justified the improvement in latency is,” Doyle said.

Questioning the bathtub curve

Doyle and Patterson aren’t looking to toss the bathtub curve out with the bathwater. They’re not suggesting that the bathtub curve doesn’t apply to HDDs, but rather that it overlooks additional factors affecting HDD failure rates, including “workload, manufacturing variation, firmware updates, and operational churn.” The principle also assumes, per the authors, that:

  • Devices are identical and operate under the same conditions
  • Failures happen independently, driven mostly by time
  • The environment stays constant across a product’s life

While these conditions can largely be met in datacenter environments, “conditions can’t ever be perfect,” Doyle and Patterson noted. When considering an HDD’s failure rates over time, it’s wise to consider both the bathtub curve and how you use the component.