Google Studies Hard Drive Failure

A hospital posted a notice in the nurses’ station saying: “Remember, the first five minutes of a human being’s life are the most dangerous.”
Underneath, a nurse had written: “The last five are pretty risky, too….”

Once you have seen the birth of your own child you quickly understand how risky the first few minutes can be. I saw my daughter come out during the C-section and watched as they whisked her away to another room; I didn't get a good look at her until a few minutes later.

Not to make light of the risks of childbirth, but the same is true for hard drives. This week I installed a stack of expensive 147GB Seagate Cheetah SCSI drives into two RAID-5 arrays. Eight drives fired up nicely and one conked out almost immediately (the other spare will be tested this week). The fear I always have with these is that we take several identical drives and run them in nearly identical circumstances, so we risk them all dying around the same time (just ask Brett Anderson).

While I, and most likely you as well, can only judge hard drives based upon a very limited sample, Google Inc. has studied the lifespans of more than 100,000 drives in its own controlled environment over the last 5 years. The result: Failure Trends in a Large Disk Drive Population (via TGDaily).

Heat and usage are factors. If you run a hard drive in a cool environment for a week, then store it away properly for 7 years, there is a good chance it will still run nicely. However, heat and usage don’t seem to be the primary indicators of hard disk failure, which seems to be more idiopathic than anything else.

After studying the information on environmental variables and SMART (Self-Monitoring, Analysis and Reporting Technology) status, they found that drive make and vintage are key contributors to failure, and that surface errors noted by SMART are important signs of subsequent failure.
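To make the "surface errors" point a bit more concrete, here is a rough Python sketch of how you might scan a SMART attribute table for that kind of warning sign. Everything in it is my own assumption rather than anything from Google's study: the table layout is modeled on the attribute listing that smartmontools prints for ATA drives, the three watched attribute names are just commonly cited surface-error counters, and the sample values are made up. SCSI drives like the Cheetahs report their health differently, so treat this as an illustration only.

```python
# Hypothetical sketch: flag SMART attributes that suggest surface errors.
# The watched names, the table layout, and SAMPLE are illustrative assumptions.

WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def surface_error_warnings(smart_table):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    warnings = {}
    for line in smart_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and blank lines are skipped by this check.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            warnings[name] = int(raw)
    return warnings


# Made-up sample data in the assumed layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4821
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(surface_error_warnings(SAMPLE))
```

A non-empty result is a hint to check your backups and line up a spare, not a verdict that the drive is about to die.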

Now for something on a related topic. For our important machines we run RAID 5 and keep external backups, but occasionally a small system may go down with some useful data that never made it to the network. This is where something like SpinRite comes in handy. Hopefully it can recover enough of the data to let you get it back.

Update: StorageMojo has some good comments on this.

1 Response to “Google Studies Hard Drive Failure”


  1. Brett Anderson, Feb 17th, 2007 at 10:13 am

    I replaced the 2 failed drives with 146GB Seagate Cheetah SCSI drives too. So far so good, but I have a spare just in case ;)
