mtbf – is it really *that* hard?

Most manufacturers will list an MTBF (Mean Time Between Failures) on their products – at least in the computer hardware industry.

A typical number might be 1 000 000 hours. For those keeping track at home, that’s about 114 years! Now, since no hard drive has been running for more than a century, how could they know that?

The important thing to understand is that it is NOT a rating for when a particular drive will fail. It is a statistical representation of the reliability of the entire product line: if you have 1 000 000 hard drives running, each with an MTBF of 1 000 000 hours, then about one will fail every hour (I know I simplified the math there). Likewise, if you have 100 000 hard drives*, one will fail about every 10 hours.
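If you want to check the arithmetic yourself, here is a minimal Python sketch of the same simplified math. The function names are my own, made up for illustration, and it assumes a constant failure rate with failed drives swapped out right away, which real drives don't strictly obey.

```python
# Back-of-the-envelope MTBF arithmetic (simplified model).
# Assumptions: constant failure rate, failed drives are replaced
# immediately, and a year is 365 days (leap years ignored).

HOURS_PER_YEAR = 24 * 365  # 8760


def mtbf_in_years(mtbf_hours):
    """Convert an MTBF figure from hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def hours_between_fleet_failures(mtbf_hours, fleet_size):
    """Average hours between failures somewhere in the whole fleet."""
    return mtbf_hours / fleet_size


def expected_failures_per_year(mtbf_hours, fleet_size):
    """Average number of drive failures across the fleet in one year."""
    return fleet_size * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    mtbf = 1_000_000  # hours, the typical datasheet number from above
    print(f"MTBF in years: {mtbf_in_years(mtbf):.0f}")
    for fleet in (1_000_000, 100_000):
        gap = hours_between_fleet_failures(mtbf, fleet)
        per_year = expected_failures_per_year(mtbf, fleet)
        print(f"{fleet} drives: one failure every {gap:g} h, "
              f"about {per_year:g} per year")
```

Seen this way, the number is more useful for planning spares across a fleet than for guessing how long any one drive will live.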

So, the next time you’re getting ready to buy a hard drive, by all means check the MTBF – but remember that it doesn’t mean a whole heckuva lot 🙂

Calvin's dad on limits

Google has an excellent article (pdf) on this, as well.

*That is not an improbable number – I have worked with customers *managing* about 50 000 servers, each of which had at least two drives, and many had more.