Testing New Drives

To set up my network-attached storage (NAS), I recently ordered two 4 TB WD Red drives. After some research (here and here), I came up with the protocol below. Note: run all commands as root and replace /dev/disk with the appropriate device name for your setup:

  1. read out the S.M.A.R.T. attributes: smartctl -a /dev/disk > baseline
  2. perform a conveyance test to check for damage during transport: smartctl -t conveyance /dev/disk and compare to baseline
  3. perform a short test: smartctl -t short /dev/disk and compare to baseline
  4. run badblocks on the complete disk: badblocks -wsv -b 4096 -t random -o badblocks.txt /dev/disk (the -w write test overwrites all data on the disk), monitor temperature and compare to baseline. This took around 7 hours for my drives.
  5. another short test: smartctl -t short /dev/disk to see if errors came up
  6. finally, perform a long test: smartctl -t long /dev/disk and check attributes one last time. You can see the estimated runtime of the test with smartctl -c /dev/disk
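
smartctl -t only starts a self-test in the background and returns immediately, so the comparison to baseline is only meaningful once the test has finished. Below is a minimal sketch for waiting on a test. It assumes that the output of smartctl -c contains the phrase "Self-test routine in progress" while a test is running, which may differ between smartmontools versions and drive types. The function names selftest_running and wait_for_selftest are hypothetical helpers, not part of smartmontools.

```shell
# Succeeds (exit status 0) if the text on stdin reports a running self-test.
# Assumption: smartctl -c prints "Self-test routine in progress" during a test.
selftest_running() {
    grep -q 'Self-test routine in progress'
}

# Hypothetical helper: block until the self-test on device $1 has finished,
# polling once per minute.
wait_for_selftest() {
    while smartctl -c "$1" | selftest_running; do
        sleep 60
    done
}
```

For example, run smartctl -t short /dev/disk, then wait_for_selftest /dev/disk, and only then compare to baseline.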

If none of the critical attributes changed after all these tests, the drive can go into production. Otherwise, send it back. In my case, both drives passed the tests.

Compare to baseline

After reading the S.M.A.R.T. attributes of the fresh disk in step 1 above and saving them to a file called baseline, we can compare the attributes against this baseline after each test with the following commands:

smartctl -a /dev/disk > attributes
diff baseline attributes 

Check for any differences, especially in the following attributes:

If the value in the RAW_VALUE column is above 0 for any of them, return the disk.
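
Because smartctl -a also prints the local time and log sections that change on every run, a diff of the full output can be noisy. Below is a minimal sketch that reduces each file to attribute names and raw values before comparing. It assumes the usual ten-column attribute table, with a hexadecimal FLAG in column 3 and RAW_VALUE starting in column 10. The function name smart_attrs is hypothetical.

```shell
# Hypothetical helper: read `smartctl -a` output on stdin and print one
# "ATTRIBUTE_NAME RAW_VALUE" line per attribute table row.
smart_attrs() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && NF >= 10 {
        # RAW_VALUE may contain spaces, so join column 10 to the end of line.
        raw = $10
        for (i = 11; i <= NF; i++) raw = raw " " $i
        print $2, raw
    }'
}

# Example: compare only the raw values of both snapshots.
# smart_attrs < baseline   > baseline.raw
# smart_attrs < attributes > attributes.raw
# diff baseline.raw attributes.raw
```

Values such as power-on hours or temperature will still differ between runs, so this narrows the diff rather than replacing a careful look at it.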

Monitor temperature

Some of the tests, especially badblocks, stress the drive and raise its operating temperature. Make sure the drive has enough cooling and check its temperature regularly (e.g. every 4 hours) with the following command:

smartctl -l scttemp /dev/disk

The temperature should never rise above the maximum recommended temperature indicated in the output of that command (65 degrees Celsius for my drives). If it comes within 5°C of that maximum, abort the test immediately and provide the drive with better cooling. Then resume testing.
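
The 5°C rule above can be checked automatically. Below is a minimal sketch that assumes the output of smartctl -l scttemp contains lines of the form "Current Temperature: 30 Celsius" and "Min/Max recommended Temperature: 0/60 Celsius". The exact wording may vary between smartmontools versions, and not every drive supports SCT temperature reporting. The function name temp_margin_exceeded is hypothetical.

```shell
# Hypothetical helper: succeeds (exit status 0) when the current temperature
# is within the safety margin of the maximum recommended temperature.
#   $1: output of `smartctl -l scttemp /dev/disk`
#   $2: safety margin in degrees Celsius (default: 5)
temp_margin_exceeded() {
    margin=${2:-5}
    # Assumed line format: "Current Temperature:    30 Celsius"
    cur=$(printf '%s\n' "$1" |
        awk '/^Current Temperature:/ { print $3; exit }')
    # Assumed line format: "Min/Max recommended Temperature:    0/60 Celsius"
    max=$(printf '%s\n' "$1" |
        awk '/^Min\/Max recommended Temperature:/ { split($4, t, "/"); print t[2]; exit }')
    # Fail safely (status non-zero) if either value could not be parsed.
    [ -n "$cur" ] && [ -n "$max" ] && [ "$cur" -ge "$((max - margin))" ]
}

# Example: check every 10 minutes while badblocks runs in another terminal.
# while sleep 600; do
#     temp_margin_exceeded "$(smartctl -l scttemp /dev/disk)" &&
#         echo "Drive is too hot: abort the test and improve cooling"
# done
```

If the two lines cannot be parsed, the function reports "not exceeded", so it is worth running the smartctl command by hand once to confirm that the drive prints them.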