Testing New Drives
To set up my network attached storage (NAS), I recently ordered two 4 TB WD Red
drives. After some research (here and here) I came up with the protocol below. Note: run all commands as root and replace
/dev/disk with the appropriate device name for your setup:
- read out the S.M.A.R.T. attributes and save them as a baseline:
smartctl -a /dev/disk > baseline
- perform a conveyance test to check for damage during transport, then compare to baseline:
smartctl -t conveyance /dev/disk
- perform a short test and compare to baseline:
smartctl -t short /dev/disk
- run badblocks on the complete disk (the -w write test erases all data on it), monitor the temperature while it runs, and compare to baseline afterwards. This took around 7 hours for my drives:
badblocks -wsv -b 4096 -t random -o badblocks.txt /dev/disk
- run another short test to see if errors came up:
smartctl -t short /dev/disk
- finally, perform a long test and check the attributes one last time:
smartctl -t long /dev/disk
You can see the estimated runtime of the test with
smartctl -c /dev/disk
After all these tests, if none of the critical attributes were affected, we can put the drive into production. Otherwise, send it back. In my case, both drives passed the tests.
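As a rough sketch, the protocol can be collected into a small helper that only prints the commands for a given device, so the sequence can be reviewed before anything is run. The function name protocol_commands and the device /dev/sdX are placeholders of mine, not part of the original setup.

```sh
#!/bin/sh
# Print the test protocol for one device. Nothing here touches the disk;
# the commands are only written to standard output for review.
# protocol_commands is a hypothetical helper name.
protocol_commands() {
    disk=$1
    cat <<EOF
smartctl -a $disk > baseline
smartctl -t conveyance $disk
smartctl -t short $disk
badblocks -wsv -b 4096 -t random -o badblocks.txt $disk
smartctl -t short $disk
smartctl -c $disk
smartctl -t long $disk
EOF
}
```

Calling protocol_commands /dev/sdX prints the seven commands in order. They are best run one at a time by hand: smartctl -t only starts a self-test and returns immediately, so each test has to finish before the next step begins.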
In step 1 above, we read the S.M.A.R.T. attributes of the fresh disk and saved them to a file called baseline. After each test, we can compare the current attributes to this baseline with the following commands:
smartctl -a /dev/disk > attributes
diff baseline attributes
Check for any differences, especially in the following attributes:
If any of their values in the RAW_VALUE column rise above 0, return the disk.
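The RAW_VALUE check can also be done mechanically. The sketch below assumes the usual ATA attribute table layout in the smartctl -a output, where the attribute name is the second column and RAW_VALUE is the tenth. The helper name nonzero_raw is hypothetical, and the attribute names are passed in by the caller rather than hard-coded.

```sh
#!/bin/sh
# Print every watched attribute whose RAW_VALUE is above 0 in a saved
# smartctl report. Usage: nonzero_raw REPORT ATTRIBUTE_NAME...
# Assumes column 2 is the attribute name and column 10 is RAW_VALUE.
nonzero_raw() {
    report=$1
    shift
    for name in "$@"; do
        # $10 + 0 forces a numeric comparison of the raw value
        awk -v name="$name" '$2 == name && $10 + 0 > 0 { print $2, $10 }' "$report"
    done
}
```

For example, nonzero_raw attributes Reallocated_Sector_Ct Current_Pending_Sector Offline_Uncorrectable prints nothing for a healthy drive. Those three names are commonly watched attributes that I picked for illustration; substitute the ones you care about.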
Some of the tests, especially badblocks, stress the drive and increase its running temperature. It is important to have enough cooling and to monitor the disk temperature regularly (e.g. every 4 hours) using the following command:
smartctl -l scttemp /dev/disk
The temperature should never rise above the maximum recommended temperature indicated in the output of that command (65 degrees Celsius for my drives). If it comes within 5°C of that maximum, immediately abort the test and provide the drive with better cooling. Then resume testing.
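The 5°C rule can be checked automatically with a short filter. This is a minimal sketch that assumes the scttemp output contains a "Current Temperature:" line and a "Min/Max recommended Temperature:" line, as it does on my understanding of typical drives. The helper name temp_ok is hypothetical.

```sh
#!/bin/sh
# Read `smartctl -l scttemp` output on standard input.
# Exit status: 0 = safe, 1 = within the margin of the recommended maximum,
# 2 = the expected lines were not found.
# Usage: smartctl -l scttemp /dev/disk | temp_ok [MARGIN]   (MARGIN defaults to 5)
temp_ok() {
    awk -v margin="${1:-5}" '
        /^Current Temperature:/ { cur = $3 }
        /^Min\/Max recommended Temperature:/ { split($4, r, "/"); max = r[2] }
        END {
            if (cur == "" || max == "") exit 2
            printf "current=%d max=%d\n", cur, max
            if (cur + 0 >= max - margin) exit 1
            exit 0
        }'
}
```

A check such as smartctl -l scttemp /dev/disk | temp_ok || echo "check cooling" could then be repeated from a loop or a cron job while badblocks is running.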