RAID Information - Data Recovery



This page provides information on the error recovery process for various RAID levels.

Data Recovery

Data recovery, whether from a read error or from a failed drive, is similar in all RAID levels at or above 2. In general, the following procedure is used:

All remaining blocks in the RAID stripe, data and parity, are read (in the case of RAID-3, the entire stripe will already have been read along with the original data, so this step is unnecessary).
The blocks are XORed together (except in RAID-2, where a more complex algorithm is used instead). For odd parity, the buffer should be initialized to ones, while for even parity, the buffer should be initialized to zeros. Odd parity is the complement of the XOR of the data, and starting from ones cancels that complement.
The result of the XOR (or complex algorithm) is the data that was requested.
If the data recovery is the result of a read error, the recovered data is rewritten to the disk. This gives the error recovery algorithms in the disk the correct data, so that they can relocate the affected sectors without data loss, if necessary.
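
The XOR step in the procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: blocks are modeled as equal-length byte strings, and the function name xor_blocks and its odd_parity parameter are hypothetical, not taken from any RAID implementation.

```python
def xor_blocks(blocks, odd_parity=False):
    """XOR equal-length blocks together.

    The buffer starts as all ones for odd parity and all zeros for
    even parity, as described in the procedure above.
    """
    size = len(blocks[0])
    buffer = bytearray(b"\xff" * size if odd_parity else bytes(size))
    for block in blocks:
        for i, byte in enumerate(block):
            buffer[i] ^= byte
    return bytes(buffer)


# Three data blocks and their parity block (hypothetical sample data).
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
even = xor_blocks(data)                   # even parity block
odd = xor_blocks(data, odd_parity=True)   # odd parity block

# Recover the second block from the surviving blocks plus parity.
recovered_even = xor_blocks([data[0], data[2], even])
recovered_odd = xor_blocks([data[0], data[2], odd], odd_parity=True)
```

The same function both generates the parity and recovers a missing block, because XORing every surviving block with the parity cancels everything except the missing data.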

For RAID-1, the data is read from one of the other drives in the mirror set. If the data recovery is the result of a read error, the recovered data is rewritten to the disk, as above.
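
A minimal sketch of the RAID-1 case follows, assuming a toy in-memory model of a mirror set. The names MirrorDisk, ReadError, and mirrored_read are hypothetical, and a successful rewrite is simply assumed to clear the read error, standing in for the sector relocation a real drive might perform.

```python
class ReadError(Exception):
    """Raised by the toy disk model when a block cannot be read."""


class MirrorDisk:
    def __init__(self, blocks, bad=()):
        self.blocks = list(blocks)
        self.bad = set(bad)          # block addresses that fail on read

    def read(self, lba):
        if lba in self.bad:
            raise ReadError(lba)
        return self.blocks[lba]

    def write(self, lba, data):
        self.blocks[lba] = data
        self.bad.discard(lba)        # assumption: the rewrite repairs the block


def mirrored_read(disks, lba):
    """Read a block from the first mirror that returns it, then rewrite
    the block on every mirror that reported a read error."""
    failed = []
    for disk in disks:
        try:
            data = disk.read(lba)
        except ReadError:
            failed.append(disk)
            continue
        for bad_disk in failed:
            bad_disk.write(lba, data)
        return data
    raise ReadError(lba)             # no mirror could supply the data


primary = MirrorDisk([b"x", b"y"], bad={1})
mirror = MirrorDisk([b"x", b"y"])
result = mirrored_read([primary, mirror], 1)
```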

For RAID-0, a failed drive, or a read error, is fatal and causes a loss of data. For this reason, RAID-0 should only be used for:

Combining other basic RAID levels into a single, larger and faster, volume. RAID-1+0 and RAID-5+0 are examples of this.
Swap or Page files.
High-speed access to temporary data files.
High-speed access to data files that are backed up elsewhere.
High-speed access to data files that can be easily recreated.

Any other use of RAID-0 should be carefully considered prior to implementation.

Data Reconstruction

When a failed disk is replaced, the data recovery procedure appropriate for the RAID level is used to reconstruct every block on the new disk. The reconstructed blocks are written out to the replacement disk. While this process is being performed, all writes to the array should behave normally. A read from the array that needs data from the replacement disk should perform a reconstruct when the address is ahead of the current reconstruction pointer, because that part of the disk has not yet been rebuilt. If the read is from behind the reconstruction pointer, a normal read will work properly.
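
The interaction between reads and the reconstruction pointer can be sketched as follows. This is an illustrative model only, assuming even (XOR) parity, one block per disk per stripe, and disks represented as Python lists. The class name RebuildingArray and its methods are hypothetical.

```python
def xor_blocks(blocks):
    """XOR equal-length blocks together (even parity)."""
    buffer = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            buffer[i] ^= byte
    return bytes(buffer)


class RebuildingArray:
    def __init__(self, disks, replaced):
        self.disks = disks        # one list of blocks per disk, parity included
        self.replaced = replaced  # index of the replacement disk
        self.pointer = 0          # stripes below this index are already rebuilt

    def _reconstruct(self, stripe):
        peers = [disk[stripe] for i, disk in enumerate(self.disks)
                 if i != self.replaced]
        return xor_blocks(peers)

    def rebuild_step(self):
        """Reconstruct one stripe onto the replacement disk and advance."""
        if self.pointer < len(self.disks[self.replaced]):
            block = self._reconstruct(self.pointer)
            self.disks[self.replaced][self.pointer] = block
            self.pointer += 1
        return self.pointer

    def read(self, disk, stripe):
        # At or ahead of the pointer, the replacement disk holds no valid
        # data yet, so the block is reconstructed from the other disks.
        if disk == self.replaced and stripe >= self.pointer:
            return self._reconstruct(stripe)
        return self.disks[disk][stripe]


# Two data disks plus parity; disk 1 has just been replaced (blank).
array = RebuildingArray(
    [[b"\x01", b"\x02"], [None, None], [b"\x11", b"\x22"]], replaced=1)
ahead = array.read(1, 1)      # reconstructed on the fly
array.rebuild_step()
behind = array.read(1, 0)     # normal read from the rebuilt region
array.rebuild_step()
```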

It can be seen that an unrecovered read error on two disks at the same point (very unlikely), or any unrecovered read error while a disk is failed, will result in the loss of data. For this reason, it is strongly suggested that the disks that make up the array be read, in their entirety, on a regular basis. This will help to locate and correct media flaws before they become unrecoverable. In addition, it is suggested that the parity of an array be verified on a regular basis, if that is possible.
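
A parity verification pass of the kind suggested above might look like the following sketch. It assumes even (XOR) parity and the same list-of-blocks disk model used for illustration only. The function name verify_parity is hypothetical.

```python
def verify_parity(disks):
    """Return the indexes of stripes whose blocks, parity included,
    do not XOR to zero (even parity assumed)."""
    mismatched = []
    for stripe, row in enumerate(zip(*disks)):
        buffer = bytearray(len(row[0]))
        for block in row:
            for i, byte in enumerate(block):
                buffer[i] ^= byte
        if any(buffer):
            mismatched.append(stripe)
    return mismatched


# Stripe 0 is consistent; stripe 1 has a corrupted parity block.
disks = [[b"\x01", b"\x02"], [b"\x10", b"\x20"], [b"\x11", b"\x23"]]
bad_stripes = verify_parity(disks)
```

A mismatch only shows that the stripe is inconsistent. Parity alone cannot identify which block in the stripe is wrong.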


© 2004 - Ashford Computer Consulting Service