RAID 5: A powerful technology to ensure the integrity of your data

Written By: Ontrack

Date Published: 06 November 2023 01:33:52 EST


Healthy RAID 5 Array

A RAID array is a collection of disks configured, in software or in hardware, to protect data, enhance performance, or both. RAID stands for redundant array of independent disks. The different RAID levels vary in read and write speed and in redundancy, or fault tolerance.

RAID 5 was described in the late 1980s. It is one of the most common configurations and provides a good compromise between fault tolerance and performance. A RAID 5 array requires at least three disks and offers increased read speeds, but little or no improvement in write performance, because every write must also update parity. This RAID level can tolerate the failure of one disk.

What does a RAID 5 configuration look like?

A RAID 5 array contains at least three drives and uses parity to provide redundancy, protecting data without a large sacrifice in performance.

Like a RAID 0 array, which stripes data across multiple drives to improve performance, RAID 5 stripes data across its drives, but it adds a parity block to every stripe for protection. In most RAID 5 configurations the parity block is the XOR of the data blocks in the same stripe, and the parity position rotates from drive to drive so that no single drive holds all of the parity. Because only one drive's worth of space is allocated to parity, RAID 5 is cheaper to implement than RAID 10, and it allows more flexibility and larger volume sizes than RAID 1.
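The capacity comparison with RAID 10 and RAID 1 can be made concrete with a small calculation. The helper below (`usable_capacity`, a hypothetical name) is a sketch that assumes equal-size drives and ignores controller metadata and file system overhead.

```python
def usable_capacity(level: str, drives: int, drive_size_tb: float) -> float:
    """Usable capacity in TB for an array of equal-size drives (simplified)."""
    if level == "RAID 5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * drive_size_tb  # one drive's worth of space holds parity
    if level == "RAID 10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least four")
        return drives / 2 * drive_size_tb  # every block is stored twice
    if level == "RAID 1":
        if drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_size_tb  # every drive holds the same copy
    raise ValueError(f"unsupported level: {level}")


for level in ("RAID 5", "RAID 10"):
    print(level, usable_capacity(level, 4, 4.0), "TB usable from four 4 TB drives")
```

With four drives, RAID 5 gives up one drive to parity while RAID 10 gives up two to mirroring, and the gap widens as drives are added.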

Healthy RAID 5 Array

In the example above, the parity on Drive 4 for the first stripe is the XOR of the data blocks Data 1, Data 2 and Data 3. The parity on Drive 3 for the second stripe is the XOR of the data blocks Data 4, Data 5 and Data 6.
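The layout in the figure can be sketched in a few lines. This is an illustration under assumptions: parity starts on the last drive and moves one drive to the left each stripe, and data blocks fill the remaining drives in order. That matches the two stripes described above and resembles what the Linux md driver calls a left-asymmetric layout; real controllers may rotate parity differently. `parity_drive` and `build_array` are hypothetical helper names.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda acc, b: acc ^ b, col) for col in zip(*blocks))


def parity_drive(stripe: int, n_drives: int) -> int:
    """0-based drive holding parity: the last drive first, then moving left."""
    return (n_drives - 1 - stripe) % n_drives


def build_array(data_blocks: list[bytes], n_drives: int) -> list[list[bytes]]:
    """Lay data out stripe by stripe; returns drives[drive][stripe]."""
    per_stripe = n_drives - 1
    if len(data_blocks) % per_stripe:
        raise ValueError("pad the data to a whole number of stripes")
    drives: list[list[bytes]] = [[] for _ in range(n_drives)]
    for s in range(len(data_blocks) // per_stripe):
        chunk = data_blocks[s * per_stripe:(s + 1) * per_stripe]
        p = parity_drive(s, n_drives)
        data_iter = iter(chunk)
        for d in range(n_drives):
            # The parity drive gets the XOR of the stripe; the others get data in order.
            drives[d].append(xor_blocks(*chunk) if d == p else next(data_iter))
    return drives


blocks = [bytes([i]) * 4 for i in range(1, 10)]  # stand-ins for Data 1..Data 9
array = build_array(blocks, 4)
for s in range(3):
    print(f"stripe {s + 1}: parity on Drive {parity_drive(s, 4) + 1}")
print(array[3][0] == xor_blocks(blocks[0], blocks[1], blocks[2]))  # True
```

Under this assumed rotation the third stripe's parity lands on Drive 2, which is consistent with the degraded-mode example that follows.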

What does parity do in a RAID 5 array?

Because every stripe carries a parity block, the system can rebuild data if one of the drives fails or goes offline. The RAID controller or RAID software can reconstruct any single missing block in a stripe by combining the parity with the remaining data blocks.
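The reason parity can fill the gap is that XOR is its own inverse: if Parity = Data 1 XOR Data 2 XOR Data 3, then Data 1 XOR Data 3 XOR Parity = Data 2. A minimal sketch with made-up block contents:

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda acc, b: acc ^ b, col) for col in zip(*blocks))


# One stripe: three data blocks and their parity (hypothetical contents).
data1, data2, data3 = b"ACCT", b"SALE", b"2023"
parity = xor_blocks(data1, data2, data3)

# The drive holding Data 2 goes offline: XOR everything that survives.
rebuilt = xor_blocks(data1, data3, parity)
print(rebuilt)  # b'SALE'
```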

RAID 5 Array with One Failed Drive

In the example above, one drive has failed. Upon losing a drive, the array goes into degraded mode. In degraded mode, the RAID controller combines the surviving data blocks with parity as needed to present good data to the operating system. In our example, the controller combines Data 1, Data 3 and Parity for the first stripe to recreate the missing Data 2. In the second stripe, Data 4, Data 6 and Parity are used to recreate Data 5. In the third stripe, no parity is needed, because the failed drive held only that stripe's parity and all of the data blocks are still present.
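A degraded-mode read can be sketched as follows, again assuming the parity rotation described earlier (parity on the last drive, moving left each stripe) and a failure of Drive 2. The function name `read_stripe_degraded` is hypothetical; it returns the data blocks in logical order and reports whether parity had to be used.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda acc, b: acc ^ b, col) for col in zip(*blocks))


def parity_drive(stripe: int, n_drives: int) -> int:
    """0-based index of the drive holding parity (assumed rotation)."""
    return (n_drives - 1 - stripe) % n_drives


def read_stripe_degraded(drives, stripe, failed):
    """Return (data blocks in logical order, whether parity was needed)."""
    n = len(drives)
    p = parity_drive(stripe, n)
    # Every block on the surviving drives, data and parity alike.
    survivors = [drives[d][stripe] for d in range(n) if d != failed]
    blocks = []
    for d in range(n):
        if d == p:
            continue  # parity itself is never handed to the operating system
        if d == failed:
            blocks.append(xor_blocks(*survivors))  # recreate the lost data block
        else:
            blocks.append(drives[d][stripe])
    return blocks, failed != p


# Build a healthy 4-drive array holding stand-ins for Data 1..Data 9.
n = 4
data = [bytes([i]) * 4 for i in range(1, 10)]
drives = [[None] * 3 for _ in range(n)]
for s in range(3):
    chunk = data[3 * s:3 * s + 3]
    p = parity_drive(s, n)
    for d, block in zip([d for d in range(n) if d != p], chunk):
        drives[d][s] = block
    drives[p][s] = xor_blocks(*chunk)

failed = 1  # Drive 2 in the article's numbering
for s in range(3):
    blocks, used_parity = read_stripe_degraded(drives, s, failed)
    assert blocks == data[3 * s:3 * s + 3]  # the OS still sees the original data
    print(f"stripe {s + 1}: parity needed = {used_parity}")
```

The first two stripes need parity and the third does not, matching the walk-through above.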

How does a Hot Spare work in a RAID 5 array?

Healthy RAID 5 Array with Hot Spare

A hot spare is an additional drive that can be added to a RAID 5 array to allow for fast recovery from a failed drive. In the above example, we see a healthy RAID 5 array with the hot spare added. Note that the hot spare does not contain any data until a failure occurs and the drive is needed.

If a hot spare is available, the controller will automatically begin rebuilding the failed drive's data onto the hot spare as soon as a failure occurs.

RAID 5 Array with One Failed Drive + Hot Spare

In the example above, Drive 2 failed. The system used the hot spare and rebuilt all of the missing data from Drive 2 onto it.
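A rebuild onto a hot spare repeats the parity calculation for every stripe. The sketch below uses a hypothetical function, `rebuild_to_spare`, and keeps parity on the last drive for brevity; because the missing block of any stripe is the XOR of all the other blocks, the same loop recreates lost data blocks and lost parity blocks alike, wherever parity sits.

```python
import os
from functools import reduce
from typing import Optional


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda acc, b: acc ^ b, col) for col in zip(*blocks))


def rebuild_to_spare(drives: list[Optional[list[bytes]]], failed: int) -> list[bytes]:
    """Recreate every block of the failed drive, one stripe at a time."""
    survivors = [drv for i, drv in enumerate(drives) if i != failed]
    n_stripes = len(survivors[0])
    # In any stripe, the missing block equals the XOR of all the other blocks,
    # whether the lost block held data or parity.
    return [xor_blocks(*(drv[s] for drv in survivors)) for s in range(n_stripes)]


# A healthy 4-drive array with 3 stripes of random 8-byte blocks.
data_drives = [[os.urandom(8) for _ in range(3)] for _ in range(3)]
parity_column = [xor_blocks(*(drv[s] for drv in data_drives)) for s in range(3)]
original = data_drives + [parity_column]

failed = 1  # Drive 2 in the article's example
degraded = [None if i == failed else drv for i, drv in enumerate(original)]
spare = rebuild_to_spare(degraded, failed)
print(spare == original[failed])  # True
```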

When a drive fails, time is of the essence in rebuilding. Running in degraded mode puts additional stress on the remaining drives, and because RAID 5 tolerates only one failure, the array cannot survive a second drive failing before the rebuild completes. Having one or more hot spares available allows for quicker recovery times.
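Why a shorter rebuild matters can be illustrated with a simple probability model. The sketch below assumes each surviving drive fails independently at a constant rate of 1 / MTTF, so the chance of at least one failure during a rebuild of t hours with k survivors is 1 - exp(-k t / MTTF). The drive count and MTTF are hypothetical, and the model is optimistic: it leaves out the extra stress of degraded operation, correlated failures and read errors.

```python
import math


def second_failure_probability(surviving_drives: int, rebuild_hours: float,
                               mttf_hours: float) -> float:
    """Chance that at least one surviving drive fails during the rebuild window.

    Assumes independent drives with a constant failure rate of 1 / mttf_hours.
    """
    rate = surviving_drives / mttf_hours  # combined failure rate of the survivors
    return 1 - math.exp(-rate * rebuild_hours)


# Hypothetical figures: an 8-drive array (7 survivors), 1,000,000-hour MTTF.
for hours in (6, 24, 72):
    p = second_failure_probability(7, hours, 1_000_000)
    print(f"rebuild of {hours:>2} h: {p:.4%}")
```

Whatever the absolute numbers, the risk grows with the length of the rebuild window, which is what a hot spare shortens.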

Is data recovery from RAID 5 possible?

Data recovery from a failed RAID 5 array is possible. Although recovery from a RAID 5 array can be complex and challenging, it is usually successful. Data loss can have several causes, and each requires a different recovery effort. A few examples follow:

Data Recovery with one failed drive

RAID 5 Array with One Failed Drive

If one drive fails in an array, parity can be used to rebuild the missing data. In this scenario, Ontrack is usually able to recover 100% of the data. When a non-functional array is received, its drives are imaged in the clean room, and the array is then virtually rebuilt from those images. Once the RAID is assembled, the file system or volume is scanned for corruption, virtually repaired, and the data extracted. The failed drive is often not needed, because any missing blocks can be rebuilt from parity.
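Virtually rebuilding an array from drive images amounts to reading the images stripe by stripe and emitting the data blocks in logical order. The sketch below is not Ontrack's tooling: `assemble_volume` and `make_images` are hypothetical names, and it assumes the block size, drive order and parity rotation (parity on the last drive, moving left) are already known. In a real recovery those parameters usually have to be worked out first.

```python
from functools import reduce
from typing import Optional


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda acc, b: acc ^ b, col) for col in zip(*blocks))


def parity_drive(stripe: int, n_drives: int) -> int:
    """0-based drive holding parity (assumed rotation)."""
    return (n_drives - 1 - stripe) % n_drives


def assemble_volume(images: list[Optional[bytes]], block_size: int) -> bytes:
    """Virtually rebuild the logical volume from per-drive images.

    At most one image may be None (a drive that could not be imaged).
    """
    n = len(images)
    missing = [i for i, img in enumerate(images) if img is None]
    if len(missing) > 1:
        raise ValueError("single parity can cover only one missing drive")
    image_len = len(next(img for img in images if img is not None))
    volume = bytearray()
    for s in range(image_len // block_size):
        off = s * block_size
        blocks = [None if img is None else img[off:off + block_size] for img in images]
        if missing:
            # Recreate the absent block from the rest of the stripe.
            blocks[missing[0]] = xor_blocks(*(b for b in blocks if b is not None))
        p = parity_drive(s, n)
        for d in range(n):
            if d != p:  # skip parity; append data blocks in logical order
                volume += blocks[d]
    return bytes(volume)


def make_images(volume: bytes, n: int, block_size: int) -> list[bytes]:
    """Test helper: stripe a volume onto n drive images (the inverse operation)."""
    per_stripe = (n - 1) * block_size
    images = [bytearray() for _ in range(n)]
    for s in range(len(volume) // per_stripe):
        chunk = volume[s * per_stripe:(s + 1) * per_stripe]
        data = [chunk[i:i + block_size] for i in range(0, per_stripe, block_size)]
        it = iter(data)
        for d in range(n):
            images[d] += xor_blocks(*data) if d == parity_drive(s, n) else next(it)
    return [bytes(img) for img in images]


volume = bytes(range(256)) * 3  # 768 bytes = 4 stripes of three 64-byte blocks
images = make_images(volume, 4, 64)
images[1] = None  # the failed drive was never imaged
print(assemble_volume(images, 64) == volume)  # True
```

The reassembled volume is byte-for-byte identical even though one image is absent, which is why the failed drive itself is often not needed.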

Data Recovery from multiple failed drives

RAID 5 Array with Multiple Failed Drives

Recovery from multiple failed drives follows a process similar to that for a single drive failure. When a non-functional array is received, its drives are imaged in the clean room. It is important to image as much of each failed drive as possible, because this allows more data to be recovered.

The array is then virtually rebuilt from those images. In the example, Data 2, Data 3 and Parity from the first stripe are used to rebuild Data 1. Parity is not needed in the second stripe, because all of its data blocks are present. In the third stripe, Data 7, Parity and Data 8 are combined to replace Data 9.
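With partially readable drives, recoverability is decided stripe by stripe: single parity can fill at most one gap per stripe. The sketch below, using a hypothetical `classify_stripes` function and the parity rotation assumed earlier, shows why imaging as much of each failed drive as possible matters. Every region read successfully turns a stripe with two gaps into one that parity can repair. The gap positions are made up.

```python
def parity_drive(stripe: int, n_drives: int) -> int:
    """0-based drive holding parity (assumed rotation)."""
    return (n_drives - 1 - stripe) % n_drives


def classify_stripes(n_drives: int, n_stripes: int,
                     unreadable: dict[int, set[int]]) -> dict[int, str]:
    """unreadable maps a 0-based drive index to the stripes missing from its image."""
    report = {}
    for s in range(n_stripes):
        gaps = sorted(d for d, bad in unreadable.items() if s in bad)
        if not gaps:
            report[s] = "intact"
        elif len(gaps) == 1 and gaps[0] == parity_drive(s, n_drives):
            report[s] = "data intact, only parity missing"
        elif len(gaps) == 1:
            report[s] = f"rebuild Drive {gaps[0] + 1} block from parity"
        else:
            report[s] = "two or more blocks missing: not recoverable from parity"
    return report


# Hypothetical gaps: Drive 1's image lacks stripes 1 and 4, Drive 4's lacks 3 and 4.
gaps = {0: {0, 3}, 3: {2, 3}}
for s, status in classify_stripes(4, 4, gaps).items():
    print(f"stripe {s + 1}: {status}")
```

The first three stripes mirror the example above (Data 1 rebuilt, second stripe intact, Data 9 rebuilt); the fourth shows what happens when both failed drives are unreadable in the same stripe.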

Once the RAID array is virtually reassembled, the file system or volume is scanned for corruption. In addition to file system corruption, engineers also look for data that is inconsistent or out of date. This occurs when there is a gap in time between drive failures: after the first drive fails, the array keeps running in degraded mode, so the contents of that drive fall out of date before the second failure happens. Data recovery engineers need experience in recognising this type of damage so they can virtually repair the volume and extract good file data.
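One simple signal of stale data is a parity mismatch: in a consistent stripe, all blocks including parity XOR to zero. The sketch below is an illustration, not a description of Ontrack's method. It simulates a drive that drops out while the block it holds is later overwritten (the new value can only be recorded in parity), then flags the stripe where the stale block and the parity disagree. `inconsistent_stripes` is a hypothetical name, and parity sits on the last drive for brevity; a mismatch shows that a stripe is inconsistent, not which block is stale, so knowing which drive failed first still matters.

```python
import os
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda acc, b: acc ^ b, col) for col in zip(*blocks))


def inconsistent_stripes(drives: list[list[bytes]]) -> list[int]:
    """0-based stripes whose blocks do not XOR to zero (data and parity disagree)."""
    return [s for s in range(len(drives[0]))
            if any(xor_blocks(*(drv[s] for drv in drives)))]


# Healthy array: 3 data drives plus a parity drive, 3 stripes of 8-byte blocks.
data_drives = [[os.urandom(8) for _ in range(3)] for _ in range(3)]
drives = data_drives + [[xor_blocks(*(drv[s] for drv in data_drives)) for s in range(3)]]

# Drive 1 drops out. Later its block in the second stripe is overwritten;
# with Drive 1 offline, the new value can only be recorded in that stripe's parity.
s = 1
new_value = bytes(b ^ 0xFF for b in drives[0][s])  # guaranteed to differ
drives[3][s] = xor_blocks(new_value, drives[1][s], drives[2][s])

# Drive 1 still holds its old, stale block, so reassembling with it flags stripe 2.
print(inconsistent_stripes(drives))  # [1]
```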


KLDiscovery Ontrack Limited, Nexus, 25 Farringdon Street, London, EC4A 4AB, United Kingdom