[Lustre-discuss] software RAID1 in RHEL5

Kevin Van Maren kevin.van.maren at oracle.com
Thu May 19 09:44:48 PDT 2011


Adesanya, Adeyemi wrote:
> I'm discussing the proposed architecture for two new Lustre 1.8.x filesystems. We plan to use a failover pair of MDS nodes (active-active), with each MDS serving an MDT.  The MDTs will be housed in external storage but we would like to implement redundancy across more than one storage array by using software RAID1.
>
> The Lustre documentation mentions using linux md to set up software RAID1 or RAID10 for MDTs. Does the RAID1 implementation in the Lustre 1.8.x RHEL5 kernel do an adequate job of ensuring consistency across mirrored devices (compared to a hardware RAID1 implementation)?
>

Adequate, probably.  As correct as hardware RAID, doubtful.  Without 
special hardware, or doing things that kill performance, there will 
always be some corner cases.

The issue is what happens to writes that are in flight when you have a 
crash/reboot/power loss: it is possible for them to make it to one disk, 
but not the other.  So it is possible to believe they are on disk, and 
proceed accordingly, when they are only on one copy, and are lost if 
that disk fails.  Even worse, Linux alternates reads between the 
mirrors, so in theory the data could be there one time and gone the next.
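
To make that failure mode concrete, here is a toy model in Python.  It is 
only a sketch: ToyMirror and its crash_after parameter are made up for 
illustration, strict round-robin reads are a simplification, and none of 
this is how md is actually implemented.

```python
class ToyMirror:
    """Toy two-disk mirror (hypothetical; illustrates the failure mode only)."""

    def __init__(self):
        self.disks = [{}, {}]   # block number -> data, one dict per disk
        self.next_read = 0      # which disk serves the next read

    def write(self, block, data, crash_after=None):
        """Write to both disks; crash_after=N simulates power loss
        after N disks have been updated.  Returns True only if the
        write reached both disks and could be acknowledged."""
        for i, disk in enumerate(self.disks):
            if crash_after is not None and i >= crash_after:
                return False    # crashed: the write is never acknowledged
            disk[block] = data
        return True

    def read(self, block):
        """Alternate reads between the two disks."""
        disk = self.disks[self.next_read]
        self.next_read = (self.next_read + 1) % 2
        return disk.get(block)


mirror = ToyMirror()
mirror.write(7, "old")                           # lands on both disks
acked = mirror.write(7, "new", crash_after=1)    # lands on disk 0 only
first, second = mirror.read(7), mirror.read(7)
print(acked, first, second)                      # False new old
```

The write was never acknowledged, yet the first read after the crash 
returns the new data and the second returns the old.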

The good news is that writes should(!) not be marked as "on disk" until 
both disks have reported them written.  So you could do an md "check", and 
if needed a "repair", before e.g. replaying the journal (mounting the 
file system, running fsck, etc).  Even if the MD resync takes the older 
copy and undoes a write, it should not have been a write that was 
expected to have made it to stable storage, so the normal Lustre 
recovery mechanisms should be able to replay it.  Assuming, that is, 
that this is done _before_ you mount the device.
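
A rough sketch of that check-then-repair sequence, using the md sysfs 
files sync_action and mismatch_cnt.  The array name md0, the helper names 
and the polling interval are assumptions for illustration, and I have not 
verified this against the RHEL5 Lustre kernel.  It needs root, and should 
run with the array assembled but the MDT not yet mounted.

```python
import time
from pathlib import Path


def md_dir(array="md0", sysfs_root="/sys/block"):
    """Path to the md control files; "md0" is an assumed array name."""
    return Path(sysfs_root) / array / "md"


def sync_action(md):
    """Current action of the array, e.g. "idle", "check" or "repair"."""
    return (md / "sync_action").read_text().strip()


def mismatch_cnt(md):
    """Mismatch count reported by the most recent check."""
    return int((md / "mismatch_cnt").read_text().strip())


def request(md, action):
    """Ask md to start a "check" or a "repair"."""
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported action: {action}")
    (md / "sync_action").write_text(action + "\n")


def wait_idle(md, poll=5.0, timeout=None):
    """Poll until the array reports "idle" again."""
    start = time.monotonic()
    while sync_action(md) != "idle":
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError("md array did not become idle")
        time.sleep(poll)


def check_then_repair(md, poll=5.0):
    """Run a check; if mismatches were found, run a repair.
    Returns the mismatch count seen by the check."""
    request(md, "check")
    wait_idle(md, poll)
    found = mismatch_cnt(md)
    if found:
        request(md, "repair")
        wait_idle(md, poll)
    return found


if __name__ == "__main__":
    print("mismatches found:", check_then_repair(md_dir("md0")))
```

Note that a nonzero mismatch_cnt on RAID1 is not always a sign of 
trouble, so treat the count as a hint rather than proof of a torn write.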

Kevin



