[Lustre-discuss] MDS disk recommendations

Brock Palen brockp at umich.edu
Mon Jun 16 11:50:41 PDT 2008


On Jun 16, 2008, at 2:31 PM, Andreas Dilger wrote:
> On Jun 16, 2008  14:00 -0400, Brock Palen wrote:
>> This paper does not talk about the MDS though.  We have a Sun 2540
>> with 12 15K 300 GB drives.
>>
>> We plan to use 4 drives in a RAID1+0 with the rest being spares.  What
>> I am curious about are the following options:
>>
>> Stripe size
>> Readahead on the MDS RAID
>
> There is a discussion about MDS + RAID in the Lustre Manual, section 10.

I read too fast and make mistakes, thank you!
>
> When formatting a filesystem on a RAID device, it is beneficial to specify
> additional parameters at the time of formatting. This ensures that the
> filesystem is optimized for the underlying disk geometry. Use the
> --mkfsoptions parameter to specify these options in the Lustre configuration.
>
> For RAID5, RAID6, RAID1+0 storage, specifying the -E stride={stride_size}
> option improves the layout of the filesystem metadata ensuring that no single
> disk contains all of the allocation bitmaps. The stride_size parameter is in
> units of 4096-byte blocks and represents the amount of contiguous data written
> to a single disk before moving to the next disk. This is applicable to both
> MDS and OST filesystems.

Good to point that out.
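
To make the units concrete, here is a minimal sketch of the stride arithmetic described in the quoted text. The 64 KiB chunk size is only an assumed example value, not something from our array, and the helper names are made up for illustration.

```python
BLOCK_SIZE = 4096  # bytes; the manual text gives stride_size in 4096-byte blocks


def stride_blocks(chunk_kib: int) -> int:
    """Convert a RAID chunk size in KiB to a stride in 4096-byte blocks."""
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes <= 0 or chunk_bytes % BLOCK_SIZE != 0:
        raise ValueError("chunk size must be a positive multiple of 4096 bytes")
    return chunk_bytes // BLOCK_SIZE


def mkfs_options(chunk_kib: int) -> str:
    """Build the --mkfsoptions string mentioned in the manual excerpt."""
    return f'--mkfsoptions="-E stride={stride_blocks(chunk_kib)}"'


# Assumed example: the RAID writes 64 KiB to one disk before moving on.
print(mkfs_options(64))
```

With a 64 KiB chunk this works out to a stride of 16 blocks; the real value should come from whatever the controller is actually configured with.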

>
> Note - It is better to have the MDS on RAID1+0 than on RAID5 or RAID6.
>
> RAID1 with an internal journal and two disks from different controllers.
> If you need a larger MDT, create multiple RAID1 devices from pairs of
> disks, and then make a RAID0 array of the RAID1 devices.  This ensures
> maximum reliability because multiple disk failures only have a small
> chance of hitting both disks in the same RAID1 device.

We are going to have several unused disks in the MGS/MDS array.
Would it be helpful for the MDS (less so the MGS, I would think) to use
an external journal on a pair of disks in RAID1?  I am not sure it
would help that much, but I could be wrong.

>
> Doing the opposite (RAID1 of a pair of RAID0 devices) has a 50% chance
> that even two disk failures can cause the loss of the whole MDT device.
> The first failure will disable an entire half of the mirror and the
> second failure has a 50% chance of disabling the remaining mirror.
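
To convince myself of the difference between the two layouts, here is a small enumeration sketch. It assumes exactly two disks fail and that every pair of disks is equally likely to be the failed pair; that assumption and the disk numbering are mine, not from the manual.

```python
from fractions import Fraction
from itertools import combinations


def is_lost(failed, total_disks, layout):
    """Return True if this set of failed disks destroys the whole array."""
    if layout == "raid10":
        # Disks (0,1), (2,3), ... are RAID1 pairs that are striped together.
        # Data is lost only if both members of one pair have failed.
        return any({d, d + 1} <= failed for d in range(0, total_disks, 2))
    if layout == "raid01":
        # Disks 0..n-1 form one RAID0 half, n..2n-1 the other; the halves are mirrored.
        # Data is lost as soon as each half contains a failed disk.
        half = total_disks // 2
        return any(d < half for d in failed) and any(d >= half for d in failed)
    raise ValueError(f"unknown layout: {layout}")


def two_failure_loss(total_disks, layout):
    """Exact fraction of two-disk failures that lose the array."""
    if total_disks < 4 or total_disks % 2:
        raise ValueError("need an even number of disks, at least 4")
    pairs = list(combinations(range(total_disks), 2))
    lost = sum(is_lost(set(p), total_disks, layout) for p in pairs)
    return Fraction(lost, len(pairs))


for disks in (4, 12):
    print(disks, two_failure_loss(disks, "raid10"), two_failure_loss(disks, "raid01"))
```

Under those assumptions, four disks give 1/3 for RAID0 of RAID1 pairs versus 2/3 for RAID1 of RAID0 halves, and the second figure drifts down toward the quoted 50% as the array grows (6/11 with twelve disks), while the first keeps shrinking.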
>
>> I did not find anything in the manual about this, other than to disable
>> readahead on DDN hardware, but that sounded like it was for OSTs, not the MDS.
>
> Readahead will not have much benefit for the MDT, because most of the
> IO is random.  The chunksize for RAID1 is mostly meaningless.

OK, I didn't know that.  I will look into why that is.

>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
>



