[Lustre-discuss] MDT raid parameters, multiple MGSes

Thomas Roth t.roth at gsi.de
Sat Jan 22 02:23:56 PST 2011

O.k., so the point is that MDS writes are so small that one could never stripe such a write over multiple disks anyway.
Very good, one less point to worry about.

Btw, about files on the MDT: why does the apparent file size there sometimes reflect the size of the real file, and sometimes not?
For example, on a ldiskfs-mounted copy of our MDT, I have a directory under ROOT/ with

-rw-rw-r-- 1  935M 15. Jul 2009  09000075278027.140
-rw-rw-r-- 1     0 15. Jul 2009  09000075278027.150

As they should be, both entries are 0-sized as seen by e.g. "du". On Lustre, both files exist and both have size 935M. So for some reason,
one has a metadata entry that appears as a huge sparse file, while the other does not.
Is there a reason, or is this just an illness of our installation?
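(For reference, the ls-vs-du discrepancy itself is just generic sparse-file behaviour; a minimal sketch on any Linux filesystem, with made-up filenames:)

```shell
# Sparse-file illustration (hypothetical filenames, any Linux fs):
# apparent size (ls -l, stat %s) vs. allocated blocks (du, stat %b).
truncate -s 935M sparse.dat   # apparent size 935M, but no blocks written
: > empty.dat                 # plain 0-byte file

stat -c '%n apparent=%s blocks=%b' sparse.dat empty.dat
du -k sparse.dat empty.dat    # both report (near) zero KB actually allocated
```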


On 01/21/2011 09:31 PM, Cliff White wrote:
> On Fri, Jan 21, 2011 at 3:43 AM, Thomas Roth <t.roth at gsi.de <mailto:t.roth at gsi.de>> wrote:
>     Hi all,
>     we have gotten new MDS hardware, and I've got two questions:
>     What are the recommendations for the RAID configuration and formatting
>     options?
>     I was following the recent discussion about these aspects on an OST:
>     chunk size, strip size, stride-size, stripe-width etc. in the light of
>     the 1MB chunks of Lustre ... So what about the MDT? I will have a RAID
>     10 that consists of 11 RAID-1 pairs striped over, giving me roughly 3TB
>     of space. What would be the correct value for <insert your favorite
>     term>, the amount of data written to one disk before proceeding to the
>     next disk?
> The MDS does very small random IO - inodes and directories. AFAIK, the largest chunk
> of data read/written would be 4.5K, and you would see that only with large OST stripe
> counts. RAID 10 is fine. You will not be doing IO that spans more than one spindle,
> so I'm not sure there's a real need to tune here.
> Also, the size of the data on the MDS is determined by the number of files in the
> filesystem (~4K per file is a good estimate);
> unless you are buried in petabytes, 3TB is likely way oversized for an MDT.
> cliffw
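(Cliff's ~4K-per-file rule of thumb above turns into simple arithmetic; a back-of-the-envelope sketch, with the sizes assumed rather than measured:)

```python
# Back-of-the-envelope MDT capacity, assuming ~4 KiB of MDT space per file
# (the rule of thumb quoted above); numbers are illustrative, not exact.
mdt_bytes = 3 * 1024**4          # 3 TiB usable on the MDT (assumed)
per_file = 4 * 1024              # ~4 KiB of metadata per file
max_files = mdt_bytes // per_file
print(f"{max_files:,} files")    # -> 805,306,368 files, i.e. ~800 million
```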
>     Secondly, it is not yet decided whether we wouldn't use this hardware to
>     set up a second Lustre cluster. The manual recommends having only one
>     MGS per site, but doesn't elaborate: what would be the drawback of
>     having two MGSes, i.e. two different network addresses the clients have to
>     connect to in order to mount the Lustre filesystems?
>     I know that it didn't work in Lustre 1.6.3 ;-) and there are no apparent
>     issues when connecting a Lustre client to a test cluster now (version
>     1.8.4), but what about production?
>     Cheers,
>     Thomas
>     _______________________________________________
>     Lustre-discuss mailing list
>     Lustre-discuss at lists.lustre.org <mailto:Lustre-discuss at lists.lustre.org>
>     http://lists.lustre.org/mailman/listinfo/lustre-discuss
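(On the two-MGS question: mechanically, a client just names the respective MGS NID in each mount device string, so both filesystems can be mounted side by side. A sketch with made-up NIDs, fsnames and mount points:)

```shell
# Hypothetical NIDs and fsnames - each filesystem's mount device string
# points at its own MGS, so one client can mount both at once.
mount -t lustre mgs1@tcp0:/alpha /mnt/alpha
mount -t lustre mgs2@tcp0:/beta  /mnt/beta
```

Whether this is advisable in production (beyond working mechanically) is exactly the open question here.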

Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
