[Lustre-discuss] MDT raid parameters, multiple MGSes

Thomas Roth t.roth at gsi.de
Sat Jan 22 02:23:56 PST 2011

O.k., so the point is that MDS writes are so small that one could never stripe such a write over multiple disks anyway.
Very good, one less point to worry about.

Btw, about files on the MDT: why does the apparent file size there sometimes reflect the size of the real file, and sometimes not?
For example, on a ldiskfs-mounted copy of our MDT, I have a directory under ROOT/ with

-rw-rw-r-- 1  935M 15. Jul 2009  09000075278027.140
-rw-rw-r-- 1     0 15. Jul 2009  09000075278027.150

As they should be, both entries are 0-sized as seen by e.g. "du". On Lustre, both files exist and both have size 935M. So for some reason,
one has a metadata entry that appears as a huge sparse file, while the other does not.
Is there a reason, or is this just an illness of our installation?
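(For reference, the ls-vs-du discrepancy itself is just generic sparse-file behaviour; a minimal sketch on any Linux filesystem, with made-up filenames:)

```shell
# Sparse-file illustration (hypothetical filenames, any Linux fs):
# apparent size (ls -l, stat %s) vs. allocated blocks (du, stat %b).
truncate -s 935M sparse.dat   # apparent size 935M, but no blocks written
: > empty.dat                 # plain 0-byte file

stat -c '%n apparent=%s blocks=%b' sparse.dat empty.dat
du -k sparse.dat empty.dat    # both report (near) zero KB actually allocated
```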


On 01/21/2011 09:31 PM, Cliff White wrote:
> On Fri, Jan 21, 2011 at 3:43 AM, Thomas Roth <t.roth at gsi.de <mailto:t.roth at gsi.de>> wrote:
>     Hi all,
>     we have gotten new MDS hardware, and I've got two questions:
>     What are the recommendations for the RAID configuration and formatting
>     options?
>     I was following the recent discussion about these aspects on an OST:
>     chunk size, strip size, stride-size, stripe-width etc. in the light of
>     the 1MB chunks of Lustre ... So what about the MDT? I will have a RAID
>     10 that consists of 11 RAID-1 pairs striped over, giving me roughly 3TB
>     of space. What would be the correct value for <insert your favorite
>     term>, the amount of data written to one disk before proceeding to the
>     next disk?
> The MDS does very small random IO - inodes and directories. AFAIK, the largest chunk
> of data read/written would be 4.5K, and you would see that only with large OST stripe
> counts. RAID 10 is fine. You will not be doing IO that spans more than one spindle,
> so I'm not sure there's a real need to tune here.
> Also, the size of the data on the MDS is determined by the number of files in the
> filesystem (~4K per file is a good estimate);
> unless you are buried in petabytes, 3TB is likely way oversized for an MDT.
> cliffw
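(Cliff's ~4K-per-file rule of thumb above turns into simple arithmetic; a back-of-the-envelope sketch, with the sizes assumed rather than measured:)

```python
# Back-of-the-envelope MDT capacity, assuming ~4 KiB of MDT space per file
# (the rule of thumb quoted above); numbers are illustrative, not exact.
mdt_bytes = 3 * 1024**4          # 3 TiB usable on the MDT (assumed)
per_file = 4 * 1024              # ~4 KiB of metadata per file
max_files = mdt_bytes // per_file
print(f"{max_files:,} files")    # -> 805,306,368 files, i.e. ~800 million
```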
>     Secondly, it is not yet decided whether we wouldn't use this hardware to
>     set up a second Lustre cluster. The manual recommends having only one
>     MGS per site, but doesn't elaborate: what would be the drawback of
>     having two MGSes, i.e. two different network addresses the clients have to
>     connect to in order to mount the Lustre filesystems?
>     I know that it didn't work in Lustre 1.6.3 ;-) and there are no apparent
>     issues when connecting a Lustre client to a test cluster now (version
>     1.8.4), but what about production?
>     Cheers,
>     Thomas
>     _______________________________________________
>     Lustre-discuss mailing list
>     Lustre-discuss at lists.lustre.org <mailto:Lustre-discuss at lists.lustre.org>
>     http://lists.lustre.org/mailman/listinfo/lustre-discuss
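(On the two-MGS question: mechanically, a client just names the respective MGS NID in each mount device string, so both filesystems can be mounted side by side. A sketch with made-up NIDs, fsnames and mount points:)

```shell
# Hypothetical NIDs and fsnames - each filesystem's mount device string
# points at its own MGS, so one client can mount both at once.
mount -t lustre mgs1@tcp0:/alpha /mnt/alpha
mount -t lustre mgs2@tcp0:/beta  /mnt/beta
```

Whether this is advisable in production (beyond working mechanically) is exactly the open question here.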

Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
