[Lustre-discuss] Lustre-discuss Digest, Vol 60, Issue 25

Buday, Tomas tomas.buday at hp.com
Sat Jan 22 12:29:21 PST 2011

"lustre-discuss-request at lists.lustre.org" <lustre-discuss-request at lists.lustre.org> wrote:


Today's Topics:

   1. Re: MDT raid parameters, multiple MGSes (Thomas Roth)
   2. Re: split OSTs from single OSS in 2 networks (Aurélien Degrémont)


Message: 1
Date: Sat, 22 Jan 2011 11:23:56 +0100
From: Thomas Roth <t.roth at gsi.de>
Subject: Re: [Lustre-discuss] MDT raid parameters, multiple MGSes
To: Cliff White <cliffw at whamcloud.com>
Cc: lustre-discuss at lists.lustre.org
Message-ID: <4D3AB03C.5090005 at gsi.de>
Content-Type: text/plain; charset=ISO-8859-1

O.k., so the point is that the MDS writes are so small, one could never stripe such a write over multiple disks anyhow.
Very good, one point less to worry about.

Btw, about files on the MDT: why does the apparent file size there sometimes reflect the size of the real file, and sometimes not?
For example, on a ldiskfs-mounted copy of our MDT, I have a directory under ROOT/ with

-rw-rw-r-- 1  935M 15. Jul 2009  09000075278027.140
-rw-rw-r-- 1     0 15. Jul 2009  09000075278027.150

As they should, both entries are 0-sized as seen by e.g. "du". On Lustre, both files exist and both have a size of 935M. So for some reason,
one has a metadata entry that appears as a huge sparse file, while the other does not.
Is there a reason, or is this just an illness of our installation?
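
The gap between apparent size (what "ls -l" shows) and allocated size (what "du" counts) can be reproduced on any local filesystem that supports sparse files. The following is only an illustrative sketch, not a Lustre tool and not an explanation of the MDT behaviour: the helper names apparent_and_allocated and make_sparse_file are made up for this example, and it assumes a POSIX system where st_blocks is counted in 512-byte units.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for path.

    Apparent size is what "ls -l" reports; allocated bytes is what "du" counts.
    Assumption: st_blocks is in 512-byte units, as on POSIX systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def make_sparse_file(size):
    """Create a temporary file with the given apparent size, writing no data."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)  # extends the file length without writing blocks
    finally:
        os.close(fd)
    return path


if __name__ == "__main__":
    path = make_sparse_file(935 * 1024 * 1024)  # 935M, as in the listing above
    try:
        apparent, allocated = apparent_and_allocated(path)
        print("apparent size:", apparent, "bytes")
        print("allocated    :", allocated, "bytes")
    finally:
        os.remove(path)
```

On a filesystem with sparse-file support, the allocated figure should stay far below the apparent size, which matches what "du" reports for the first entry above.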


On 01/21/2011 09:31 PM, Cliff White wrote:
> On Fri, Jan 21, 2011 at 3:43 AM, Thomas Roth <t.roth at gsi.de <mailto:t.roth at gsi.de>> wrote:
>     Hi all,
>     we have gotten new MDS hardware, and I've got two questions:
>     What are the recommendations for the RAID configuration and formatting
>     options?
>     I was following the recent discussion about these aspects on an OST:
>     chunk size, strip size, stride-size, stripe-width etc. in the light of
>     the 1MB chunks of Lustre ... So what about the MDT? I will have a RAID
>     10 that consists of 11 RAID-1 pairs striped over, giving me roughly 3TB
>     of space. What would be the correct value for <insert your favorite
>     term>, the amount of data written to one disk before proceeding to the
>     next disk?
> The MDS does very small random IO: inodes and directories.  Afaik, the largest chunk
> of data read/written would be 4.5K, and you would see that only with large OST stripe
> counts.  RAID 10 is fine. You will not
> be doing IO that spans more than one spindle, so I'm not sure there's a real need to tune here.
> Also, the size of the data on the MDS is determined by the number of files in the
> filesystem (~4k per file is good).
> Unless you are buried in petabytes, 3TB is likely way oversized for an MDT.
> cliffw
>     Secondly, it is not yet decided whether we wouldn't use this hardware to
>     set up a second Lustre cluster. The manual recommends to have only one
>     MGS per site, but doesn't elaborate: what would be the drawback of
>     having two MGSes, two different network addresses the clients have to
>     connect to to mount the Lustres?
>     I know that it didn't work in Lustre 1.6.3 ;-) and there are no apparent
>     issues when connecting a Lustre client to a test cluster now (version
>     1.8.4), but what about production?
>     Cheers,
>     Thomas
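
Cliff's rule of thumb quoted above (~4k of MDT space per file) can be turned into a quick back-of-the-envelope check. This is a rough sketch under stated assumptions: "4k" is taken as 4096 bytes, "3TB" as decimal terabytes, and the function names are hypothetical. It ignores filesystem overhead and is not a substitute for the sizing guidance in the Lustre manual.

```python
BYTES_PER_FILE = 4 * 1024  # assumption: "~4k per file" read as 4096 bytes


def mdt_bytes_needed(num_files, bytes_per_file=BYTES_PER_FILE):
    """Estimate the MDT space needed for a given number of files."""
    return num_files * bytes_per_file


def files_supported(mdt_bytes, bytes_per_file=BYTES_PER_FILE):
    """Estimate how many files fit on an MDT of the given size."""
    return mdt_bytes // bytes_per_file


if __name__ == "__main__":
    three_tb = 3 * 10**12  # assumption: decimal terabytes
    print("files on a 3TB MDT:", files_supported(three_tb))  # 732421875
    print("bytes for 100 million files:", mdt_bytes_needed(100_000_000))
```

By this estimate a 3TB MDT leaves room for several hundred million files, which is consistent with the remark that it is likely oversized unless the filesystem is very large.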

Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528

Geschäftsführung: Professor Dr. Dr. h.c. Horst Stöcker,
Dr. Hartmut Eickhoff

Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt


Message: 2
Date: Sat, 22 Jan 2011 16:58:04 +0100
From: Aurélien Degrémont <aurelien.degremont at cea.fr>
Subject: Re: [Lustre-discuss] split OSTs from single OSS in 2 networks
To: Haisong Cai <cai at sdsc.edu>
Cc: lustre-discuss at lists.lustre.org
Message-ID: <4D3AFE8C.2090309 at cea.fr>
Content-Type: text/plain; charset=UTF-8; format=flowed

According to Bugzilla, the patch was introduced in Lustre 2.0 and bug-fixed
in 2.1.
There is a backport for 1.8, but it never landed in the official source,
so it is not available in any official 1.8 release.
You either have to use Lustre 2.0 or patch 1.8.5.



On 21/01/2011 18:32, Haisong Cai wrote:
> It does look like exactly what I need. Thanks Aurélien.
> From Bugzilla, the patch has been checked in to 1.8. Could someone please
> point me to the source location? Is it in 1.8.5? I don't believe so,
> but I thought I would check. If not, would it be in a later release of 1.8?
> thanks all,
> Haisong
> On Thu, 20 Jan 2011, DEGREMONT Aurelien wrote:
>> Hello
>> If you want to register different interfaces for different OSTs on the
>> same OSS, you should use the --network option, introduced in the patch at
>> https://bugzilla.lustre.org/show_bug.cgi?id=22078
>> Regards,
>> Aurélien
>> Haisong Cai wrote:
>>> I have a storage server that has two QPI controllers;
>>> each controller controls half of the I/O slots. In the first half is
>>> a RAID controller and a 10GbE card, in the second half is a RAID controller
>>> and a 10GbE card.
>>> There are 4 RAID arrays for 4 OSTs of a single Lustre filesystem, and
>>> 2 10GbE NICs, each configured with its own IP and switch.
>>> I want to register OSTs on this OSS with different addresses.
>>> That is, OST1 & OST2 in IPSUBNET1, OST3 & OST4 in IPSUBNET2,
>>> using "policy-based" routing to direct traffic.
>>> The idea is to avoid bonding and get as much bandwidth
>>> out of the 10GbE NICs as possible.
>>> Is it possible?
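
The layout discussed in this thread (two OSTs per 10GbE interface, each interface on its own LNet network, and each OST restricted to one network with the --network option) can be written down as a small planning sketch. Everything concrete here is an assumption made for illustration: the interface names eth2/eth3, the LNet network names tcp0/tcp1, the OST labels, and the exact --network=<net> spelling, which should be checked against the mkfs.lustre/tunefs.lustre documentation of the release that carries the patch. The script only prints text and does not run any Lustre command.

```python
# Hypothetical mapping of LNet networks to the two 10GbE interfaces.
INTERFACES = {"tcp0": "eth2", "tcp1": "eth3"}

# Hypothetical assignment: OST1 and OST2 on the first subnet, OST3 and OST4 on the second.
OST_NETWORKS = {"OST1": "tcp0", "OST2": "tcp0", "OST3": "tcp1", "OST4": "tcp1"}


def lnet_module_options(interfaces):
    """Build an 'options lnet networks=...' line from a {network: nic} mapping."""
    nets = ",".join(f"{net}({nic})" for net, nic in interfaces.items())
    return f"options lnet networks={nets}"


def network_option(ost, ost_networks):
    """Return the option fragment restricting one OST to its LNet network.

    Assumption: the option is spelled --network=<net>, as named in the thread.
    """
    return f"--network={ost_networks[ost]}"


if __name__ == "__main__":
    print(lnet_module_options(INTERFACES))
    for ost in sorted(OST_NETWORKS):
        print(ost, network_option(ost, OST_NETWORKS))
```

Keeping the mapping in one place makes it easy to verify that each OST ends up on the interface attached to the same half of the I/O slots as its RAID controller.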


End of Lustre-discuss Digest, Vol 60, Issue 25
