[Lustre-discuss] Lustre-discuss Digest, Vol 66, Issue 40

Rick Friedman Rick.Friedman at terascala.com
Sat Jul 30 12:23:36 PDT 2011




-----Original Message-----
From: lustre-discuss-request at lists.lustre.org [lustre-discuss-request at lists.lustre.org]
Received: Saturday, 30 Jul 2011, 2:00pm
To: lustre-discuss at lists.lustre.org [lustre-discuss at lists.lustre.org]
Subject: Lustre-discuss Digest, Vol 66, Issue 40




Today's Topics:

   1. Re: Line rate performance for clients (Andreas Dilger)
   2. Re: Line rate performance for clients (Brock Palen)
   3. Random OST Numbers chosen in a stripe (Roger Spellman)


----------------------------------------------------------------------

Message: 1
Date: Fri, 29 Jul 2011 12:01:40 -0600
From: Andreas Dilger <adilger at whamcloud.com>
Subject: Re: [Lustre-discuss] Line rate performance for clients
To: Brock Palen <brockp at umich.edu>
Cc: lustre-discuss discuss <lustre-discuss at lists.lustre.org>
Message-ID: <FA55B2A9-A027-4982-A3FA-4BFFA8B5E5CE at whamcloud.com>
Content-Type: text/plain; charset=us-ascii

On 2011-07-29, at 11:33 AM, Brock Palen wrote:
> I think this is a networking question.
> 
> We have lustre 1.8 clients with 1gig-e interfaces that according to ethtool are running full duplex.
> 
> If I do the following:
> 
> cp /lustre/largeilfe.h5 /tmp/
> 
> I get 117MB/s
> 
> If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive I get 117MB/s
> 
> If I go directly from  /lustre -> archive  I get 50MB/s,  

Strace your globus-url-copy and see what I/O size it is using.  "cp" was modified long ago to use the block size reported by stat(2) for copying, and Lustre reports a 2MB I/O size for striped files (1MB for unstriped).  If your globus tool is using e.g. 4kB reads, it will be very inefficient on Lustre, though much less of a problem when reading from /tmp.
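
As a rough illustration of the point above, here is a minimal Python sketch that reports the preferred I/O size stat(2) gives for a file and copies it using reads of a chosen size. The function names are made up for this example, and it is not how cp or globus-url-copy is implemented. To see what a real tool does, something like "strace -e trace=read,write" on the running command shows the sizes it passes to read(2) and write(2).

```python
import os


def preferred_io_size(path):
    """Return the preferred I/O block size reported by stat(2) (Unix only)."""
    return os.stat(path).st_blksize


def copy_with_block_size(src, dst, block_size=None):
    """Copy src to dst, issuing reads of block_size bytes.

    If block_size is None, use the size the filesystem reports via
    st_blksize. Returns the number of bytes copied.
    """
    if block_size is None:
        block_size = preferred_io_size(src)
    copied = 0
    # buffering=0 on the input so each read() maps to one read(2) call of
    # the requested size; the output stays buffered so writes are complete.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied
```

Timing copy_with_block_size() on the same Lustre file with block_size=4096 and with the default would show whether small reads alone account for the slowdown.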

> this is consistently reproducible.  It doesn't matter if I just copy a large file on lustre to lustre,  or scp, or globus.  If I try to ingest and outgest data I get what looks like half duplex performance. 
> 
> Anyone have ideas why I cannot do 1Gig-e full duplex?

I don't think this has anything to do with "full duplex".  117MB/s is pretty much  the maximum line rate for GigE (and pretty good for Lustre, if I do say so myself) in one direction.  There is presumably no data moving in the other direction at that time.
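
For reference, the 117MB/s figure can be sanity-checked with a short calculation. This sketch assumes a standard 1500-byte MTU, IPv4 and TCP headers without options other than the optional 12-byte timestamp option, and MB meaning 10^6 bytes. It ignores Lustre's own protocol headers, so it is an upper bound rather than a prediction.

```python
LINK_BITS_PER_SEC = 1_000_000_000  # Gigabit Ethernet
MTU = 1500                         # assumed standard MTU, no jumbo frames
# Per-frame Ethernet cost outside the MTU:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap
ETH_OVERHEAD = 14 + 4 + 8 + 12
IP_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMPS = 12                # TCP timestamp option, padded


def tcp_goodput_mb_per_sec(timestamps=True):
    """Best-case one-way TCP payload rate over GigE, in MB/s (10^6 bytes)."""
    payload = MTU - IP_HEADER - TCP_HEADER
    if timestamps:
        payload -= TCP_TIMESTAMPS
    wire_bytes = MTU + ETH_OVERHEAD
    return LINK_BITS_PER_SEC / 8 * payload / wire_bytes / 1e6


if __name__ == "__main__":
    print(round(tcp_goodput_mb_per_sec(True), 1))   # with timestamps
    print(round(tcp_goodput_mb_per_sec(False), 1))  # without timestamps
```

Under those assumptions the result is a little under 118MB/s with timestamps and a little under 119MB/s without, which is consistent with 117MB/s being essentially line rate in one direction.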

Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.





------------------------------

Message: 2
Date: Fri, 29 Jul 2011 14:15:42 -0400
From: Brock Palen <brockp at umich.edu>
Subject: Re: [Lustre-discuss] Line rate performance for clients
To: Andreas Dilger <adilger at whamcloud.com>
Cc: lustre-discuss discuss <lustre-discuss at lists.lustre.org>
Message-ID: <78BD437E-8F53-47DF-9D87-A98849B4A92D at umich.edu>
Content-Type: text/plain; charset=us-ascii



Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Jul 29, 2011, at 2:01 PM, Andreas Dilger wrote:

> On 2011-07-29, at 11:33 AM, Brock Palen wrote:
>> I think this is a networking question.
>> 
>> We have lustre 1.8 clients with 1gig-e interfaces that according to ethtool are running full duplex.
>> 
>> If I do the following:
>> 
>> cp /lustre/largeilfe.h5 /tmp/
>> 
>> I get 117MB/s
>> 
>> If I then use globus-url-copy to move that file from /tmp/ to a remote tape archive I get 117MB/s
>> 
>> If I go directly from  /lustre -> archive  I get 50MB/s,  
> 
> Strace your globus-url-copy and see what I/O size it is using.  "cp" was modified long ago to use the block size reported by stat(2) for copying, and Lustre reports a 2MB I/O size for striped files (1MB for unstriped).  If your globus tool is using e.g. 4kB reads, it will be very inefficient on Lustre, though much less of a problem when reading from /tmp.
> 
>> this is consistently reproducible.  It doesn't matter if I just copy a large file on lustre to lustre,  or scp, or globus.  If I try to ingest and outgest data I get what looks like half duplex performance. 
>> 
>> Anyone have ideas why I cannot do 1Gig-e full duplex?
> 
> I don't think this has anything to do with "full duplex".  117MB/s is pretty much  the maximum line rate for GigE (and pretty good for Lustre, if I do say so myself) in one direction.  There is presumably no data moving in the other direction at that time.

Ah, I guess I wasn't clear. I only get 117MB/s when I move data in one direction on the network, e.g. copying from Lustre to /tmp (local drive), or sending from /tmp out using globus.

It's just when the client is reading from Lustre and sending the data out at the same time that I only get 50MB/s.  

Does that make sense?  Is it even right for me to expect that I could combine the two and get full speed in and full speed out, given that I can consistently get each of them independently? 
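
One possible explanation, offered only as a hypothesis and not as a diagnosis of this setup: if a tool reads a chunk and then sends it before reading the next one, with no overlap between the two steps, the time per chunk is the sum of the read time and the send time. The sketch below works through that arithmetic with hypothetical helper names. It assumes both directions can each sustain the measured 117MB/s on their own.

```python
def serialized_rate(read_rate, write_rate):
    """End-to-end rate when each chunk is read, then written, with no overlap.

    Time per byte is 1/read_rate + 1/write_rate, so the combined rate is
    the reciprocal of that sum.
    """
    return 1.0 / (1.0 / read_rate + 1.0 / write_rate)


def overlapped_rate(read_rate, write_rate):
    """End-to-end rate when reads and writes fully overlap (ideal pipeline)."""
    return min(read_rate, write_rate)


if __name__ == "__main__":
    print(serialized_rate(117.0, 117.0))  # strictly alternating read/send
    print(overlapped_rate(117.0, 117.0))  # ideal full-duplex pipeline
```

With 117MB/s in each direction, strictly alternating transfers come out at 58.5MB/s, which is in the neighbourhood of the observed 50MB/s, while a fully overlapped pipeline would reach 117MB/s. Whether this is what actually happens here would need to be confirmed, for example by tracing the tool.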

> 
> Cheers, Andreas
> --
> Andreas Dilger 
> Principal Engineer
> Whamcloud, Inc.
> 
> 
> 
> 
> 



------------------------------

Message: 3
Date: Fri, 29 Jul 2011 16:49:28 -0400
From: "Roger Spellman" <Roger.Spellman at terascala.com>
Subject: [Lustre-discuss] Random OST Numbers chosen in a stripe
To: <lustre-discuss at lists.lustre.org>,	<wc-discuss at whamcloud.com>
Message-ID:
	<2C7DE72B9BD00F44BAECA5B0CBB87395013598BB at hermes.terascala.com>
Content-Type: text/plain;	charset="iso-8859-1"

Suppose that I stripe a directory with the following command:

lfs setstripe  -c 4 .

On some of my systems, when I create a file in the directory, the list of OSTs for a particular file is sequential, e.g.

   obdidx           objid          objid            group
    12               2            0x2                0
    13               2            0x2                0
    14               2            0x2                0
    15               2            0x2                0

On another one of my systems, when I create files in a similarly striped directory, I get seemingly random assignment, e.g.

For one file:

   obdidx           objid          objid            group
    14            6884         0x1ae4                0
    46            6880         0x1ae0                0
     8            6883         0x1ae3                0
    29            6880         0x1ae0                0

For a different file:

   obdidx           objid          objid            group
    13            6884         0x1ae4                0
    28            6880         0x1ae0                0
    44            6880         0x1ae0                0
    27            6880         0x1ae0                0

Why is this?  

How can I control it to always be sequential?

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com <http://www.terascala.com/>
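
To compare many files across systems, listings like the ones in this message can be checked mechanically. The following is a minimal Python sketch that pulls the obdidx column out of such a listing and tests whether the indices are consecutive. The helper names are made up for this example, and it assumes the four-column layout shown above. It only reports what was allocated and does not change how OSTs are chosen.

```python
def parse_obdidx(listing):
    """Return the obdidx values from a four-column stripe listing."""
    indices = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have four fields and start with a decimal OST index;
        # the header row and blank lines are skipped.
        if len(fields) == 4 and fields[0].isdigit():
            indices.append(int(fields[0]))
    return indices


def is_sequential(indices):
    """True if each index is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


# Sample listings copied from the message above.
SEQUENTIAL = """
   obdidx           objid          objid            group
    12               2            0x2                0
    13               2            0x2                0
    14               2            0x2                0
    15               2            0x2                0
"""

SCATTERED = """
   obdidx           objid          objid            group
    14            6884         0x1ae4                0
    46            6880         0x1ae0                0
     8            6883         0x1ae3                0
    29            6880         0x1ae0                0
"""

if __name__ == "__main__":
    for name, text in (("sequential", SEQUENTIAL), ("scattered", SCATTERED)):
        idx = parse_obdidx(text)
        print(name, idx, is_sequential(idx))
```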


------------------------------



End of Lustre-discuss Digest, Vol 66, Issue 40
**********************************************

