[lustre-discuss] problem getting high performance output to single file

Nguyen Viet Cuong mrcuongnv at gmail.com
Tue May 19 17:23:44 PDT 2015


David,

You are right, there is a lock. As Patrick mentioned, the change tracked in
https://jira.hpdd.intel.com/browse/LU-1669 should solve your problem. Please
check it out.

In my own experience, the Lustre 2.7.0 client solves this problem very
well, and I have seen very good performance so far.

Regards,
Cuong

On Wed, May 20, 2015 at 4:46 AM, David A. Schneider <
davidsch at slac.stanford.edu> wrote:

> We do use checksums, but can't turn them off. We know we've measured some
> performance penalty with checksums. I'll check about configuring the Lustre
> clients to use RDMA. We ran into something similar when our MPI programs
> were not taking advantage of the InfiniBand fabric - we noticed much slower
> message passing than we expected - so it sounds like there is a similar
> thing we can do with Lustre, but I guess the locking is the main issue. All
> our compute nodes are currently running Red Hat 5, and it doesn't look like
> Lustre 2.6 was tested with RHEL 5, but we have been talking about moving
> everything to at least RHEL 6, maybe RHEL 7, so there's hope. Thanks for the
> help!
>
> best,
>
> David
>
>
> On 05/19/15 11:10, Patrick Farrell wrote:
>
>> Ah.  I think I know what's going on here:
>>
>> In Lustre 2.x client versions prior to 2.6, only one process on a given
>> client can write to a given file at a time, regardless of how the file is
>> striped.  So if you are writing to the same file, there will be little to
>> no benefit from putting an extra process on the same node.
>>
>> A *single* process on a node could benefit, but not the split you've
>> described.
>>
>> The details (essentially that a pair of per-file locks is taken by any
>> individual process writing to a file) are here:
>> https://jira.hpdd.intel.com/browse/LU-1669
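>>
>> For illustration, here is a rough sketch of how one could check the file's
>> striping and spread the writers across nodes to work around that lock (the
>> file path and the Open MPI mapping flag below are assumptions, not details
>> from this thread):
>>
>>   # Show how the shared output file is striped across OSTs
>>   lfs getstripe /lustre/scratch/output.h5
>>
>>   # With a pre-2.6 client, only one process per client node can write to
>>   # the file at a time, so placing one rank per node avoids the contention
>>   mpirun --map-by node -np 4 ./writer /lustre/scratch/output.h5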
>>
>>
>> On 5/19/15, 12:59 PM, "Mohr Jr, Richard Frank (Rick Mohr)"
>> <rmohr at utk.edu> wrote:
>>
>>  On May 19, 2015, at 1:44 PM, Schneider, David A.
>>>> <davidsch at slac.stanford.edu> wrote:
>>>>
>>>> Thanks for the suggestion! When I had each rank run on a separate
>>>> compute node/host, I saw parallel performance (4 seconds for the 6GB of
>>>> writing). When I ran the MPI job on one host (the hosts have 12 cores,
>>>> and by default we pack ranks onto as few hosts as possible), things
>>>> happened serially; each rank finished about 2 seconds after the previous
>>>> one.
>>>>
>>> Hmm. That does seem like there is some bottleneck on the client side that
>>> is limiting the throughput from a single client.  Here are some things
>>> you could look into (although they might require more tinkering than you
>>> have permission to do):
>>>
>>> 1) Based on your output from "lctl list_nids", it looks like you are
>>> running IP-over-IB.  Can you configure the clients to use RDMA?  (They
>>> would have nids like x.x.x.x at o2ib.)
>>>
>>> 2) Do you have the option of trying a newer client version?  Earlier
>>> Lustre versions used a single-threaded ptlrpcd to manage network traffic,
>>> but newer versions have a multi-threaded implementation.  You may need to
>>> check compatibility with the Lustre version running on the servers,
>>> though.
>>>
>>> 3) Do you have checksums disabled?  Try running "lctl get_param
>>> osc.*.checksums".  If the values are "1", then checksums are enabled,
>>> which can slow down performance.  You could try setting the value to "0"
>>> to see if that helps (see the example commands after this list).
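>>>
>>> As a quick check for (1) and (3), something along these lines on a client
>>> node should show the current state (the set_param change needs root, is
>>> not persistent across a remount, and is only meant as a test):
>>>
>>>   # Which LNET nids is this client using (tcp vs. o2ib)?
>>>   lctl list_nids
>>>
>>>   # Are data checksums enabled?  (1 = enabled)
>>>   lctl get_param osc.*.checksums
>>>
>>>   # Temporarily disable checksums to measure the impact
>>>   lctl set_param osc.*.checksums=0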
>>>
>>> --
>>> Rick Mohr
>>> Senior HPC System Administrator
>>> National Institute for Computational Sciences
>>> http://www.nics.tennessee.edu
>>>
>>>
>>
>



-- 
Nguyen Viet Cuong