[lustre-discuss] Write performance low

Sergio Rivas sergio.rivas at um.es
Mon Apr 17 01:41:54 PDT 2017


Ahhh... I see. Well, it was still worth a try.


Thank you very much for all the help (to you and Patrick)!


Sergio.




________________________________
From: Dilger, Andreas <andreas.dilger at intel.com>
Sent: Thursday, April 13, 2017 2:44
To: Sergio Rivas
Cc: Patrick Farrell; lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] Write performance low

The method that the Linux kernel uses for mmap() IO makes it difficult to merge large mmap writes into a single RPC.

Also, this is not a common IO method for HPC, so it hasn't received as much optimization attention as regular read/write operations. That said, I don't _think_ there is any easy fix for mmap() performance, but I could be wrong.

Cheers, Andreas

On Apr 12, 2017, at 14:42, Sergio Rivas <sergio.rivas at um.es> wrote:


Hi Andreas,

Thank you very much for the reply. That would mean this increases the buffering for regular IO, such as operations performed through POSIX or MPI-IO, is that right?

If that's the case, why does mmap behave differently?

Thank you again in advance.

Kind Regards,
Sergio.



From: Dilger, Andreas <andreas.dilger at intel.com>
Sent: Wednesday, April 12, 2017 12:39 AM
To: Sergio Rivas <sergio.rivas at um.es>
Cc: Patrick Farrell <paf at cray.com>; lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] Write performance low

On Apr 8, 2017, at 03:59, Sergio Rivas <sergio.rivas at um.es> wrote:
>
>
> Hi Patrick,
>
> Excuse me for the long delay in replying to your e-mail. I just logged in to the web interface of my e-mail account and noticed that most of the e-mails from the Lustre mailing list had ended up in the spam folder. So, first, my apologies for this, and thank you very much for your reply.
>
> I did indeed test MPI-IO back then for read-only and write-only workloads, and also ran the same tests with mmap. I noticed, especially with mmap, that the speeds were asymmetric. Reads clearly use the page cache (or some equivalent in-memory buffer), averaging 12000 MB/s of bandwidth after the first initialization. Writes seem to hit the network every time and average around 1000 MB/s, especially for mapped files over 1 MB.
>
> The people who maintain our cluster assume that the Lustre client is pre-configured to flush data as soon as possible, to compensate for bandwidth differences and guarantee data consistency. However, my goal is actually to avoid this and cache as much as possible on the node for data reuse, which gives a performance boost (whether the cached data is consistent with storage is another story; for the use case we are proposing, it is not critical).

This is the case with regular buffered IO (not mmap() or O_DIRECT): the client will typically cache writes until they fill a complete RPC.  You can increase the amount of dirty client cache via:

    lctl set_param osc.*.max_dirty_mb=N

temporarily, or to set this permanently use:

    lctl conf_param <fsname>.osc.max_dirty_mb=N
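
(where N is the cache size in MB, applied per OSC device on the client). To check the current values, assuming the same parameter name as above:

    lctl get_param osc.*.max_dirty_mb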

Cheers, Andreas

> Do you have any additional hints that could help given this information?
>
> Once again, thank you very much for your reply and excuse me for the delay.
>
> Kind Regards,
> Sergio.
>
> P.S.: By the way, I noticed that you work for Cray. We are mostly using our Cray XC40 (https://www.pdc.kth.se/resources/computers/beskow) for our research.
>
>
> From: Patrick Farrell <paf at cray.com>
> Sent: Sunday, January 29, 2017 22:47
> To: sergio.rivas at um.es; lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] Write performance low
>
> Sergio,
>
> In general, Lustre always writes asynchronously unless explicitly told to do otherwise (direct I/O or O_SYNC).  I don't have much experience with memory-mapped files - it's possible that they somehow force synchronous behavior...  But my point, I suppose, is that there's probably not a tunable for you.  Try - just to see - writes to a normal file (i.e., not memory mapped), so you can confirm; something along the lines of the quick test below would do.  Someone on the list will probably have a better idea about writes to mmap'ed files and what to expect.
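>
> For example, a quick write test against a regular (non-mapped) file could look like the following - the path /mnt/lustre is just a placeholder, and conv=fsync flushes at the end so the timing includes the actual write-out:
>
>     dd if=/dev/zero of=/mnt/lustre/testfile bs=1M count=1024 conv=fsync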
>
> - Patrick
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of sergio.rivas at um.es <sergio.rivas at um.es>
> Sent: Sunday, January 29, 2017 8:15:00 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [lustre-discuss] Write performance low
>
>
> Good afternoon,
>
> I'm currently developing a small library that makes it easy to mmap files residing in storage, and so far the results have been quite positive in local tests. However, trying the same code on our cluster, which uses Lustre as its parallel file system, I have noticed that reads are conveniently cached but writes seem to be flushed synchronously to storage, decreasing the perceived performance (even though this behavior is technically correct from a data-consistency point of view).
>
> My goal is to let the OS page cache keep as many dirty pages as possible, so that write operations are aggregated for free, but I'm afraid I haven't figured out how to achieve this with Lustre after playing around with some of the settings I found in the manual. A minimal sketch of the kind of access pattern I mean is included below.
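>
> A rough illustration (not the actual library code; the path and sizes are placeholders):
>
>     /* Sketch of an mmap-based write path: map a file, dirty the pages,
>      * and let the page cache (ideally) aggregate the write-out. */
>     #include <fcntl.h>
>     #include <stdio.h>
>     #include <string.h>
>     #include <sys/mman.h>
>     #include <unistd.h>
>
>     int main(void)
>     {
>         const char *path = "/mnt/lustre/mapped_file";  /* placeholder */
>         size_t len = 64 * 1024 * 1024;                 /* 64 MiB */
>
>         int fd = open(path, O_RDWR | O_CREAT, 0644);
>         if (fd < 0) { perror("open"); return 1; }
>
>         /* The file must be large enough to back the mapping. */
>         if (ftruncate(fd, len) < 0) { perror("ftruncate"); return 1; }
>
>         char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
>                          MAP_SHARED, fd, 0);
>         if (buf == MAP_FAILED) { perror("mmap"); return 1; }
>
>         /* Dirty the pages; ideally these stay in the page cache and
>          * are merged before being written back. */
>         memset(buf, 'x', len);
>
>         /* MS_ASYNC schedules write-out without blocking on it. */
>         if (msync(buf, len, MS_ASYNC) < 0) perror("msync");
>
>         munmap(buf, len);
>         close(fd);
>         return 0;
>     }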
>
> Could you please point out whether there is any tunable setting (e.g., under /proc) that would allow me to increase the caching or avoid the direct flushing?
>
> Thank you very much in advance.
>
> Kind Regards,
> Sergio.
>
> P.S.: I think my previous post didn’t go through, so I’m sending it again. Excuse me if it did!
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation






