[lustre-discuss] how to optimize write performances

Degremont, Aurelien degremoa at amazon.com
Tue Oct 5 02:56:25 PDT 2021


Hello

Direct I/O affects the whole I/O path, from the client down to ZFS. Agreed, ZFS does not support it, but the rest of the I/O path does.

Could you provide your fio command line?
As I said, you need to do _large I/O_ of multiple MB per call. If you are just doing 1 MB I/O (assuming the stripe size is 1 MB), your application will send 1 RPC at a time to 1 OST, wait for the reply, and then send the next one. The client cache will help at the beginning, until it is full (32 MB max_dirty_mb per OST by default).
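If you want to check or raise the max_dirty_mb and max_rpcs_in_flight client settings, something like this should work (the values below are only an example, not a recommendation):

    lctl get_param osc.*.max_dirty_mb osc.*.max_rpcs_in_flight
    lctl set_param osc.*.max_dirty_mb=256
    lctl set_param osc.*.max_rpcs_in_flight=16
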
What about rpc_stats?
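A simple way to get a clean sample is something like this (the file path, size and block size are just placeholders; try it with and without --direct=1):

    lctl set_param osc.*.rpc_stats=clear
    fio --name=seqwrite --rw=write --bs=16M --size=20G --ioengine=libaio --iodepth=8 --direct=1 --filename=/mnt/lustre/fio_testfile
    lctl get_param osc.*.rpc_stats

Then look at the "pages per rpc" and "rpcs in flight" histograms in the output.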


Aurélien

On 04/10/2021 18:32, "Riccardo Veraldi" <riccardo.veraldi at cnaf.infn.it> wrote:

    Hello Aurelien,

    I am using ZFS as the Lustre backend. ZFS does not support direct I/O.

    OK, Lustre does, but the performance with direct I/O is still worse when
    using the ZFS backend, at least in my tests.

    Best

    Riccardo


    On 10/1/21 2:22 AM, Degremont, Aurelien wrote:
    > Hello
    >
    > To achieve higher throughput with a single-threaded process, you should try to limit latencies and parallelize under the hood.
    > Try checking the following parameters:
    > - Stripe your file across multiple OSTs
    > - Do large I/O, multiple MB per write, to let Lustre send multiple RPCs to different OSTs
    > - Try testing with and without Direct I/O.
    >
    > What is your 'dd' test command?
    > Clear and check the RPC stats (sudo lctl set_param osc.*.rpc_stats=clear; sudo lctl get_param osc.*.rpc_stats). Check that you are sending large RPCs (pages per rpc).
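    > For example (the mount point, stripe settings and dd sizes are just placeholders; drop oflag=direct to test buffered writes):
    >
    >     lfs setstripe -c 4 -S 4M /mnt/lustre/testdir
    >     dd if=/dev/zero of=/mnt/lustre/testdir/testfile bs=16M count=1000 oflag=direct
    >     lctl get_param osc.*.rpc_stats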
    >
    > Aurélien
    >
    > On 30/09/2021 18:11, "lustre-discuss on behalf of Riccardo Veraldi" <lustre-discuss-bounces at lists.lustre.org on behalf of riccardo.veraldi at cnaf.infn.it> wrote:
    >
    >      Hello,
    >
    >      I wanted to ask for some hints on how I might increase single-process
    >      sequential write performance on Lustre.
    >
    >      I am using Lustre 2.12.7 on RHEL 7.9
    >
    >      I have a number of OSSes with SAS SSDs in raidz: 3 OSTs per OSS, and each
    >      OST is made of 8 SSDs in raidz.
    >
    >      In a local test with multiple writes I can write to and read from the zpool
    >      at 7 GB/s per OSS.
    >
    >      With the Lustre/ZFS backend I can reach peak writes of 5.5 GB/s per OSS, which
    >      is OK.
    >
    >      However, this happens only with several concurrent writes to the
    >      filesystem.
    >
    >      A single write cannot do more than 800 MB/s to 1 GB/s.
    >
    >      Changing the underlying hardware and moving to NVMe improves
    >      single-write performance, but only slightly.
    >
    >      What is preventing a single-write pattern from performing better? They are
    >      XTC files.
    >
    >      Each single SSD has a 500 MB/s write capability according to factory specs,
    >      so it seems that with a single write it is not possible to take advantage
    >      of the zpool parallelism. I also tried striping, but that does not really help much.
    >
    >      Any hint is really appreciated.
    >
    >      Best
    >
    >      Riccardo
    >
    >
    >
    >      _______________________________________________
    >      lustre-discuss mailing list
    >      lustre-discuss at lists.lustre.org
    >      http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    >


