[lustre-discuss] Lustre poor performance

Dennis Nelson dnelson at ddn.com
Thu Aug 17 21:22:24 PDT 2017


It appears that you are running iozone on a single client?  What kind of network is tcp5?  Have you looked at the network to make sure it is not the bottleneck?
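
If you want to measure what tcp5 can actually deliver, LNet self-test is a quick way to take the disks out of the picture. A minimal sketch (both NIDs below are placeholders; substitute your real client and OSS NIDs):

    modprobe lnet_selftest            # on every node involved
    export LST_SESSION=$$
    lst new_session rw_test
    lst add_group clients 172.21.42.100@tcp5    # placeholder client NID
    lst add_group servers 172.21.42.159@tcp5    # placeholder server NID
    lst add_batch bulk_rw
    lst add_test --batch bulk_rw --from clients --to servers brw write size=1M
    lst run bulk_rw
    lst stat servers                  # watch the bandwidth counters
    lst end_session

If that tops out near your 1.3 GB/s write figure, the network is the bottleneck rather than ZFS or Lustre.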

-- 
Dennis Nelson
Mobile: 817-233-6116
 
Applications Support Engineer
DataDirect Networks, Inc.
dnelson at ddn.com

On 8/17/17, 10:22 PM, "lustre-discuss on behalf of Riccardo Veraldi" <lustre-discuss-bounces at lists.lustre.org on behalf of Riccardo.Veraldi at cnaf.infn.it> wrote:

    Hello,
    
    I am running Lustre 2.10.0 on CentOS 7.3. I have one MDS and two
    OSSes, each with one OST; each OST is a ZFS raidz1 pool of six NVMe
    disks.
    ZFS is configured to maximize write performance:
    
    zfs set sync=disabled drpffb-ost02
    zfs set atime=off drpffb-ost02
    zfs set redundant_metadata=most drpffb-ost02
    zfs set xattr=sa drpffb-ost02
    zfs set recordsize=1M drpffb-ost02
    
    Every NVMe disk has 4 KiB sectors, so the pools were created with
    ashift=12.
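    
    For completeness, each pool was created along these lines (a sketch
    reconstructed from the zpool status output further below):
    
    zpool create -o ashift=12 drpffb-ost01 raidz1 \
        nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1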
    
    Writing to the local raidz1 directly, I get 3.6 GB/s writes and
    5 GB/s reads.
    
    Through Lustre, the same configuration performs far worse: 1.3 GB/s
    writes and 2 GB/s reads.
    
    Since the local ZFS raidz1 performs well, there must be something else
    to tune to get better performance out of Lustre.
    
    This is the Lustre filesystem as seen from the client:
    
    172.21.42.159@tcp5:/drpffb                  10T  279G  9.8T   3% /drpffb
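    
    (i.e. mounted on the client with something like:
    mount -t lustre 172.21.42.159@tcp5:/drpffb /drpffb)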
    
    UUID                       bytes        Used   Available Use% Mounted on
    drpffb-MDT0000_UUID        19.1G        2.1M       19.1G   0% /drpffb[MDT:0]
    drpffb-OST0001_UUID         5.0T      142.2G        4.9T   3% /drpffb[OST:1]
    drpffb-OST0002_UUID         5.0T      136.4G        4.9T   3% /drpffb[OST:2]
    
    filesystem_summary:        10.0T      278.6G        9.7T   3% /drpffb
    
    Both the Lustre and local ZFS tests use iozone, with 50 threads
    writing 4 GB each and then 50 threads reading:
    
    iozone  -i 0 -t 50 -i 1 -t 50 -s4g
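    
    Note: if striping is left at the default of one stripe per file, each
    iozone file lands on a single OST. Striping the test directory across
    both OSTs is one thing worth trying, e.g. (test path is hypothetical):
    
    lfs setstripe -c -1 /drpffb/iozone_test    # stripe across all OSTs
    lfs getstripe /drpffb/iozone_test          # verify the layout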
    
    I do not know what else I can do to improve performance.
    
    Here are some details on the OSSes:
    
    OSS01:
    
    NAME                 USED  AVAIL  REFER  MOUNTPOINT
    drpffb-ost01        39.4G  4.99T   153K  none
    drpffb-ost01/ost01  39.4G  4.99T  39.4G  none
    
      pool: drpffb-ost01
     state: ONLINE
      scan: none requested
    config:
    
        NAME         STATE     READ WRITE CKSUM
        drpffb-ost01  ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            nvme0n1  ONLINE       0     0     0
            nvme1n1  ONLINE       0     0     0
            nvme2n1  ONLINE       0     0     0
            nvme3n1  ONLINE       0     0     0
            nvme4n1  ONLINE       0     0     0
            nvme5n1  ONLINE       0     0     0
    
    OSS02:
    
    NAME                 USED  AVAIL  REFER  MOUNTPOINT
    drpffb-ost02        62.2G  4.97T   153K  none
    drpffb-ost02/ost02  62.2G  4.97T  62.2G  none
    
      pool: drpffb-ost02
     state: ONLINE
      scan: none requested
    config:
    
        NAME         STATE     READ WRITE CKSUM
        drpffb-ost02  ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            nvme0n1  ONLINE       0     0     0
            nvme1n1  ONLINE       0     0     0
            nvme2n1  ONLINE       0     0     0
            nvme3n1  ONLINE       0     0     0
            nvme4n1  ONLINE       0     0     0
            nvme5n1  ONLINE       0     0     0
    
    Thanks to anyone who can offer hints.
    
    Rick
    
    
    


