[lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD

Riccardo Veraldi Riccardo.Veraldi at cnaf.infn.it
Fri Apr 6 22:04:53 PDT 2018

I have been struggling for months with poor performance on Lustre/ZFS.

Looking for hints.

3 OSSes, RHEL 7.4, Lustre 2.10.3 and ZFS 0.7.6

Each OSS has one raidz1 OST:

  pool: drpffb-ost01
 state: ONLINE
  scan: none requested
  trim: completed on Fri Apr  6 21:53:04 2018 (after 0h3m)

    drpffb-ost01  ONLINE       0     0     0
      raidz1-0    ONLINE       0     0     0
        nvme0n1   ONLINE       0     0     0
        nvme1n1   ONLINE       0     0     0
        nvme2n1   ONLINE       0     0     0
        nvme3n1   ONLINE       0     0     0
        nvme4n1   ONLINE       0     0     0
        nvme5n1   ONLINE       0     0     0

While the raidz pool without Lustre performs well at 6 GB/s (1 GB/s per
disk), with Lustre on top of it performance is really poor. Worse, it is
not stable at all and oscillates between 1.5 GB/s and 6 GB/s. I tested
with obdfilter-survey.
LNET is fine and runs at 6 GB/s (using InfiniBand FDR).
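For reference, obdfilter-survey (from lustre-iokit) exercises the OST backend directly, bypassing clients and LNET; a typical disk-case invocation looks roughly like this (the thread/object sweep values and the target name here are illustrative, not necessarily the exact ones used):

```shell
# Sweep object and thread counts against the OST backend;
# size is the per-object I/O amount in MB.
nobjhi=2 thrhi=16 size=16384 case=disk \
  targets="drpffb-OST0000" obdfilter-survey
```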

What could cause OST performance to go up and down like a roller
coaster?

For reference, here are a few details.

Filesystem parameters:

zfs set mountpoint=none drpffb-ost01
zfs set sync=disabled drpffb-ost01
zfs set atime=off drpffb-ost01
zfs set redundant_metadata=most drpffb-ost01
zfs set xattr=sa drpffb-ost01
zfs set recordsize=1M drpffb-ost01
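A quick sanity check that these properties actually took effect on the OST dataset (and whether each was set locally or inherited):

```shell
# Prints the effective value and SOURCE column for each property
zfs get sync,atime,recordsize,xattr,redundant_metadata drpffb-ost01
```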

The NVMe SSDs use 4 KB sectors.
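Given 4 KB-sector drives, the pool should have been created with ashift=12; one way to double-check after the fact is to dump the cached pool configuration with zdb:

```shell
# Each raidz child vdev should report ashift: 12 for 4 KB sectors
zdb -C drpffb-ost01 | grep ashift
```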


ZFS module parameters:

options zfs zfs_prefetch_disable=1
options zfs zfs_txg_history=120
options zfs metaslab_debug_unload=1
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_vdev_async_write_active_min_dirty_percent=20
options zfs zfs_vdev_scrub_min_active=48
options zfs zfs_vdev_scrub_max_active=128
#options zfs zfs_vdev_sync_write_min_active=64
#options zfs zfs_vdev_sync_write_max_active=128
options zfs zfs_vdev_sync_write_min_active=8
options zfs zfs_vdev_sync_write_max_active=32
options zfs zfs_vdev_sync_read_min_active=8
options zfs zfs_vdev_sync_read_max_active=32
options zfs zfs_vdev_async_read_min_active=8
options zfs zfs_vdev_async_read_max_active=32
options zfs zfs_top_maxinflight=320
options zfs zfs_txg_timeout=30
options zfs zfs_dirty_data_max_percent=40
options zfs zfs_vdev_async_write_min_active=8
options zfs zfs_vdev_async_write_max_active=32
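To verify which values the running module actually picked up (as opposed to what sits in /etc/modprobe.d), the live parameters can be read back from sysfs, for example:

```shell
# grep . on multiple files prints each as "path:value"
grep . /sys/module/zfs/parameters/zfs_prefetch_disable \
       /sys/module/zfs/parameters/zfs_txg_timeout \
       /sys/module/zfs/parameters/zfs_dirty_data_max_percent
```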

More information about the lustre-discuss mailing list