[lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD

Dilger, Andreas andreas.dilger at intel.com
Mon Apr 9 16:15:11 PDT 2018


On Apr 6, 2018, at 23:04, Riccardo Veraldi <Riccardo.Veraldi at cnaf.infn.it> wrote:
> 
> I have been struggling for months with poor performance on Lustre/ZFS.
> 
> Looking for hints.
> 
> 3 OSSes, RHEL 7.4, Lustre 2.10.3 and ZFS 0.7.6
> 
> each OSS has one raidz OST
> 
>   pool: drpffb-ost01
>  state: ONLINE
>   scan: none requested
>   trim: completed on Fri Apr  6 21:53:04 2018 (after 0h3m)
> config:
> 
>     NAME          STATE     READ WRITE CKSUM
>     drpffb-ost01  ONLINE       0     0     0
>       raidz1-0    ONLINE       0     0     0
>         nvme0n1   ONLINE       0     0     0
>         nvme1n1   ONLINE       0     0     0
>         nvme2n1   ONLINE       0     0     0
>         nvme3n1   ONLINE       0     0     0
>         nvme4n1   ONLINE       0     0     0
>         nvme5n1   ONLINE       0     0     0
> 
> While the raidz pool without Lustre performs well at 6GB/s (1GB/s per disk),
> with Lustre on top of it performance is really poor. Above all, it is not
> stable: it swings up and down between 1.5GB/s and 6GB/s. I tested with
> obdfilter-survey.
> LNET is OK and running at 6GB/s (using InfiniBand FDR).
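
For reference, LNet throughput figures like the 6GB/s above are usually
measured with lnet_selftest; a minimal sketch, with placeholder o2ib NIDs
for one client and one server:

  # load the self-test module on both nodes first: modprobe lnet_selftest
  export LST_SESSION=$$
  lst new_session rw_test
  lst add_group servers 10.0.0.1@o2ib     # placeholder server NID
  lst add_group clients 10.0.0.2@o2ib     # placeholder client NID
  lst add_batch bulk_rw
  lst add_test --batch bulk_rw --from clients --to servers brw write size=1M
  lst run bulk_rw
  lst stat clients servers                # report bandwidth per group
  lst end_session
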
> 
> What could be the cause of OST performance going up and down like a
> roller coaster ?

Riccardo,
to take a step back for a minute, have you tested all of the devices
individually, and also concurrently, with a low-level tool like
sgpdd or vdbench?  Once that is known to be working, have you tested
with obdfilter-survey locally on the OSS, and then remotely from the client(s),
so that we can isolate where the bottleneck is being hit?
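
A sketch of that staged approach, assuming the lustre-iokit obdfilter-survey
script and using fio as a readily available stand-in for sgpdd/vdbench on raw
NVMe devices (OST names and NIDs below are placeholders, and the raw-device
test destroys data on the device):

  # 1. Raw-device baseline, one SSD at a time and then all six concurrently
  #    (destructive: writes straight to the NVMe namespace)
  fio --name=raw-write --filename=/dev/nvme0n1 --rw=write --bs=1M \
      --direct=1 --ioengine=libaio --iodepth=32 --size=64G

  # 2. obdfilter-survey locally on the OSS (disk case)
  nobjhi=2 thrhi=64 size=8192 case=disk targets="drpffb-OST0001" obdfilter-survey

  # 3. the same survey driven from a client: case=network exercises LNet only,
  #    case=netdisk goes through to the OST (see the lustre-iokit docs for the
  #    exact targets syntax in the netdisk case)
  nobjhi=2 thrhi=64 size=8192 case=network targets="oss01@o2ib" obdfilter-survey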

Cheers, Andreas


> For reference, here are a few configuration details:
> 
> filesystem parameters:
> 
> zfs set mountpoint=none drpffb-ost01
> zfs set sync=disabled drpffb-ost01
> zfs set atime=off drpffb-ost01
> zfs set redundant_metadata=most drpffb-ost01
> zfs set xattr=sa drpffb-ost01
> zfs set recordsize=1M drpffb-ost01
> 
> The NVMe SSDs are 4KB/sector
> 
> ashift=12
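
For reference, a pool with this layout and sector alignment could be created
roughly as follows (a sketch using the device names from the status output
above; running it would of course destroy any existing data):

  # raidz1 across the six NVMe devices, ashift=12 to match the 4KB sectors
  zpool create -o ashift=12 drpffb-ost01 raidz1 \
      nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1

  # confirm the ashift actually applied
  zdb -C drpffb-ost01 | grep ashift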
> 
> 
> ZFS module parameters
> 
> options zfs zfs_prefetch_disable=1
> options zfs zfs_txg_history=120
> options zfs metaslab_debug_unload=1
> #
> options zfs zfs_vdev_scheduler=deadline
> options zfs zfs_vdev_async_write_active_min_dirty_percent=20
> #
> options zfs zfs_vdev_scrub_min_active=48
> options zfs zfs_vdev_scrub_max_active=128
> #options zfs zfs_vdev_sync_write_min_active=64
> #options zfs zfs_vdev_sync_write_max_active=128
> #
> options zfs zfs_vdev_sync_write_min_active=8
> options zfs zfs_vdev_sync_write_max_active=32
> options zfs zfs_vdev_sync_read_min_active=8
> options zfs zfs_vdev_sync_read_max_active=32
> options zfs zfs_vdev_async_read_min_active=8
> options zfs zfs_vdev_async_read_max_active=32
> options zfs zfs_top_maxinflight=320
> options zfs zfs_txg_timeout=30
> options zfs zfs_dirty_data_max_percent=40
> options zfs zfs_vdev_scheduler=deadline
> options zfs zfs_vdev_async_write_min_active=8
> options zfs zfs_vdev_async_write_max_active=32
> 
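These "options zfs ..." lines are modprobe options; assuming a stock
ZFS-on-Linux install, they are typically applied and checked along these lines:

  # settings take effect when the zfs module is (re)loaded
  cat /etc/modprobe.d/zfs.conf

  # most are also exposed, and many are tunable, at runtime
  cat /sys/module/zfs/parameters/zfs_txg_timeout
  echo 30 > /sys/module/zfs/parameters/zfs_txg_timeout
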
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation