[lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD
Riccardo Veraldi
Riccardo.Veraldi at cnaf.infn.it
Fri Apr 6 22:04:53 PDT 2018
I have been struggling for months with poor performance on Lustre/ZFS.
I am looking for hints.
3 OSSes, RHEL 7.4, Lustre 2.10.3 and ZFS 0.7.6.
Each OSS has one OST on a raidz1 pool:
  pool: drpffb-ost01
 state: ONLINE
  scan: none requested
  trim: completed on Fri Apr 6 21:53:04 2018 (after 0h3m)
config:

        NAME            STATE     READ WRITE CKSUM
        drpffb-ost01    ONLINE       0     0     0
          raidz1-0      ONLINE       0     0     0
            nvme0n1     ONLINE       0     0     0
            nvme1n1     ONLINE       0     0     0
            nvme2n1     ONLINE       0     0     0
            nvme3n1     ONLINE       0     0     0
            nvme4n1     ONLINE       0     0     0
            nvme5n1     ONLINE       0     0     0
The raidz pool without Lustre performs well at 6 GB/s (1 GB/s per disk),
but with Lustre on top of it performance is really poor.
Above all, it is not stable: it swings between 1.5 GB/s and 6 GB/s.
I tested with obdfilter-survey.
LNET is OK and runs at 6 GB/s (over InfiniBand FDR).
What could cause OST performance to go up and down like a
roller coaster?
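
As a sanity check on the LNET figure, here is a rough sketch of the FDR data-rate arithmetic. It assumes a 4x link width, which the post does not state; the per-lane signalling rate and the 64b/66b encoding are the standard FDR values. It ignores protocol overhead, so the real ceiling is somewhat lower.

```python
# Rough ceiling for an InfiniBand FDR link.
# ASSUMPTION: 4x link width (not stated in the post).
LANES = 4
LANE_RATE_GBIT = 14.0625      # FDR signalling rate per lane, Gbit/s
ENCODING = 64 / 66            # FDR uses 64b/66b line encoding

data_gbit = LANES * LANE_RATE_GBIT * ENCODING   # usable Gbit/s
data_gbyte = data_gbit / 8                      # usable GB/s

print(f"FDR 4x data rate: {data_gbit:.2f} Gbit/s = {data_gbyte:.2f} GB/s")
```

Under that assumption the link tops out just below 7 GB/s, so a measured 6 GB/s means the network is close to its limit and is unlikely to explain the drops to 1.5 GB/s.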
For reference, here are a few details.
Filesystem parameters:
zfs set mountpoint=none drpffb-ost01
zfs set sync=disabled drpffb-ost01
zfs set atime=off drpffb-ost01
zfs set redundant_metadata=most drpffb-ost01
zfs set xattr=sa drpffb-ost01
zfs set recordsize=1M drpffb-ost01
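
To confirm that the properties above are actually in effect on each OSS, something like the following sketch could be used. It relies on `zfs get -H -o property,value`, which prints tab-separated pairs; the helper names (`zfs_get_argv`, `parse_zfs_get`, `mismatches`, `check_dataset`) are made up for this illustration.

```python
import subprocess

# Values taken from the "zfs set" commands in the post.
EXPECTED = {
    "mountpoint": "none",
    "sync": "disabled",
    "atime": "off",
    "redundant_metadata": "most",
    "xattr": "sa",
    "recordsize": "1M",
}

def zfs_get_argv(dataset, props):
    """Build the argv for a scripted (-H), two-column zfs get."""
    return ["zfs", "get", "-H", "-o", "property,value",
            ",".join(props), dataset]

def parse_zfs_get(output):
    """Parse tab-separated 'property<TAB>value' lines into a dict."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        prop, value = line.split("\t", 1)
        result[prop] = value.strip()
    return result

def mismatches(actual, expected):
    """Return {property: (wanted, found)} for every differing property."""
    return {p: (want, actual.get(p))
            for p, want in expected.items()
            if actual.get(p) != want}

def check_dataset(dataset, expected=EXPECTED):
    """Run zfs get on this host and report differing properties."""
    out = subprocess.run(zfs_get_argv(dataset, expected), check=True,
                         capture_output=True, text=True).stdout
    return mismatches(parse_zfs_get(out), expected)
```

Calling `check_dataset("drpffb-ost01")` on an OSS should return an empty dict if every property matches; it needs the `zfs` binary and sufficient privileges.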
The NVMe SSDs use 4 KB sectors, hence ashift=12.
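
To see how recordsize=1M and ashift=12 interact on a 6-disk raidz1, here is a small sketch of the on-disk footprint of one record. It follows the raidz allocation-size calculation as I understand it from the OpenZFS source (data sectors, plus one parity sector per row, rounded up to a multiple of nparity+1). It assumes an uncompressed, full 1 MiB block; `raidz_asize` is a name chosen for this example.

```python
def raidz_asize(psize, ashift, ndisks, nparity):
    """Bytes allocated on a raidz vdev for one block of psize bytes."""
    # Data sectors, rounding the block up to whole sectors.
    sectors = ((psize - 1) >> ashift) + 1
    # One parity sector per row of (ndisks - nparity) data sectors.
    data_cols = ndisks - nparity
    sectors += nparity * ((sectors + data_cols - 1) // data_cols)
    # Allocations are padded to a multiple of (nparity + 1) sectors.
    rem = sectors % (nparity + 1)
    if rem:
        sectors += (nparity + 1) - rem
    return sectors << ashift

record = 1 << 20                      # recordsize=1M
alloc = raidz_asize(record, ashift=12, ndisks=6, nparity=1)
efficiency = record / alloc

print(f"allocated: {alloc} bytes ({alloc >> 12} sectors)")
print(f"space/bandwidth efficiency: {efficiency:.1%}")
```

For the pool above this gives 308 sectors of 4 KiB per 1 MiB record, so roughly 83% of the raw write bandwidth carries user data and the rest is parity.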
ZFS module parameters:
options zfs zfs_prefetch_disable=1
options zfs zfs_txg_history=120
options zfs metaslab_debug_unload=1
#
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_vdev_async_write_active_min_dirty_percent=20
#
options zfs zfs_vdev_scrub_min_active=48
options zfs zfs_vdev_scrub_max_active=128
#options zfs zfs_vdev_sync_write_min_active=64
#options zfs zfs_vdev_sync_write_max_active=128
#
options zfs zfs_vdev_sync_write_min_active=8
options zfs zfs_vdev_sync_write_max_active=32
options zfs zfs_vdev_sync_read_min_active=8
options zfs zfs_vdev_sync_read_max_active=32
options zfs zfs_vdev_async_read_min_active=8
options zfs zfs_vdev_async_read_max_active=32
options zfs zfs_top_maxinflight=320
options zfs zfs_txg_timeout=30
options zfs zfs_dirty_data_max_percent=40
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_vdev_async_write_min_active=8
options zfs zfs_vdev_async_write_max_active=32