[lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD
Dilger, Andreas
andreas.dilger at intel.com
Mon Apr 9 16:15:11 PDT 2018
On Apr 6, 2018, at 23:04, Riccardo Veraldi <Riccardo.Veraldi at cnaf.infn.it> wrote:
>
> So I have been struggling for months with this low performance on Lustre/ZFS.
>
> Looking for hints.
>
> 3 OSSes, RHEL 7.4, Lustre 2.10.3 and ZFS 0.7.6
>
> each OSS has one raidz OST:
>
>   pool: drpffb-ost01
>  state: ONLINE
>   scan: none requested
>   trim: completed on Fri Apr 6 21:53:04 2018 (after 0h3m)
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         drpffb-ost01    ONLINE       0     0     0
>           raidz1-0      ONLINE       0     0     0
>             nvme0n1     ONLINE       0     0     0
>             nvme1n1     ONLINE       0     0     0
>             nvme2n1     ONLINE       0     0     0
>             nvme3n1     ONLINE       0     0     0
>             nvme4n1     ONLINE       0     0     0
>             nvme5n1     ONLINE       0     0     0
>
> While the raidz pool without Lustre performs well at 6 GB/s (1 GB/s per disk),
> performance with Lustre on top of it is really poor. Worse, it is not stable
> at all and swings between 1.5 GB/s and 6 GB/s. I tested with obdfilter-survey.
> LNET is fine and runs at 6 GB/s (using InfiniBand FDR).
>
> What could be the cause of OST performance going up and down like a
> roller coaster?
Riccardo,
to take a step back for a minute, have you tested all of the devices
individually, and also concurrently, with a low-level tool like
sgpdd-survey or vdbench? Once that is known to be working, have you tested
with obdfilter-survey locally on the OSS, and then remotely from the client(s),
so that we can isolate where the bottleneck is being hit?
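
For example, a raw-device baseline could be run with fio, first on one device
and then on all six concurrently (device names, block sizes and runtimes below
are only illustrative, and writing to the raw devices will destroy the pool,
so only do this before the OST is created or after backing it up):

  # single device, sequential 1M writes, direct I/O
  fio --name=nvme0 --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 \
      --rw=write --bs=1M --iodepth=32 --runtime=60 --time_based --group_reporting

  # all six devices concurrently, one job per device
  for d in nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1; do
      fio --name=$d --filename=/dev/$d --ioengine=libaio --direct=1 \
          --rw=write --bs=1M --iodepth=32 --runtime=60 --time_based \
          --group_reporting &
  done; wait

After that, a local obdfilter-survey run on the OSS might look roughly like the
following (the target name and parameters are just a sketch; check the Lustre
iokit documentation for the exact variables on your version):

  targets="drpffb-OST0001" nobjhi=2 thrhi=64 size=16384 case=disk obdfilter-survey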
Cheers, Andreas
> For reference, here are a few configuration details:
>
> filesystem parameters:
>
> zfs set mountpoint=none drpffb-ost01
> zfs set sync=disabled drpffb-ost01
> zfs set atime=off drpffb-ost01
> zfs set redundant_metadata=most drpffb-ost01
> zfs set xattr=sa drpffb-ost01
> zfs set recordsize=1M drpffb-ost01
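
(These properties can be confirmed on the OSS with zfs get, for example:

  zfs get -o property,value sync,atime,recordsize,xattr,redundant_metadata drpffb-ost01
)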
>
> The NVMe SSDs are 4 KB/sector
>
> ashift=12
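
Since ashift is fixed at vdev creation time, it is worth confirming both the
sector size the devices report and the ashift the existing pool was actually
built with; a quick check (device and pool names as above, commands only a
sketch) could be:

  # sector sizes reported by one of the NVMe devices
  cat /sys/block/nvme0n1/queue/logical_block_size
  cat /sys/block/nvme0n1/queue/physical_block_size

  # ashift actually recorded in the pool configuration
  zdb -C drpffb-ost01 | grep ashift

  # if the pool were ever recreated, ashift can be forced explicitly
  zpool create -o ashift=12 drpffb-ost01 raidz1 \
      nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1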
>
>
> ZFS module parameters
>
> options zfs zfs_prefetch_disable=1
> options zfs zfs_txg_history=120
> options zfs metaslab_debug_unload=1
> #
> options zfs zfs_vdev_scheduler=deadline
> options zfs zfs_vdev_async_write_active_min_dirty_percent=20
> #
> options zfs zfs_vdev_scrub_min_active=48
> options zfs zfs_vdev_scrub_max_active=128
> #options zfs zfs_vdev_sync_write_min_active=64
> #options zfs zfs_vdev_sync_write_max_active=128
> #
> options zfs zfs_vdev_sync_write_min_active=8
> options zfs zfs_vdev_sync_write_max_active=32
> options zfs zfs_vdev_sync_read_min_active=8
> options zfs zfs_vdev_sync_read_max_active=32
> options zfs zfs_vdev_async_read_min_active=8
> options zfs zfs_vdev_async_read_max_active=32
> options zfs zfs_top_maxinflight=320
> options zfs zfs_txg_timeout=30
> options zfs zfs_dirty_data_max_percent=40
> options zfs zfs_vdev_scheduler=deadline
> options zfs zfs_vdev_async_write_min_active=8
> options zfs zfs_vdev_async_write_max_active=32
>
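As a side note on how such options take effect: they are normally placed in
/etc/modprobe.d/zfs.conf and read when the zfs module loads, and most of them
can be inspected (and often changed) on a running system through sysfs, e.g.:

  # current values as loaded
  cat /sys/module/zfs/parameters/zfs_txg_timeout
  cat /sys/module/zfs/parameters/zfs_dirty_data_max_percent

  # many, though not all, parameters are writable at runtime (illustrative)
  echo 30 > /sys/module/zfs/parameters/zfs_txg_timeout
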
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation