[lustre-discuss] Lustre 2.9 performance issues

Wed Apr 26 08:03:44 PDT 2017

Thanks for the kstat info.  Our 2.4 LFS has a quite different architecture (ldiskfs on hardware RAID), so there is no opportunity to compare the zfs kstat info between the two.  Our 2.9 LFS is barely in production at this point and only a handful of people have moved over to it, so its utilization is quite a bit lower than that of our 2.4 LFS.  Lustre defaults have generally worked well for us, so we've done very little tuning.  Our 2.4 LFS uses the Whamcloud Lustre RPMs on the servers (i.e. no patches).  About the only tuning we've done (on both the 2.4 and 2.9 LFSs) is this in /etc/modprobe.d/lustre.conf:

options ko2iblnd map_on_demand=32

Darby

-----Original Message-----
From: "Bass, Ned" <bass6 at llnl.gov>
Date: Tuesday, April 25, 2017 at 2:11 PM
To: Darby Vicker <darby.vicker-1 at nasa.gov>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: RE: [lustre-discuss] Lustre 2.9 performance issues

Hi Darby,

> -----Original Message-----
> 
> for i in $(seq 0 99) ; do
>    dd if=/dev/zero of=dd.dat.$i bs=1k count=1 conv=fsync > /dev/null 2>&1
> done
> 
> The timing of this ranges from 0.1 to 1 sec on our old LFS but ranges from 20
> to 60 sec on our newer 2.9 LFS.  

Because Lustre does not yet use the ZFS Intent Log (ZIL), it implements fsync() by
waiting for an entire transaction group to get written out. This can incur long
delays on a busy filesystem as the transaction groups become quite large. Work
on implementing ZIL support is being tracked in LU-4009 but this feature is not
expected to make it into the upcoming 2.10 release.
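
To see how long each individual fsync() waits, rather than the total for the
whole loop, something like the following rough sketch could be run on a client.
The function name fsync_latency and its arguments are made up for illustration,
and it assumes GNU date (for %N) and a dd that supports conv=fsync.

```shell
# Hypothetical helper: time each small fsync'd write individually.
# Usage: fsync_latency DIR [COUNT]
# Assumes GNU date (%N gives nanoseconds) and dd with conv=fsync.
fsync_latency() {
    dir=$1
    count=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s%N)
        # Same 1 KiB write as the original test; dd returns after fsync().
        dd if=/dev/zero of="$dir/dd.dat.$i" bs=1k count=1 conv=fsync > /dev/null 2>&1
        end=$(date +%s%N)
        # Print the iteration number and the elapsed time in milliseconds.
        echo "$i $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

For example, "fsync_latency /path/on/lustre 20" prints one line per write, which
makes it easier to see whether every fsync() is slow or only some of them.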

One way to observe this on a given server is with the txgs kstat.

  echo 20 > /sys/module/zfs/parameters/zfs_txg_history # number of txgs to show
  watch cat /proc/spl/kstat/zfs/POOLNAME/txgs

Large values in the time columns (units are nanoseconds) could account for the
delays you're seeing. Conversely, I'd expect to see relatively small values on your 2.4.3
filesystem, where fsync() is returning quickly.
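
Since nanosecond values are hard to read at a glance, here is a rough sketch
that converts the time columns to milliseconds. It assumes the kstat has a
header row starting with "txg" that names the time columns otime, qtime, wtime
and stime; check the header on your ZFS version, since the column set may
differ. The function name txg_times_ms is made up for illustration.

```shell
# Hypothetical helper: summarize txg times from the txgs kstat read on stdin.
# Assumes a header row whose first field is "txg" and which names the
# otime, qtime, wtime and stime columns; values are in nanoseconds.
txg_times_ms() {
    awk '
        $1 == "txg" {
            # Header row: remember the position of every named column.
            for (i = 1; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        have_header && $1 ~ /^[0-9]+$/ {
            # Data row: print the txg number, then each time in milliseconds.
            printf "%s", $1
            n = split("otime qtime wtime stime", names, " ")
            for (j = 1; j <= n; j++) {
                c = col[names[j]]
                # A column missing from the header is reported as 0.
                printf " %s=%.1fms", names[j], (c ? $c / 1000000 : 0)
            }
            printf "\n"
        }
    '
}
```

It can be fed directly from the kstat, e.g.
"txg_times_ms < /proc/spl/kstat/zfs/POOLNAME/txgs".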

As to why it's slower on your newer filesystem, my first guess would be that it's
more heavily utilized. But that's just a guess. I'm assuming it also uses a ZFS backend.
Are there any other relevant tunings or patches you've applied to that system?

Ned