[lustre-discuss] Lustre 2.9 performance issues

Tue Apr 25 12:11:42 PDT 2017

Hi Darby,

> -----Original Message-----
> 
> for i in $(seq 0 99) ; do
>    dd if=/dev/zero of=dd.dat.$i bs=1k count=1 conv=fsync > /dev/null 2>&1
> done
> 
> The timing of this ranges from 0.1 to 1 sec on our old LFS but ranges from 20
> to 60 sec on our newer 2.9 LFS.  

Because Lustre does not yet use the ZFS Intent Log (ZIL), it implements fsync() by
waiting for an entire transaction group (txg) to be written out. This can incur long
delays on a busy filesystem, as the transaction groups become quite large. Work
on implementing ZIL support is being tracked in LU-4009, but this feature is not
expected to make it into the upcoming 2.10 release.
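
To see whether the delay is spread evenly or concentrated in a few slow
fsync() calls, each write can be timed on its own. The following is only a
rough sketch based on your loop: the function name fsync_latency and its
arguments are made up for illustration, and it assumes GNU date (for %N) and
GNU dd (for conv=fsync).

```shell
# Hypothetical helper: time each small fsync'd write individually.
# Usage: fsync_latency [directory] [count]
# Assumes GNU date (%N gives nanoseconds) and GNU dd (conv=fsync).
fsync_latency() {
    dir=${1:-.}
    count=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s%N)
        # Same 1 KiB write as in the original loop, flushed with fsync()
        dd if=/dev/zero of="$dir/dd.dat.$i" bs=1k count=1 conv=fsync > /dev/null 2>&1
        end=$(date +%s%N)
        # Report elapsed wall-clock time in milliseconds
        echo "dd.dat.$i $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

Running something like "fsync_latency /path/on/lustre 20" on a client prints one
line per file. If each write really waits for a txg to sync, I would expect
the individual times to track the txg sync times on the servers.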

One way to observe this on a given server is with the txgs kstat.

  echo 20 > /sys/module/zfs/parameters/zfs_txg_history # number of txgs to show
  watch cat /proc/spl/kstat/zfs/POOLNAME/txgs

Large values in the time columns (units are nanoseconds) could account for the
delays you're seeing. Conversely, I'd expect to see relatively small values on your 2.4.3
filesystem, where fsync() is returning quickly.

As to why it's slower on your newer filesystem, my first guess would be that it's
more heavily utilized. But that's just a guess. I'm assuming it also uses a ZFS backend.
Are there any other relevant tunings or patches you've applied to that system?

Ned