[lustre-discuss] Lustre 2.9 performance issues

Vicker, Darby (JSC-EG311) darby.vicker-1 at nasa.gov
Tue Apr 25 07:54:34 PDT 2017


Hello,



We are having a few performance issues with our newest Lustre file system.  Here is an overview of our setup:



-) Supermicro servers connected to external 12Gb/s SAS JBODs for MDT/OSS storage

-) CentOS version = 7.3.1611 (kernel 3.10.0-514.2.2.el7.x86_64) on the servers and clients

-) stock kernels on servers (i.e. no lustre patches to the kernel)

-) ZFS backend

   o  version = zfs-0.6.5.8-1.el7_3.centos.x86_64

   o  ZFS metadata pool: RAID10, 12x 15,000 rpm Seagate Cheetah 12 Gb/s SAS

   o  ZFS ost pools (12x): RAIDZ3 15x 7,200 rpm HGST 8TB 12 Gb/s SAS

   o  ZFS compression on OSTs (not MDT)

-) Lustre version 2.9.0 (plus a patch to get dual-homed servers with failover working properly)

-) LNet dual-home config – Ethernet and InfiniBand

-) ~400 IB clients, ~60 Ethernet clients



Our large-file read/write performance is excellent.  However, "everyday" operations (creating, moving, removing, and opening files) are noticeably slower than on our older LFS (version 2.4.3).  This surprised us, since both the server hardware and software are newer and more capable.  One concrete thing we have discovered is that any operation that forces a flush to disk is more than 10x slower on the new setup.  We noticed this when pasting a large amount of text into vim: the paste took 30+ seconds to return for a file on our new LFS, but showed almost no lag for a file on our old LFS, an NFS mount, or a local disk.  It turns out vim syncs its swap file to disk every 200 typed characters by default.  We replicated this with a simple dd loop:



for i in $(seq 0 99) ; do
   dd if=/dev/zero of=dd.dat.$i bs=1k count=1 conv=fsync > /dev/null 2>&1
done



This loop takes 0.1 to 1 sec on our old LFS but 20 to 60 sec on our new 2.9 LFS.  We’ve tried a number of ZFS tuning options (zfs_txg_timeout, zfs_vdev_scheduler, zfs_prefetch_disable,…) with little to no impact.
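For reference, this is the style in which we set those module parameters; the specific values below are illustrative examples rather than our production settings:

```shell
# Runtime tuning of the ZFS kernel module via sysfs
# (values shown are examples only, not recommendations):
echo 5        > /sys/module/zfs/parameters/zfs_txg_timeout
echo deadline > /sys/module/zfs/parameters/zfs_vdev_scheduler
echo 1        > /sys/module/zfs/parameters/zfs_prefetch_disable

# Equivalent persistent form, applied at module load:
cat > /etc/modprobe.d/zfs.conf <<'EOF'
options zfs zfs_txg_timeout=5 zfs_prefetch_disable=1
EOF
```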



Additionally, we temporarily toggled sync’ing on the ZFS filesystems underlying our LFS:



# zfs set sync=disabled metadata/meta-fsl
# zfs set sync=disabled oss00-0/ost-fsl

# (repeated 11x for the other OSS/OSTs)



later restoring via



# zfs set sync=standard metadata/meta-fsl

# zfs set sync=standard oss00-0/ost-fsl

# (repeated 11x for the other OSS/OSTs)



We tried the same test on a raw ZFS filesystem (oss00-0/testfs) in the same pool.  Disabling sync significantly speeds up fsync() calls on an individual ZFS filesystem (oss00-0/testfs shows a ~10x speedup), but only marginally improves the LFS fsync() behavior (~10%).
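In case anyone wants to reproduce the comparison, here is the same loop wrapped into a small harness that times 100 fsync'd 1 KB writes into whichever directory is passed in (so it can be pointed at a Lustre mount, a raw ZFS mount, or a local disk in turn); the argument handling is just for convenience:

```shell
#!/bin/sh
# Time 100 small fsync'd writes into the directory given as $1
# (defaults to the current directory). Same dd loop as above,
# wrapped for easy A/B comparison between mount points.
dir=${1:-.}
start=$(date +%s)
for i in $(seq 0 99) ; do
    dd if=/dev/zero of="$dir/dd.dat.$i" bs=1k count=1 conv=fsync > /dev/null 2>&1
done
end=$(date +%s)
echo "elapsed: $((end - start))s"
rm -f "$dir"/dd.dat.*
```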



Any ideas on what might be going on here?  Any other ZFS or lustre tuning you would try?



Thanks,

Darby