[lustre-discuss] Lustre 2.10.3 on ZFS - slow read performance

Fri Mar 30 10:00:38 PDT 2018

Hi Alex,

I’m no ZFS expert, but I recently ran into read performance issues too while testing zfs 0.7 for a new project, though nowhere near as bad as yours. I feel sorry for you, as you seem to have done quite good work so far. So perhaps some ideas:

- check that arc is fully enabled (primarycache=all)
- restrict ARC memory usage (zfs_arc_min = zfs_arc_max = 50% of the RAM is what we usually use)
- try different values of ashift (unlikely to help, but try keeping the default of 9, and/or try 11, 13 or more; larger values cost usable capacity)
- turn off readahead but you already did that...
- bump zfs_vdev_cache_max to 131072
- bump zfs_read_chunk_size to 1310720
- bump zfs_vdev_cache_bshift to 17
- bump zfs_vdev_async_read_min_active to 8
- bump zfs_vdev_async_read_max_active to 32
- bump zfs_vdev_sync_read_max_active to 32
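
As a rough sketch only, the values listed above could be collected into a single modprobe "options" line for testing. The helper names, the 128 GiB RAM figure (taken from the OSS described in the quoted message) and the idea of putting the line in a file such as /etc/modprobe.d/zfs.conf are assumptions for illustration, not something we have validated on this hardware:

```python
# Sketch: render the tunables listed above as one modprobe "options" line.
# Helper names and the 128 GiB RAM figure are illustrative assumptions.

def arc_bytes(total_ram_bytes, fraction=0.5):
    """Return the ARC size in bytes for the given fraction of RAM."""
    return int(total_ram_bytes * fraction)


def modprobe_options(params, module="zfs"):
    """Format a dict of module parameters as a modprobe.d options line."""
    pairs = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {pairs}"


ram = 128 * 1024**3   # assumed: OSS with 128 GiB of RAM
arc = arc_bytes(ram)  # zfs_arc_min = zfs_arc_max = 50% of RAM

tunables = {
    "zfs_arc_min": arc,
    "zfs_arc_max": arc,
    "zfs_vdev_cache_max": 131072,
    "zfs_read_chunk_size": 1310720,
    "zfs_vdev_cache_bshift": 17,
    "zfs_vdev_async_read_min_active": 8,
    "zfs_vdev_async_read_max_active": 32,
    "zfs_vdev_sync_read_max_active": 32,
}

line = modprobe_options(tunables)
print(line)
```

The printed line is only a starting point for a test system; drop any parameter your ZFS version does not know about before using it.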

/!\ These tunings are not to be taken as is; they are meant for performance testing, and some of them might be obsolete with 0.7, but we still use a few of these on our ZFS on Linux systems.
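
Since some of these parameters may no longer exist in a given release, one way to check before tuning is to look at what the loaded module actually exposes. The sketch below assumes the usual ZFS on Linux location /sys/module/zfs/parameters; the function name is made up for illustration, and a missing parameter is simply reported as None:

```python
import os

# Sketch: report the current value of each tunable, or None if the loaded
# zfs module does not expose it (for example because it became obsolete).
# The default directory is the usual sysfs location, assumed here.

def read_tunables(names, param_dir="/sys/module/zfs/parameters"):
    """Return {name: current value as a string, or None if unavailable}."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(param_dir, name)) as fh:
                values[name] = fh.read().strip()
        except OSError:
            # Parameter file is missing or unreadable on this system.
            values[name] = None
    return values


names = [
    "zfs_vdev_cache_max",
    "zfs_read_chunk_size",
    "zfs_vdev_cache_bshift",
    "zfs_vdev_async_read_min_active",
    "zfs_vdev_async_read_max_active",
    "zfs_vdev_sync_read_max_active",
]

for name, value in read_tunables(names).items():
    print(name, "=", value if value is not None else "(not available)")
```

On a machine without the zfs module loaded, every entry is reported as not available.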

As a rule of thumb, do not tune explicitly unless you really need to.

To get the best read performance from nearline drives with Lustre (without the use of any SSD), we went with mdraid/ldiskfs and we’re VERY happy with that. With the same hardware, a ZFS backend will definitely provide better writes though.

Good luck...

Stephane

> On Mar 30, 2018, at 6:32 AM, Alex Vodeyko <alex.vodeyko at gmail.com> wrote:
> 
> Hi,
> 
> I'm still fighting with this setup:
> zpool with three or six 8+2 raidz2 vdevs shows very slow reads (0.5
> GB/s or even less compared with 2.5 GB/s writes)...
> I've tried recordsizes up to 16M and also zfs module parameters, e.g.
> from http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-March/014307.html
> - unfortunately it did not help.
> 
> Everything is quite good with six individual 8+2 raidz2 pools (0.6+
> GB/s read/write from each totaling 3.6+ GB/s), so I would probably
> have to go with six OSTs (individual pool with 8+2 raidz2 each).
> But I still hope there are such setups (zpool with three or
> six 8+2 raidz2 vdevs) in production, so I kindly ask you to share your
> setups or any ideas that would help diagnose this problem.
> 
> Thank you in advance,
> Alex
> 
> 
> 
> 2018-03-27 22:55 GMT+03:00 Alex Vodeyko <alex.vodeyko at gmail.com>:
>> Hi,
>> 
>> I'm setting up the new lustre test setup with the following hw config:
>> - 2x servers (dual E5-2650v3, 128GB RAM), one MGS/MDS, one OSS
>> - 1x HGST 4U60G2 JBOD with 60x 10TB HUH721010AL5204 drives (4k
>> physical, 512 logical sector size), connected to OSS using lsi 9300-8e
>> 
>> Lustre 2.10.3 servers/clients (centos 7.4), zfs - 0.7.5 and also 0.7.7
>> 
>> Initially I planned to use 2 zpools with three 8+2 vdevs or 1 zpool
>> with six 8+2 vdevs.
>> 
> ..
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org