[lustre-discuss] FLR Mirroring for read performance

Andreas Dilger adilger at whamcloud.com
Thu May 19 02:15:25 PDT 2022


On May 11, 2022, at 08:25, Nathan Dauchy wrote:
> 
> Greetings!

Hello Nathan,

> During the helpful LUG tutorial from Rick Mohr on advanced lustre file layouts, it was mentioned that “lfs mirror” could be used to improve read performance.  And the manual supports this, stating “files that are concurrently read by many clients (e.g. input decks, shared libraries, or executables) the aggregate parallel read performance of a single file can be improved by creating multiple mirrors of the file data”.
>  
> What method does Lustre use to ensure that multiple clients balance their read workloads from the multiple mirrors?

Currently (2.15.0), if no mirror copies are marked "prefer", the client picks the mirror with the most stripes on flash devices (vs. mirrors on HDDs). If several mirrors are still tied after that, it uses the hash of a client memory pointer address modulo the mirror count.  This should be relatively random on each client, which distributes the read workload across the mirrors.
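
To make the ordering concrete, here is a rough Python sketch of that selection logic. It is a toy model and not the Lustre client code: the Mirror class, the pick_read_mirror() function and the client_hash argument (standing in for the hashed pointer address) are all made up for illustration, and falling back to the hash when several mirrors are marked "prefer" is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    """Hypothetical stand-in for one FLR mirror of a file."""
    mirror_id: int
    prefer: bool = False    # the "prefer" flag is set on this mirror
    flash_stripes: int = 0  # number of stripes on flash (non-rotational) OSTs


def pick_read_mirror(mirrors, client_hash):
    """Toy model of the read mirror choice described above.

    client_hash stands in for the hash of a client memory pointer
    address; any per-client pseudo-random integer works for the sketch.
    """
    preferred = [m for m in mirrors if m.prefer]
    if preferred:
        # Assumption: several preferred mirrors are tie-broken by the hash.
        candidates = preferred
    else:
        # No "prefer" flag anywhere: keep the mirrors with the most flash stripes.
        most_flash = max(m.flash_stripes for m in mirrors)
        candidates = [m for m in mirrors if m.flash_stripes == most_flash]
    # Remaining ties are broken by the hash modulo the number of candidates.
    return candidates[client_hash % len(candidates)]


mirrors = [Mirror(1, flash_stripes=0),
           Mirror(2, flash_stripes=4),
           Mirror(3, flash_stripes=4)]
# Different client_hash values spread clients over mirrors 2 and 3 only.
chosen = [pick_read_mirror(mirrors, h).mirror_id for h in range(4)]
```

With equal flash stripe counts on every mirror (e.g. all-HDD or all-flash copies), every mirror stays a candidate and the hash alone decides.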

I'm not totally sure why the "hash of the pointer address" mechanism was implemented, as clients typically use the client NID as the basis for "autonomous" load distribution (modulo mirror count in this case) so that the workload is "ideally" distributed across copies without any added communication.  The latter is what is described in LU-10158 "FLR: Define a replica choosing policy function", but this is not fully implemented.
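
For comparison, NID-based selection could look roughly like the Python sketch below. This is only an illustration of the "NID modulo mirror count" idea, not what LU-10158 specifies: mirror_index_from_nid() is a hypothetical helper, and reducing the NID to an integer by parsing an IPv4 address before the "@" is an assumption made to keep the example short.

```python
import ipaddress
from collections import Counter


def mirror_index_from_nid(nid, mirror_count):
    """Map a client NID such as '10.0.0.5@tcp' to a mirror index.

    Assumption: the address part is a plain IPv4 address, so it can be
    turned into an integer directly. Every client computes this on its
    own, so no extra communication is needed to spread the load.
    """
    addr, _, _net = nid.partition("@")
    return int(ipaddress.ip_address(addr)) % mirror_count


# Twelve clients with consecutive addresses reading a file with 3 mirrors.
nids = [f"10.0.0.{host}@tcp" for host in range(1, 13)]
load = Counter(mirror_index_from_nid(nid, 3) for nid in nids)
# Each mirror index ends up with 4 of the 12 clients.
```

Consecutive addresses give a perfectly even split here; real client address ranges may not be that regular, so the balance would only be approximate.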

> Are there any tuning parameters that should be considered, other than making sure the “preferred” flag is NOT set on a single mirror, to help even out the read workload among the OSTs?
>  
> Has anyone tested this and quantified the performance improvement?

I don't recall seeing any benchmarks to verify this behavior for reads, but I'd be interested to learn of any results you find.

The typical FLR uses that I'm aware of mirror between HDD and NVMe copies, not between multiple copies on the same class of storage. They either set the "prefer" flag on the flash mirror or, with LU-14996, rely on the client also checking the OS_STATFS_NONROT flag reported by the OSTs (to see whether this is reported, check "lfs df -v" for the 'f' (flash) flag).

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud








