[lustre-discuss] very slow mounts with OSS node down and peer discovery enabled

Andreas Dilger adilger at whamcloud.com
Thu Oct 26 12:49:33 PDT 2023


I can't comment on the LNet peer discovery part, but I would definitely not recommend leaving the lnet_transaction_timeout that low for normal usage. This can cause messages to be dropped while the server is still processing them, and introduce failures needlessly.
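
A minimal sketch of checking the current setting and putting the timeout back once the workaround is no longer needed. It assumes that 50 (the value reported in the quoted message as the original setting) is the right value for the site; the DEFAULT_TIMEOUT variable name is only illustrative, and the guard simply skips the commands on hosts without lnetctl.

```shell
#!/bin/sh
# Sketch only: restore the LNet transaction timeout after a temporary reduction.
# DEFAULT_TIMEOUT is an assumption taken from the thread ("was 50 originally");
# adjust it to whatever the site normally runs with.
DEFAULT_TIMEOUT=50

if command -v lnetctl >/dev/null 2>&1; then
    # Print the global LNet settings, which include transaction_timeout
    # and the peer discovery flag.
    lnetctl global show
    # Set the timeout back to the normal value (needs root on the client).
    lnetctl set transaction_timeout "$DEFAULT_TIMEOUT"
else
    echo "lnetctl not found; nothing changed" >&2
fi
```

A value set this way is a runtime change, so it is worth re-checking with "lnetctl global show" after a reboot or an LNet reload.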

Cheers, Andreas

> On Oct 26, 2023, at 09:48, Bertschinger, Thomas Andrew Hjorth via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
> 
> Hello,
> 
> Recently we had an OSS node down for an extended period with hardware problems. While the node was down, mounting Lustre on a client took an extremely long time to complete (20-30 minutes). Once the fs was mounted, all operations were normal and there wasn't any noticeable impact from the absent node.
> 
> While the client is mounting, the client's debug log shows entries like this slowly going by:
> 
> 00000020:00000080:87.0:1698333195.993098:0:3801046:0:(obd_config.c:1384:class_process_config()) processing cmd: cf005
> 00000020:00000080:87.0:1698333195.993099:0:3801046:0:(obd_config.c:1396:class_process_config()) adding mapping from uuid 10.1.2.3@o2ib to nid 0x500000abcd123 (10.1.2.4@o2ib)
> 
> and there is a "llog_process_th" kernel thread hanging in lnet_discover_peer_locked().
> 
> We have peer discovery enabled on our clients, but disabling peer discovery on a client causes the mount to complete quickly. Also, once the down OSS was fixed and powered back on, mounting completed normally again.
> 
> We also found that reducing the following timeout sped up the mount by a factor of ~10:
> 
> $ lnetctl set transaction_timeout 5    # was 50 originally
> 
> Is such a dramatic slowdown normal in this situation? Is there any fix (aside from disabling peer discovery or tuning down the timeout) that could speed up mounts in case we have another OSS down in the future?
> 
> Lustre version (server and client): 2.15.3
> 
> Thanks, 
> Thomas Bertschinger