[lustre-discuss] 'queue depth too large', but connection works
t.roth at gsi.de
Mon Jan 31 03:30:56 PST 2022
Digging a bit more into the ko2iblnd parameters, it seems the default for 'map_on_demand' comes out as '1' - both on mlx4 and mlx5 boxes.
I was reading about earlier issues with RDMA, which supposedly pushed the default to 256 - but that was perhaps too long ago.
Is it necessary to tune this parameter nowadays?
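For reference, a minimal sketch of how the effective values could be checked on a node. It assumes the ko2iblnd module is loaded and exports its parameters read-only under /sys/module/ko2iblnd/parameters/, as Linux modules usually do; the helper name read_module_params is made up for illustration.

```python
from pathlib import Path


def read_module_params(module="ko2iblnd",
                       names=("map_on_demand", "peer_credits"),
                       sysfs_root="/sys/module"):
    """Return {parameter name: value string, or None if it cannot be read}."""
    # Assumption: parameters are exported at <sysfs_root>/<module>/parameters/<name>.
    params_dir = Path(sysfs_root) / module / "parameters"
    values = {}
    for name in names:
        try:
            values[name] = (params_dir / name).read_text().strip()
        except OSError:
            # Module not loaded, or the parameter is not exported via sysfs.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_module_params().items():
        print(f"{name} = {value}")
```

Running this on both the mlx4 and the mlx5 boxes would show whether the value really is identical everywhere, or only on the nodes without an explicit modprobe option.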
On 1/30/22 20:41, Horn, Chris wrote:
> Yes, this means the server has peer_credits=8, so it can only accept that value. It informs the client of this, so the subsequent client connection attempt uses the lower value.
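The reject-and-retry behaviour described above can be illustrated with a toy model. This is only a sketch of the idea, not the actual o2iblnd code; the function names passive_connect and connect_with_retry are invented for illustration, and the real handshake carries more than the queue depth.

```python
def passive_connect(requested_depth, server_peer_credits):
    """Toy model of the accepting side: return (accepted, queue depth)."""
    if requested_depth > server_peer_credits:
        # Corresponds to the log message
        # "queue depth too large: 16 (<=8 wanted)";
        # the rejection tells the client which value the server can accept.
        return False, server_peer_credits
    return True, requested_depth


def connect_with_retry(client_peer_credits, server_peer_credits, max_attempts=2):
    """Toy model of the connecting side: retry with the server's value."""
    depth = client_peer_credits
    for attempt in range(1, max_attempts + 1):
        accepted, depth_hint = passive_connect(depth, server_peer_credits)
        if accepted:
            return depth, attempt
        depth = depth_hint  # use the lower value on the next attempt
    raise ConnectionError("no acceptable queue depth found")


if __name__ == "__main__":
    depth, attempts = connect_with_retry(16, 8)
    print(f"connected with queue depth {depth} after {attempts} attempt(s)")
```

With the values from this thread (client 16, server 8), the first attempt is rejected and logged, and the second succeeds with a queue depth of 8. That would be consistent with seeing the error on the MDS while the mount still works.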
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Thomas Roth <t.roth at gsi.de>
> Sent: Saturday, January 29, 2022 11:46 AM
> To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
> Subject: [lustre-discuss] 'queue depth too large', but connection works
> Dear all,
> test system: servers 2.12.7, and a client 2.12.6, all mlx4.
> The client has some non-default ko2iblnd parameters, including "peer_credits=16".
> I mounted my test system there and happily copied around some directories. Only afterwards did I find
> > LNetError: 5278:0:(o2iblnd_cb.c:2551:kiblnd_passive_connect()) Can't accept conn from 10.20.3.64 at o2ib6, queue depth too large: 16 (<=8 wanted)
> in the MDS log.
> I did read LU-3322, but obviously did not get the point. "Can't accept conn" used to deny client access, but the MDS that didn't like my client just
> created some ~25k new objects on behalf of that client.
> Does this mean client and server negotiate a suitable value, but behind the scenes?