[lustre-discuss] RDMA too fragmented, OSTs unavailable (permanently)
paf at cray.com
Sat Sep 10 09:36:14 PDT 2016
It is somewhat sideways from your questions, but when Cray has seen this problem historically, it has almost always been due to lots of small direct I/O from user code.
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Thomas Roth <t.roth at gsi.de>
Sent: Saturday, September 10, 2016 2:38:37 AM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] RDMA too fragmented, OSTs unavailable (permanently)
We are running Lustre 2.5.3 on InfiniBand. We have massive problems with clients being unable to communicate with any number of OSTs, rendering the
entire cluster quite unusable.
> LNetError: 1399:0:(o2iblnd_cb.c:1140:kiblnd_init_rdma()) RDMA too fragmented for 10.20.0.242 at o2ib1 (256): 231/256 src 231/256 dst frags
> LNetError: 1399:0:(o2iblnd_cb.c:1690:kiblnd_reply()) Can't setup rdma for GET from 10.20.0.242 at o2ib1: -90
which eventually results in the OSTs at that NID becoming "temporarily unavailable".
However, the OSTs never recover until a manual eviction is performed or the host is rebooted.
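(For completeness, this is the manual eviction we resort to, run on the OSS; a sketch only, where the OST device name lustre-OST0000 and the client NID are illustrative placeholders, not our actual values:)

```shell
# List the client exports of the affected OST (device name is illustrative)
lctl get_param obdfilter.lustre-OST0000.exports.*.uuid
# Evict the stuck client export by NID (illustrative NID)
lctl set_param obdfilter.lustre-OST0000.evict_client=nid:10.20.0.220@o2ib1
```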
On the OSS side, this reads
> LNetError: 13660:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.20.0.220 at o2ib1 (56): c: 7, oc: 0, rc: 7
We have checked the IB fabric, which shows no errors. Since we cannot reproduce the effect in a simple way, we have also scrutinized the
user code, so far without result.
Whenever this happens, the connection between client and OSS is fine under all IB test commands.
Communication between client and OSS continues, but evidently when Lustre tries to replay the missed transaction, the fragmentation limit is
hit again, so the OST never becomes available again.
If we understand correctly, the map_on_demand parameter should be increased as a workaround.
The ko2iblnd module seems to provide this parameter,
> modinfo ko2iblnd
> parm: map_on_demand:map on demand (int)
but no matter which options we load the module with, map_on_demand always remains at its default value,
> cat /sys/module/ko2iblnd/parameters/map_on_demand
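(For reference, this is how we would expect the setting to stick: an options file plus a full module reload, since the parameter is only read at load time. A sketch, assuming the file path /etc/modprobe.d/ko2iblnd.conf and a target value of 256:)

```shell
# /etc/modprobe.d/ko2iblnd.conf (assumed path) contains:
#   options ko2iblnd map_on_demand=256
# The option is only read at module load time, so all Lustre/LNet
# modules must be unloaded first:
lustre_rmmod
modprobe lustre
cat /sys/module/ko2iblnd/parameters/map_on_demand
```

(If the reloaded module still shows the default, the value may be rejected at load time; dmesg right after the modprobe sometimes shows why.)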
Is there any way to understand
- why this memory fragmentation occurs/becomes so large?
- how to measure the real fragmentation degree (o2iblnd simply stops at 256, perhaps we are at 1000?)
- why map_on_demand cannot be changed?
Of course this all looks very much like LU-5718, but our clients are not behind LNet routers.
There is one router which connects to the campus network but is not in use, and there are some routers which connect to an older cluster, but of
course the old (1.8) clients never show any of these errors.
Location: SB3 1.262
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Limited liability company (GmbH)
Registered office: Darmstadt
Commercial register: Amtsgericht Darmstadt, HRB 1528
Management: Professor Dr. Karlheinz Langanke
Chairman of the supervisory board: St Dr. Georg Schütte
Deputy: Ministerialdirigent Dr. Rolf Bernhardt