[lustre-discuss] new client - failover mds: no connection
Thomas Roth
t.roth at gsi.de
Tue Oct 24 10:08:26 PDT 2017
Sorry to have bothered you - works now.
I have set /sys/fs/lustre/timeout=3000, quite brutally, to make things go verrry slowly, and after 25
minutes the mount was there.
Which control, i.e. which timeout parameter, _should_ I have tuned instead in such a situation?
Regards,
Thomas
On 10/24/2017 06:26 PM, Thomas Roth wrote:
> Hi all,
>
> in a Lustre 2.10 test system on CentOS 7.4, I have a pair of MDSs; the format command was
>
> > mkfs.lustre --mgs --mdt --fsname=test --index=0
> --servicenode=10.20.1.198@o2ib5 --servicenode=10.20.1.199@o2ib5
> --mgsnode=10.20.1.198@o2ib5 --mgsnode=10.20.1.199@o2ib5 /dev/drbd0
>
> I added some OSS and clients, everything working.
>
> Then I switched off 10.20.1.198 and mounted my MGS/MDT on 10.20.1.199.
> All OSS and clients connected, everything working.
>
> Now I try to add a client that was never there before,
> > mount -t lustre 10.20.1.198@o2ib5:10.20.1.199@o2ib5:/test /lustre/test
>
> But this client only tries to connect to 10.20.1.198@o2ib5, and fails.
> The log says
>
> LNet: 47655:0:(o2iblnd_cb.c:2672:kiblnd_check_reconnect()) 10.20.1.198@o2ib5: reconnect (invalid
> service id), 12, 12, msg_size: 4096, queue_depth: 8/-1, max_frags: 256/-1
> LNet: 47655:0:(o2iblnd_cb.c:2698:kiblnd_rejected()) 10.20.1.198@o2ib5 rejected: no listener at 987
> ...
> LustreError: 48560:0:(mgc_request.c:251:do_config_log_add()) MGC10.20.1.198@o2ib5: failed processing
> log, type 1: rc = -5
> LNet: 48427:0:(o2iblnd_cb.c:3207:kiblnd_check_conns()) Timed out tx for 10.20.1.198@o2ib5: 4301501
> seconds
> Lustre: 48441:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network
> error: [sent 1508861258/real 1508861264] req@ffff88103dc78000 x1582155623825424/t0(0)
> o250->MGC10.20.1.198@o2ib5@10.20.1.198@o2ib5:26/25 lens 520/544 e 0 to 1 dl 1508861408 ref 1 fl
> Rpc:eXN/0/ffffffff rc 0/-1
>
>
> All of this seems logical, but it is not what I want: where is my 10.20.1.199@o2ib5?
>
> Of course I can 'lctl ping 10.20.1.199@o2ib5'.
> And I have since unmounted Lustre on one of the older clients, unloaded the Lustre modules, and mounted
> again: that works.
>
>
> Regards,
> Thomas
>
--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.250
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de