[lustre-discuss] new client - failover mds: no connection

Thomas Roth t.roth at gsi.de
Tue Oct 24 10:08:26 PDT 2017


Sorry to have bothered you - works now.

I set /sys/fs/lustre/timeout=3000, quite brutally, to make things go very slowly, and after 25 
minutes the mount was there.

Which control, i.e. which timeout parameter, _should_ I have tuned instead in such a situation?
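
To make the question concrete, here is a minimal sketch for listing the timeout-related tunables a client exposes. Only /sys/fs/lustre/timeout is confirmed above; the other paths (at_min, at_max and the ko2iblnd module timeout) are assumptions about what may exist on a 2.10 client, and show_tunables is a hypothetical helper name. Files that are missing are skipped silently.

```shell
# Print "path = value" for every tunable file that exists and is readable.
# Paths that do not exist on this node are skipped.
show_tunables() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$f")"
        fi
    done
}

# /sys/fs/lustre/timeout is the parameter changed above.
# The remaining paths are assumptions and may not exist on every version.
show_tunables \
    /sys/fs/lustre/timeout \
    /sys/fs/lustre/at_min \
    /sys/fs/lustre/at_max \
    /sys/module/ko2iblnd/parameters/timeout
```

This only reports current values; it does not say which of them governs how long a new client waits before moving on to the second MGS NID.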

Regards,
Thomas

On 10/24/2017 06:26 PM, Thomas Roth wrote:
> Hi all,
> 
> in a Lustre 2.10, CentOS 7.4 test system, I have a pair of MDSs; the format command was
> 
>  > mkfs.lustre --mgs --mdt --fsname=test --index=0
>      --servicenode=10.20.1.198@o2ib5 --servicenode=10.20.1.199@o2ib5
>      --mgsnode=10.20.1.198@o2ib5     --mgsnode=10.20.1.199@o2ib5  /dev/drbd0
> 
> I added some OSS and clients, everything working.
> 
> Then I switched off 10.20.1.198 and mounted my MGS/MDT on 10.20.1.199.
> All OSS and clients connected, everything working.
> 
> Now I try to add a client that was never there before,
>  > mount -t lustre 10.20.1.198@o2ib5:10.20.1.199@o2ib5:/test  /lustre/test
> 
> But this client only tries to connect to 10.20.1.198@o2ib5 - and fails.
> The log says
> 
> LNet: 47655:0:(o2iblnd_cb.c:2672:kiblnd_check_reconnect()) 10.20.1.198@o2ib5: reconnect (invalid 
> service id), 12, 12, msg_size: 4096, queue_depth: 8/-1, max_frags: 256/-1
> LNet: 47655:0:(o2iblnd_cb.c:2698:kiblnd_rejected()) 10.20.1.198@o2ib5 rejected: no listener at 987
> ...
> LustreError: 48560:0:(mgc_request.c:251:do_config_log_add()) MGC10.20.1.198@o2ib5: failed processing 
> log, type 1: rc = -5
> LNet: 48427:0:(o2iblnd_cb.c:3207:kiblnd_check_conns()) Timed out tx for 10.20.1.198@o2ib5: 4301501 
> seconds
> Lustre: 48441:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network 
> error: [sent 1508861258/real 1508861264]  req@ffff88103dc78000 x1582155623825424/t0(0) 
> o250->MGC10.20.1.198@o2ib5@10.20.1.198@o2ib5:26/25 lens 520/544 e 0 to 1 dl 1508861408 ref 1 fl 
> Rpc:eXN/0/ffffffff rc 0/-1
> 
> 
> all of which seems logical but is not what I want - where is my 10.20.1.199@o2ib5?
> 
> Of course I can 'lctl ping 10.20.1.199@o2ib5'.
> And I have since unmounted on one of the older clients, unloaded the Lustre modules, and mounted again 
> - it works.
> 
> 
> Regards,
> Thomas
> 
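
For checking the same thing before a mount attempt, a minimal sketch that splits a mount spec of the form used above into its MGS NIDs and runs 'lctl ping' against each. The helper names mgs_nids and check_mgs_nids are hypothetical, and the sketch assumes that 'lctl ping' returns a non-zero exit status when the NID does not answer. On a node without lctl the ping step is skipped.

```shell
# Split a mount spec "nid1:nid2:/fsname" into its MGS NIDs, one per line.
mgs_nids() {
    spec=${1%:/*}                      # drop the trailing ":/fsname"
    printf '%s\n' "$spec" | tr ':' '\n'
}

# Ping every MGS NID in the spec; lctl only exists on Lustre nodes.
check_mgs_nids() {
    for nid in $(mgs_nids "$1"); do
        if ! command -v lctl >/dev/null 2>&1; then
            echo "$nid: lctl not found, skipped"
        elif lctl ping "$nid" >/dev/null 2>&1; then
            echo "$nid: reachable"
        else
            echo "$nid: NOT reachable"
        fi
    done
}

check_mgs_nids "10.20.1.198@o2ib5:10.20.1.199@o2ib5:/test"
```

With the first MDS switched off as described, this would be expected to report the first NID as not reachable and the second as reachable, which matches the manual 'lctl ping' result but does not explain why the mount never tries the second NID.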

-- 
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.250
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de