[lustre-discuss] OST mount with failover MDS

Thomas Roth t.roth at gsi.de
Wed Mar 17 04:56:43 PDT 2021


Hi all,

I wonder if I am seeing signs of network problems when mounting an OST:


tunefs.lustre --dryrun tells me (what I know from my own format command)
 >Parameters: mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5

These are the NIDs for our MGS+MDT0; there are two more pairs for MDT1 and MDT2.
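For reference, a minimal sketch of how the mgsnode value above breaks down into its two failover partners. It assumes the colon is the separator between partners, as in the value shown, and that the NIDs are written in their native `ip@network` form; `split_mgsnode` is a hypothetical helper name, not a Lustre tool.

```shell
#!/bin/sh
# Hypothetical helper: split an mgsnode value into one NID per line.
# Assumption: ':' separates the failover partners, as in the value above.
split_mgsnode() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Prints the two NIDs from the tunefs.lustre output, one per line.
split_mgsnode '10.20.3.0@o2ib5:10.20.3.1@o2ib5'
```

The resulting list can be fed to any per-NID check instead of typing each address by hand.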

I went step by step, modprobing lnet and lustre and then checking LNET with 'lnet ping' to the active MDTs,
which worked fine.
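The ping step could be scripted roughly as follows. This is only a sketch: it assumes the ping is done with `lctl ping <nid>`, that the script runs as root on the OSS after the lnet module is loaded, and that the NIDs are given in `ip@network` form; `ping_nids` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: ping each NID given as an argument with
# 'lctl ping' and print one status line per NID.
# Returns non-zero if at least one NID did not answer.
ping_nids() {
    rc=0
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            printf 'ok   %s\n' "$nid"
        else
            printf 'FAIL %s\n' "$nid"
            rc=1
        fi
    done
    return "$rc"
}

# Example (needs root on the OSS):
#   modprobe lnet && lctl network up
#   ping_nids 10.20.3.0@o2ib5 10.20.3.1@o2ib5
```

Pinging both partners of each pair, not only the active one, shows directly which nodes currently answer on LNET.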

However, mounting such an OST (e.g. after a crash) at first prints a number of
 > LNet: 19444:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.20.3.1@o2ib5: 0 seconds

and similarly for the failover partners of the other two MDS.
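To see at a glance which NIDs these messages refer to, something like the following sketch could be used. It assumes the messages have exactly the "Timed out tx for <nid>: <n> seconds" shape quoted above, with the NID in `ip@network` form; `timed_out_nids` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print each NID
# named in a "Timed out tx for <nid>: <n> seconds" message, once, sorted.
timed_out_nids() {
    sed -n 's/.*Timed out tx for \([^ ]*\): [0-9]* seconds.*/\1/p' | sort -u
}

# Example: dmesg | timed_out_nids
```

Comparing that list with the failover NIDs from the mgsnode parameters would confirm whether only the standby partners show up.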

Should it do that?


Imho, LNET to a failover node _must_ fail, because LNET should not be up on the failover node, right?

If I started LNET there and some client did not get an answer quickly enough from the acting MDS, it
would try the failover node and find LNET up but Lustre not running. That doesn't sound right.


Regards,
Thomas

-- 
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz


