[Lustre-discuss] no failover with failover MDS
t.roth at gsi.de
Sat Sep 18 11:51:30 PDT 2010
We have two servers, A and B, as a failover MGS/MDT pair, with IPs
A=10.12.112.28 and B=10.12.115.120 over tcp.
When server B crashes, MGS and MDT are mounted on A. Recovery times out
with only one out of 445 clients recovered.
Afterwards, the MDT lists all its OSTs as UP, and the logs of the OSTs show:
Lustre: MGC10.12.112.28@tcp: Connection restored to service MGS using
nid 10.12.112.28@tcp.
Lustre: lustre-OST008d: received MDS connection from 10.12.112.28@tcp
So far so good.
However, no client will reconnect, nor will a client connect to server A
when freshly mounted!
I do "mount -t lustre 10.12.112.28:10.12.115.120:/lustre /mp" and get:
Lustre: Lustre Version: 1.8.4
Lustre: Build Version: 1.8.4-19700101010000-PRISTINE-2.6.26-2-amd64
Lustre: Added LNI 10.12.68.195@tcp [8/256/0/180]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
Lustre: MGC10.12.112.28@tcp: Reactivating import
Lustre: 14530:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1347247522447397 sent from gsilust-MDT0000-mdc-ffff81033d489400 to NID
10.12.115.120@tcp 5s ago has timed out (5s prior to deadline).
req@ffff8103312da400 x1347247522447397/t0
o38->gsilust-MDT0000_UUID@10.12.115.120@tcp:12/10 lens 368/584 e 0 to 1
dl 1284835365 ref 1 fl Rpc:N/0/0 rc 0/0
Obviously the clients stubbornly try to connect to the failed server.
I'm sure the failover has worked before, since server A had its problems
last January, when the MDT was moved to B, which has served the fs ever
since.
No apparent changes were introduced in the meantime, so now I am at a loss.
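
To make the point about the timeout message concrete, here is a minimal Python sketch that pulls the import name and target NID out of such a message and maps the NID back to server A or B. It is not part of any Lustre tooling: the regular expression, the function name timed_out_target and the SERVERS table are my own assumptions, based only on the log format and the two IPs quoted in this mail, with NIDs written in the usual address@network form.

```python
import re

# Matches the "sent from <import> to NID <nid> ... has timed out" part of a
# ptlrpc_expire_one_request() message. The format is assumed from the single
# log line quoted in this mail, not taken from the Lustre sources.
TIMEOUT_RE = re.compile(
    r"sent from (?P<imp>\S+) to NID (?P<nid>\S+) .*?has timed out"
)

# Hypothetical lookup table: the two failover servers named in this mail.
SERVERS = {"10.12.112.28@tcp": "A", "10.12.115.120@tcp": "B"}


def timed_out_target(message):
    """Return (import_name, nid) for a timeout message, or None if no match."""
    # Mail wrapping splits the message over several lines; collapse all
    # whitespace to single spaces before matching.
    flat = " ".join(message.split())
    m = TIMEOUT_RE.search(flat)
    if m is None:
        return None
    return m.group("imp"), m.group("nid")


log = """Lustre: 14530:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1347247522447397 sent from gsilust-MDT0000-mdc-ffff81033d489400 to NID
10.12.115.120@tcp 5s ago has timed out (5s prior to deadline)."""

imp, nid = timed_out_target(log)
print(imp, "->", nid, "(server %s)" % SERVERS.get(nid, "unknown"))
```

For the message quoted above, this reports the MDT import going to 10.12.115.120@tcp, i.e. the failed server B, and never to A.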