[Lustre-discuss] MDS: lock timed out -- not entering recovery in server code, just going back to sleep

Thomas Roth t.roth at gsi.de
Thu Nov 27 10:18:34 PST 2008


Hi all,

after some nasty problems with network switches we have been seeing repeated hangs of
our Lustre system. On a client, "lfs check mds" reports:
 > error: check 'lustre-MDT0000-mdc-ffff810419da5800': Resource temporarily unavailable (11)
All our OSSs are fine. The MDS itself seems to be fine, too: there are no
error messages in the logs directly related to this situation, but also
nothing indicating that the MDT had noticed a client trying to
reconnect, not even when we tried to mount the FS on a new client. The MDT
had simply become unresponsive.
Since nothing works in this state anyway, we rebooted the MDS.
After recovery the clients reconnected, and the FS seems to be fine again.
However, the MDT is now dumping logs like crazy, a few times per minute, and
most of the dumps are empty.
In addition, I find a lot of messages like this one in the logs:

 > Nov 27 17:57:41 lustre kernel: LustreError: 3974:0:(ldlm_request.c:64:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1227804060, 1001s ago); not entering recovery in server code, just going back to sleep ns: mds-lustre-MDT0000_UUID lock: e4f54680/0x2ccbde901a8157f2 lrc: 3/1,0 mode: --/CR res: 74908813/3524601089 bits 0x2 rrc: 173 type: IBT flags: 4004030 remote: 0x0 expref: -99 pid 3974

My question now is whether you would interpret this as a result of
ongoing trouble with the network, or as a sign of MDT illness?
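To line these timeouts up against the times of the switch problems, a minimal Python sketch that extracts the enqueue time and age from each message. It assumes each message sits on a single syslog line in the format quoted above; `parse_lock_timeouts` is a hypothetical helper, not part of Lustre.

```python
import re
from datetime import datetime, timezone

# Assumption: one message per syslog line, in the format quoted above.
LOCK_TIMEOUT_RE = re.compile(
    r"ldlm_expired_completion_wait\(\)\) ### lock timed out\s+"
    r"\(enqueued\s+at\s+(?P<enqueued>\d+),\s+(?P<age>\d+)s\s+ago\)"
)


def parse_lock_timeouts(lines):
    """Yield (enqueue time in UTC, age in seconds) for each lock-timeout message."""
    for line in lines:
        m = LOCK_TIMEOUT_RE.search(line)
        if m:
            # The enqueue stamp looks like Unix epoch seconds: it equals the
            # syslog time minus the reported age.
            enqueued = datetime.fromtimestamp(int(m.group("enqueued")), tz=timezone.utc)
            yield enqueued, int(m.group("age"))
```

For the message quoted above this yields 2008-11-27 16:41:00 UTC with an age of 1001 s; 16:41:00 UTC is 17:41:00 CET, which is the 17:57:41 syslog stamp minus 1001 s. The generator can be fed any iterable of lines, e.g. an open syslog file.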


There are more disturbing log messages, many of the following type:

 > Nov 27 18:17:42 lustre kernel: LustreError: 28521:0:(mds_open.c:1474:mds_close()) @@@ no handle for file close ino 81208923: cookie 0x2ccbde8fcca85aa7 req@f3e75800 x1686877/t0 o35->e0c12120-24ea-68c2-0394-712e75354f55@NET_0x200008cb5726e_UUID:-1 lens 296/3472 ref 0 fl Interpret:/0/0 rc 0/0


What to make of that?
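To see whether these close errors come from a handful of clients or from all of them, a minimal Python sketch that counts them per client UUID (the UUID between `->` and `@` in the message). It assumes one message per syslog line in the format quoted above and accepts either `@` or ` at ` before the NID; `count_no_handle_by_client` is a hypothetical helper, not part of Lustre.

```python
import re
from collections import Counter

# Assumption: one message per syslog line, in the format quoted above.
# Captures the inode number and the client UUID that follows "oNN->".
NO_HANDLE_RE = re.compile(
    r"mds_close\(\)\) @@@ no handle for file close ino (?P<ino>\d+):"
    r".*?o\d+->(?P<client>[0-9a-f-]+)(?:@| at )"
)


def count_no_handle_by_client(lines):
    """Count "no handle for file close" messages per client UUID."""
    counts = Counter()
    for line in lines:
        m = NO_HANDLE_RE.search(line)
        if m:
            counts[m.group("client")] += 1
    return counts
```

Calling `.most_common()` on the result lists the noisiest clients first.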

Hm, the MDS is running Lustre 1.6.3, the OSSs 1.6.4.2, and the clients
1.6.5. That may not be the healthiest mix, either?

Thanks,
Thomas


