[Lustre-discuss] MDS: lock timed out -- not entering recovery in server code, just going back to sleep

Brian J. Murrell Brian.Murrell at Sun.COM
Thu Nov 27 10:28:44 PST 2008


On Thu, 2008-11-27 at 19:18 +0100, Thomas Roth wrote:
> 
> > Nov 27 17:57:41 lustre kernel: LustreError:
> > 3974:0:(ldlm_request.c:64:ldlm_expired_completion_wait()) ### lock timed
> > out (enqueued at 1227804060, 1001s ago); not entering recovery in server
> > code, just going back to sleep ns: mds-lustre-MDT0000_UUID lock:
> > e4f54680/0x2ccbde901a8157f2 lrc: 3/1,0 mode: --/CR res:
> > 74908813/3524601089 bits 0x2 rrc: 173 type: IBT flags: 4004030 remote:
> > 0x0 expref: -99 pid 3974
> 
> My question is now whether you would interpret this as a result of 
> ongoing trouble with the network

Yes, network problems are a common cause of this.
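
To line such a message up with network events, it can help to turn the
"enqueued at ..., ...s ago" part into wall-clock times. What follows is only
a rough sketch in Python, not anything shipped with Lustre: the function
name lock_wait_window and the regular expression are made up for
illustration, and it assumes the message has already been un-wrapped and
stripped of mail quote markers.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for the "enqueued at <epoch>, <N>s ago" fragment.
ENQUEUED_RE = re.compile(r"enqueued\s+at\s+(\d+),\s+(\d+)s\s+ago")

def lock_wait_window(message):
    """Return (enqueue_time_utc, waited) for a lock-timeout message, or None."""
    # Collapse whitespace so a log line wrapped across several lines still matches.
    flat = " ".join(message.split())
    m = ENQUEUED_RE.search(flat)
    if m is None:
        return None
    enqueued = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return enqueued, timedelta(seconds=int(m.group(2)))

sample = ("### lock timed out (enqueued at 1227804060, 1001s ago); "
          "not entering recovery in server code, just going back to sleep")
start, waited = lock_wait_window(sample)
# Prints the enqueue time and the time the timeout was logged, both in UTC.
print(start.isoformat(), "->", (start + waited).isoformat())
```

For the message above this gives 16:41:00 UTC as the enqueue time and
16:57:41 UTC as the time of the report, which agrees with the syslog stamp
of 17:57:41 if the server clock runs at UTC+1 (an assumption on my part).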

> There are more disturbing log messages, many of the following type:
> 
> > Nov 27 18:17:42 lustre kernel: LustreError:
> > 28521:0:(mds_open.c:1474:mds_close()) @@@ no handle for file close ino
> > 81208923: cookie 0x2ccbde8fcca85aa7 req@f3e75800 x1686877/t0
> > o35->e0c12120-24ea-68c2-0394-712e75354f55@NET_0x200008cb5726e_UUID:-1
> > lens 296/3472 ref 0 fl Interpret:/0/0 rc 0/0

There was a description of that here or in Bugzilla not that long ago.
IIRC it's the result of a recovery operation in which the MDS closes a
file on behalf of a disconnected client; the client then comes along and
tries to close the file explicitly, but the MDS has already done so.

b.
