[Lustre-discuss] no handle for file close

Robin Humble robin.humble+lustre at anu.edu.au
Sun May 10 01:15:46 PDT 2009


On Thu, May 07, 2009 at 10:45:31AM -0500, Nirmal Seenu wrote:
>I am getting quite a few errors similar to the following error on the 
>MDS server, which is running the latest 1.6.7.1 patched kernel. The 
>clients are running the 1.6.7 patchless client on the 2.6.18-128.1.6.el5 kernel, 
>and this cluster has 130 nodes/Lustre clients and uses a GigE network.
>
>May  7 04:13:48 lustre3 kernel: LustreError: 7213:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 772769: cookie 0xcfe66441310829d4  req@ffff8101ca8a3800 x2681218/t0 o35->fedc91f9-4de7-c789-6bdd-1de1f5e3dd33@NET_0x20000c0a8f109_UUID:0/0 lens 296/1680 e 0 to 0 dl 1241687634 ref 1 fl Interpret:/0/0 rc 0/0
>
>May  7 04:13:48 lustre3 kernel: LustreError: 7213:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-116)  req@ffff8101ca8a3800 x2681218/t0 o35->fedc91f9-4de7-c789-6bdd-1de1f5e3dd33@NET_0x20000c0a8f109_UUID:0/0 lens 296/1680 e 0 to 0 dl 1241687634 ref 1 fl Interpret:/0/0 rc -116/0
>
>I don't see the same errors on another cluster/Lustre installation with 
>2000 Lustre clients, which uses an InfiniBand network.

we sometimes see this when a job that uses a shared library living on
Lustre is killed; presumably the unmapping of the .so on many nodes at
once confuses Lustre a bit.
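one way to check whether a node has such a library mapped when a job is
killed is to scan /proc/<pid>/maps. this is only a rough sketch: the
function name mapped_libs_under is made up, /some/lustre/fs is the same
placeholder mount point as in the find example, and paths containing
spaces are not handled.

```shell
# Hypothetical helper: reads /proc/<pid>/maps-style lines on stdin and prints
# each distinct shared object (.so) whose path lies under the mount point $1.
# Assumes pathnames contain no spaces (field 6 is taken as the whole path).
mapped_libs_under() {
    awk -v mnt="${1%/}/" '
        # Field 6, when present, is the pathname of the mapped file;
        # keep it only if it starts with the mount point and looks like a .so.
        index($6, mnt) == 1 && $6 ~ /\.so(\.|$)/ { print $6 }
    ' | sort -u
}

# Example on a compute node (run as root so every process's maps is readable;
# processes that exit during the scan are silently skipped):
#   cat /proc/[0-9]*/maps 2>/dev/null | mapped_libs_under /some/lustre/fs
```

the mount-point check appends a trailing slash so that e.g. /some/lustre/fsX
is not mistaken for a path under /some/lustre/fs.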

which file is inode 772769?
e.g.
   find /some/lustre/fs/ -inum 772769
if the file is a .so, then that would be similar to what we are seeing.
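if the MDS logs lots of these, a small script can collect the distinct
inode numbers and look each one up. a sketch, assuming the MDS kernel
messages end up in /var/log/messages (adjust for your syslog setup); the
helper names are hypothetical, and running find over a big Lustre file
system can take a long time and put load on the MDS.

```shell
# Hypothetical helper: print each distinct inode number from MDS
# "no handle for file close ino NNN:" messages, read from the files given as
# arguments (or from stdin if there are none).
extract_close_inodes() {
    grep -ho 'no handle for file close ino [0-9][0-9]*' "$@" |
        awk '{ print $NF }' |
        sort -un
}

# Hypothetical helper: for each inode number on stdin, print the matching
# path(s) under the directory $1 (e.g. a client's Lustre mount point).
# -xdev keeps find from descending into other file systems.
find_by_inode() {
    while read -r ino; do
        find "$1" -xdev -inum "$ino" -print
    done
}

# Example: on the MDS
#   extract_close_inodes /var/log/messages > /tmp/close_inodes
# then, after copying that list to a client
#   find_by_inode /some/lustre/fs/ < /tmp/close_inodes
```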

we have this listed in the "probably harmless" section of the Lustre
errors we see, so if it's not harmless then we'd very much like to know
about it :)

our cluster is IB, RHEL5, x86_64, with Lustre 1.6.6 on the servers and
patchless 1.6.4.3 clients running 2.6.23.17 kernels.

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility

>I looked at bugs 19328, 18946, 18192 and 19085, but I am 
>not sure whether any of them apply to this error. I would appreciate it 
>if someone could help me understand these errors and possibly suggest a 
>solution.
>
>TIA
>Nirmal
>_______________________________________________
>Lustre-discuss mailing list
>Lustre-discuss at lists.lustre.org
>http://lists.lustre.org/mailman/listinfo/lustre-discuss
