[Lustre-discuss] lnet_try_match_md()) Matching packet from 12345-10.5.203.250@tcp, match 19154486 length 728 too big
Michael D. Seymour
seymour at cita.utoronto.ca
Fri May 29 12:51:35 PDT 2009
Michael D. Seymour wrote:
> Hi all,
>
> I hope you can help us with some connection problems we are having with our
> Lustre file system. The filesystem roc consists of 6 OSSs with one OST per OSS.
> Each OSS uses the 1.6.7 RHEL 5 kernel on CentOS 5.1 (one unit uses CentOS 5.3).
> The MDS uses CentOS 5.1 and Lustre 1.6.7. 203 RHEL-based clients mount the
> filesystem and all use Lustre 1.6.7. All are connected via a Gb Ethernet switch
> stack.
>
> One client running CentOS 5.2 re-exports the Lustre filesystem via NFS on a
> different network.
>
We also got the following earlier today, before more verbose debug logging was enabled:
On client trinity:
May 29 10:35:47 trinity kernel: LustreError:
5111:0:(lib-move.c:110:lnet_try_match_md()) Matching packet from
12345-10.5.203.250@tcp, match 20177453 length 728 too big: 704 left, 704 allowed
May 29 10:40:47 trinity kernel: LustreError: 11-0: an error occurred while
communicating with 10.5.203.250@tcp. The mds_close operation failed with -116
May 29 10:40:47 trinity kernel: LustreError:
26783:0:(file.c:113:ll_close_inode_openhandle()) inode 37609433 mdc close
failed: rc = -116
May 29 10:40:47 trinity kernel: LustreError:
26783:0:(file.c:113:ll_close_inode_openhandle()) Skipped 1 previous similar message
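For anyone comparing entries like this across many clients, here is a minimal sketch that pulls the numbers out of the "too big" message. The regular expression and the helper name parse_too_big are my own, not anything shipped with Lustre, and the pattern is only inferred from the single entry quoted above. The match value it extracts (20177453) is the same number that appears as x20177453 in the MDS lines in this message, which may help when lining up client and server logs.

```python
import re

# Tail of the client log entry quoted above (mail-wrapped lines rejoined).
entry = "match 20177453 length 728 too big: 704 left, 704 allowed"

# Pattern inferred from that one entry; other Lustre versions may word it differently.
PATTERN = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)

def parse_too_big(line):
    """Return the numeric fields of a 'too big' message, or None if absent."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}

fields = parse_too_big(entry)
# How many bytes the incoming message exceeds the allowed size by.
overshoot = fields["length"] - fields["allowed"]
print(fields["match"], fields["length"], fields["allowed"], overshoot)
```

For the entry above, the incoming length is 728 against 704 allowed, so the message is 24 bytes larger than the buffer the client had posted.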
On MDS rocpile:
May 29 10:35:47 rocpile kernel: LustreError:
10227:0:(mds_open.c:1561:mds_close()) @@@ no handle for file close ino 37609433:
cookie 0xa00c7cf9e763396b req@ffff8101274e3400 x20177453/t0
o35->84adb9a1-8959-fcf5-cc72-81c6a1e171b8@NET_0x200000a05cc02_UUID:0/0 lens
296/728 e 0 to 0 dl 1243608047 ref 1 fl Interpret:/0/0 rc 0/0
May 29 10:35:47 rocpile kernel: LustreError:
10227:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-116)
req@ffff8101274e3400 x20177453/t0
o35->84adb9a1-8959-fcf5-cc72-81c6a1e171b8@NET_0x200000a05cc02_UUID:0/0 lens
296/728 e 0 to 0 dl 1243608047 ref 1 fl Interpret:/0/0 rc -116/0
May 29 10:35:47 rocpile kernel: LustreError:
10227:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 1 previous similar message
May 29 10:40:47 rocpile kernel: LustreError:
3611:0:(mds_open.c:1561:mds_close()) @@@ no handle for file close ino 37609433:
cookie 0xa00c7cf9e763396b req@ffff81011f0cda00 x20177453/t0
o35->84adb9a1-8959-fcf5-cc72-81c6a1e171b8@NET_0x200000a05cc02_UUID:0/0 lens
296/728 e 0 to 0 dl 1243608347 ref 1 fl Interpret:/2/0 rc 0/0
May 29 10:40:47 rocpile kernel: LustreError:
3611:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-116)
req@ffff81011f0cda00 x20177453/t0
o35->84adb9a1-8959-fcf5-cc72-81c6a1e171b8@NET_0x200000a05cc02_UUID:0/0 lens
296/728 e 0 to 0 dl 1243608347 ref 1 fl Interpret:/2/0 rc -116/0
I've already increased /proc/sys/lustre/timeout to 300 s.
Thanks again,
Mike
--
Michael D. Seymour Phone: 416-978-8497
Scientific Computing Support Fax: 416-978-3921
Canadian Institute for Theoretical Astrophysics, University of Toronto