[Lustre-discuss] lnet_try_match_md()) Matching packet from 12345-10.5.203.250 at tcp, match 19154486 length 728 too big

Tim Burgess ozburgess+lustre at gmail.com
Mon Jun 22 04:19:41 PDT 2009


Hi all,

We are seeing this too, with clients and servers running
2.6.18-92.1.26.el5_lustre.1.6.7.2smp (TCP over GigE only), since an
upgrade from 1.6.5.1 over the weekend.
(Clients still running older versions appear to be working fine, but a
couple of the upgraded clients have also run without trouble, so I don't
have enough data to be sure it's a version issue.)

In case it's related, we hit this bug on the MDS (also after an fsck)
just before the upgrade:
https://bugzilla.lustre.org/show_bug.cgi?id=19091

It prevented the MDS/MGS from starting after the fsck (before the
upgrade). Since the Bugzilla entry mentioned a related fix in 1.6.7.1,
we went ahead with the upgrade, and the MDS started fine after that.
There are still some odd messages in the MDS log, though; see the last
log segment below.

Any ideas out there?

Thanks,
Tim

-----
On the client:
(hand-transcribed, so please forgive any typos)

LustreError: 647:0:(lib-move.c:110:lnet_try_match_md()) Matching
packet from 12345-172.16.0.251 at tcp, match 115 length 1168 too big: 992
left, 992 allowed
Lustre: Request x115 sent from p1-MDT0000-mdc-ffff81012a031000 to NID
172.16.0.251 at tcp 100s has timed out (limit 100s)
Lustre: p1-MDT0000-mdc-ffff81012a031000: Connection to service
prod_mds_001 via nid 172.16.0.251 at tcp was lost; in progress operations
using this service will wait for recovery to complete.
Lustre: p1-MDT0000-mdc-ffff81012a031000: connection restored to
service prod_mds_001 using nid 172.16.0.251 at tcp

...and then the cycle repeats.
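Alexey's suggestion in the quoted thread below, to look for a request whose xid equals the `match` value, can be checked mechanically, and the client log above already shows such a pair: `match 115` and `Request x115`, the MDC request that then times out. The following is a minimal Python sketch, not a Lustre tool: the helper `correlate` and its regexes are hypothetical and assume the message wording exactly as transcribed here. It also reports how far the incoming message overshot the space reported as `left` (1168 - 992 = 176 bytes here), assuming that reading of the message.

```python
import re

# Illustrative patterns for the two client messages above (1.6.7.2 wording).
# "(?:@| at )" accepts both a real "@" and the archive's " at " rendering.
TOO_BIG = re.compile(
    r"Matching packet from (?P<src>\S+?)(?:@| at )(?P<net>\w+), "
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)
REQUEST = re.compile(r"Request x(?P<xid>\d+) sent from (?P<sender>\S+) to NID")


def correlate(log_text):
    """Pair each 'too big' LNet error with a logged request whose xid equals its match value."""
    # Undo the e-mail line wrapping so each message can be matched as one string.
    flat = " ".join(log_text.split())
    senders = {m.group("xid"): m.group("sender") for m in REQUEST.finditer(flat)}
    results = []
    for m in TOO_BIG.finditer(flat):
        length, left = int(m.group("length")), int(m.group("left"))
        results.append({
            "match": m.group("match"),
            "source": m.group("src"),
            "length": length,
            "left": left,
            "excess": length - left,  # bytes beyond the space reported as "left"
            "request_sender": senders.get(m.group("match")),  # None if not logged
        })
    return results


client_log = """LustreError: 647:0:(lib-move.c:110:lnet_try_match_md()) Matching
packet from 12345-172.16.0.251 at tcp, match 115 length 1168 too big: 992
left, 992 allowed
Lustre: Request x115 sent from p1-MDT0000-mdc-ffff81012a031000 to NID
172.16.0.251 at tcp 100s has timed out (limit 100s)"""

for row in correlate(client_log):
    print(row)
```

Joining the wrapped lines before matching is deliberate: the archived logs are hard-wrapped mid-message, so line-by-line matching would miss most entries.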

On the servers:

Jun 22 19:01:29 mds001 kernel: LustreError:
3389:0:(service.c:611:ptlrpc_check_req()) @@@ DROPPING req from old
connection 309 < 310  req at ffff81010965dc00 x77181/t0
o400->12dffd61-75ec-a926-c333-3c3d8acf9201 at NET_0x20000ac100453_UUID:0/0
lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
Jun 22 19:01:29 mds001 kernel: LustreError:
3389:0:(service.c:611:ptlrpc_check_req()) Skipped 3 previous similar
messages
Jun 22 19:02:06 mds001 kernel: Lustre:
3359:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
23127a45-3e3a-5b92-dba5-c7444d593e7f reconnecting
Jun 22 19:02:06 mds001 kernel: Lustre:
3359:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 77 previous
similar messages
Jun 22 19:02:25 oss019 kernel: Lustre:
3417:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0012:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss020 kernel: Lustre:
3370:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0013:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss014 kernel: Lustre:
3263:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST000d:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss025 kernel: Lustre:
3904:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0018:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss024 kernel: Lustre:
3901:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0017:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss029 kernel: Lustre:
3879:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST001c:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss028 kernel: Lustre:
3909:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST001b:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss010 kernel: Lustre:
3462:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0009:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss021 kernel: Lustre:
3933:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0014:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss022 kernel: Lustre:
3904:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0015:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss023 kernel: Lustre:
3928:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0016:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss030 kernel: Lustre:
3854:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST001d:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss027 kernel: Lustre:
3907:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST001a:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss026 kernel: Lustre:
3914:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0019:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss018 kernel: Lustre:
3379:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0011:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss016 kernel: Lustre:
3268:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST000f:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss017 kernel: Lustre:
3402:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0010:
8f3b0b35-1636-5355-671e-96c33c4017fd reconnecting
Jun 22 19:02:25 oss010 kernel: Lustre:
3462:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 previous
similar message
Jun 22 19:02:25 oss016 kernel: Lustre:
3268:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 2 previous
similar messages
Jun 22 19:02:25 oss018 kernel: Lustre:
3379:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 previous
similar message
Jun 22 19:02:25 oss017 kernel: Lustre:
3402:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-OST0010:
dcb418b0-12c5-61d2-ab8c-f9f3ced8130a reconnecting
Jun 22 19:02:25 oss030 kernel: Lustre:
3854:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 previous
similar message
Jun 22 19:02:25 oss025 kernel: Lustre:
3904:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 2 previous
similar messages
Jun 22 19:02:25 oss023 kernel: Lustre:
3928:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 2 previous
similar messages
Jun 22 19:02:25 oss022 kernel: Lustre:
3904:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 previous
similar message
Jun 22 19:02:25 oss021 kernel: Lustre:
3933:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 previous
similar message
Jun 22 19:02:25 oss026 kernel: Lustre:
3914:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 previous
similar message
Jun 22 19:02:25 oss027 kernel: Lustre:
3907:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 previous
similar message
Jun 22 19:03:06 oss022 kernel: Lustre: p1-OST0015: haven't heard from
client 6aaa9429-5a2c-9c20-1fe8-e42c3d108882 (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:06 oss011 kernel: Lustre: p1-OST000a: haven't heard from
client 6aaa9429-5a2c-9c20-1fe8-e42c3d108882 (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:06 oss011 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:06 oss022 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss003 kernel: Lustre: p1-OST0002: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 mds001 kernel: Lustre: MGS: haven't heard from client
d14860df-7906-9a56-5c84-79b25b9cc99e (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss007 kernel: Lustre: p1-OST0006: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 mds001 kernel: Lustre: Skipped 2 previous similar messages
Jun 22 19:03:07 oss006 kernel: Lustre: p1-OST0005: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss005 kernel: Lustre: p1-OST0004: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss021 kernel: Lustre: p1-OST0014: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss026 kernel: Lustre: p1-OST0019: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss024 kernel: Lustre: p1-OST0017: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss013 kernel: Lustre: p1-OST000c: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss029 kernel: Lustre: p1-OST001c: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss027 kernel: Lustre: p1-OST001a: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss030 kernel: Lustre: p1-OST001d: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss009 kernel: Lustre: p1-OST0008: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss012 kernel: Lustre: p1-OST000b: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss019 kernel: Lustre: p1-OST0012: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss020 kernel: Lustre: p1-OST0013: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss018 kernel: Lustre: p1-OST0011: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss028 kernel: Lustre: p1-OST001b: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss002 kernel: Lustre: p1-OST0001: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss025 kernel: Lustre: p1-OST0018: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss017 kernel: Lustre: p1-OST0010: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss001 kernel: Lustre: p1-OST0000: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss014 kernel: Lustre: p1-OST000d: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss004 kernel: Lustre: p1-OST0003: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss023 kernel: Lustre: p1-OST0016: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss016 kernel: Lustre: p1-OST000f: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss010 kernel: Lustre: p1-OST0009: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss008 kernel: Lustre: p1-OST0007: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss015 kernel: Lustre: p1-OST000e: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss018 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss020 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss012 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss019 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss014 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss017 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss007 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss003 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss001 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss009 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss016 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss004 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss015 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss010 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss030 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss029 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss027 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss028 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss021 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss025 kernel: Lustre: Skipped 1 previous similar message
Jun 22 19:03:07 oss023 kernel: Lustre: Skipped 1 previous similar message
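Every eviction above names the same client NID, 172.16.5.3, but under three different client UUIDs (6aaa9429..., b7f3778d... and d14860df...). A minimal Python sketch for condensing such a burst into one line per NID, assuming the 1.6.x "haven't heard from client" wording shown here; `summarize_evictions` is a hypothetical helper, not part of Lustre.

```python
import re
from collections import defaultdict

# Assumes the eviction wording shown in the log above; "(?:@| at )"
# accepts both "@" and the mailing-list archive's " at " rendering.
EVICT = re.compile(
    r"(?P<target>\S+): haven't heard from client (?P<uuid>\S+) "
    r"\(at (?P<nid>\S+?)(?:@| at )(?P<net>\w+)\) in \d+ seconds"
)


def summarize_evictions(log_text):
    """Group eviction messages by client NID: which targets evicted it, under which UUIDs."""
    flat = " ".join(log_text.split())  # undo e-mail line wrapping
    summary = defaultdict(lambda: {"targets": set(), "uuids": set()})
    for m in EVICT.finditer(flat):
        entry = summary[f"{m.group('nid')}@{m.group('net')}"]
        entry["targets"].add(m.group("target"))
        entry["uuids"].add(m.group("uuid"))
    return dict(summary)


server_log = """Jun 22 19:03:06 oss022 kernel: Lustre: p1-OST0015: haven't heard from
client 6aaa9429-5a2c-9c20-1fe8-e42c3d108882 (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 mds001 kernel: Lustre: MGS: haven't heard from client
d14860df-7906-9a56-5c84-79b25b9cc99e (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it.
Jun 22 19:03:07 oss001 kernel: Lustre: p1-OST0000: haven't heard from
client b7f3778d-1615-4e89-2829-5021086f51cf (at 172.16.5.3 at tcp) in 227
seconds. I think it's dead, and I am evicting it."""

for nid, info in summarize_evictions(server_log).items():
    print(nid, sorted(info["targets"]), len(info["uuids"]), "client UUID(s)")
```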

Possibly still related to the earlier problem, we are also seeing
messages like these in the server logs:


Jun 21 11:47:56 mds001 kernel: LustreError:
4040:0:(llog_obd.c:226:llog_add()) No ctxt
Jun 21 11:47:56 mds001 kernel: LustreError:
4040:0:(llog_obd.c:226:llog_add()) Skipped 351 previous similar
messages
Jun 21 11:47:56 mds001 kernel: LustreError:
4040:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19)
for stripe 0
Jun 21 11:47:56 mds001 kernel: LustreError:
4040:0:(lov_log.c:118:lov_llog_origin_add()) Skipped 351 previous
similar messages
Jun 21 11:48:04 mds001 kernel: Lustre:
4130:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 11:48:51 mds001 kernel: LustreError:
3624:0:(lov_request.c:692:lov_update_create_set()) error creating fid
0x45e0e2f sub-object on OST idx 15/1: rc = -110
Jun 21 11:49:44 mds001 kernel: Lustre:
4132:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 11:50:21 mds001 kernel: LustreError:
4151:0:(llog_obd.c:226:llog_add()) No ctxt
Jun 21 11:50:21 mds001 kernel: LustreError:
4151:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19)
for stripe 0
Jun 21 11:50:54 mds001 kernel: LustreError:
3631:0:(lov_request.c:692:lov_update_create_set()) error creating fid
0x51c0136 sub-object on OST idx 15/1: rc = -110
Jun 21 11:51:24 mds001 kernel: Lustre:
3644:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 11:51:50 mds001 kernel: LustreError:
4075:0:(llog_obd.c:226:llog_add()) No ctxt
Jun 21 11:51:50 mds001 kernel: LustreError:
4075:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19)
for stripe 0
Jun 21 11:53:05 mds001 kernel: Lustre:
4077:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 11:54:10 mds001 kernel: LustreError:
4128:0:(lov_request.c:692:lov_update_create_set()) error creating fid
0x45f118f sub-object on OST idx 15/1: rc = -110
Jun 21 11:54:10 mds001 kernel: LustreError:
4128:0:(lov_request.c:692:lov_update_create_set()) Skipped 1 previous
similar message
Jun 21 11:54:45 mds001 kernel: Lustre:
4039:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 11:56:25 mds001 kernel: Lustre:
4147:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 11:58:05 mds001 kernel: Lustre:
4097:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 11:59:46 mds001 kernel: Lustre:
4075:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 12:03:06 mds001 kernel: Lustre:
4158:0:(ldlm_lib.c:541:target_handle_reconnect()) p1-MDT0000:
b88f4d25-7ba1-eaf0-6ddb-e0b12b04a934 reconnecting
Jun 21 12:03:06 mds001 kernel: Lustre:
4158:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 previous
similar message
Jun 21 12:05:40 mds001 kernel: LustreError:
4057:0:(lov_request.c:692:lov_update_create_set()) error creating fid
0x45e0ff4 sub-object on OST idx 15/1: rc = -110
Jun 21 12:05:40 mds001 kernel: LustreError:
4057:0:(lov_request.c:692:lov_update_create_set()) Skipped 1 previous
similar message
Jun 21 12:07:20 mds001 kernel: LustreError:
4071:0:(llog_obd.c:226:llog_add()) No ctxt
Jun 21 12:07:20 mds001 kernel: LustreError:
4071:0:(llog_obd.c:226:llog_add()) Skipped 8 previous similar messages
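The `rc` values in these messages are negated errno codes. With the standard Linux numbering on x86/x86_64 (the kernel in these logs), -19 is ENODEV and -110 is ETIMEDOUT, so the `lov_update_create_set()` errors are timeouts creating sub-objects on OST idx 15. A small Python sketch that tallies the codes in a log excerpt and names them; `decode_rcs` and the two-entry table are illustrative, and the table is hard-coded to Linux values on purpose because Python's `errno` module reflects the host platform (ETIMEDOUT is 60 on macOS, for example).

```python
import re
from collections import Counter

# Linux (x86/x86_64) errno numbers for the codes seen in this log. Hard-coded
# on purpose: Python's errno module uses the *host's* numbering.
LINUX_ERRNO = {19: "ENODEV (No such device)", 110: "ETIMEDOUT (Connection timed out)"}

# Tolerates e-mail wrapping between "rc =" and the value.
RC = re.compile(r"rc\s*=\s*(-\d+)")


def decode_rcs(log_text):
    """Count negative rc values in a Lustre log excerpt and name the ones in the table."""
    counts = Counter(int(v) for v in RC.findall(log_text))
    return {rc: (n, LINUX_ERRNO.get(-rc, "unknown")) for rc, n in counts.items()}


mds_log = """LustreError: 4040:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19)
LustreError: 3624:0:(lov_request.c:692:lov_update_create_set()) error creating fid
0x45e0e2f sub-object on OST idx 15/1: rc = -110
LustreError: 3631:0:(lov_request.c:692:lov_update_create_set()) error creating fid
0x51c0136 sub-object on OST idx 15/1: rc = -110"""

for rc, (count, name) in sorted(decode_rcs(mds_log).items()):
    print(rc, count, name)
```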

Cheers,
Tim




On Tue, Jun 9, 2009 at 3:55 AM, Michael D. Seymour
<seymour at cita.utoronto.ca> wrote:
> Alexey Lyashkov wrote:
>> Hi Michael,
>>
>>>> On Fri, 2009-05-22 at 16:38 -0400, Michael D. Seymour wrote:
>>>>> Hi all,
>>>>>
>>>>> One client running CentOS 5.2 re-exports the Lustre filesystem via NFS on a
>>>>> different network.
>>>>>
>>>>> We get the following messages on a particular client:
>>>>>
>>>>> May 22 15:07:45 trinity kernel: LustreError:
>>>>> 5111:0:(lib-move.c:110:lnet_try_match_md()) Matching packet from
>>>>> 12345-10.5.203.250 at tcp, match 19154486 length 728 too big: 704 left, 704 allowed
>>>> How frequently does this bug occur?
>>> Sets of entries (about 20) happen a few times per day, each entry spaced about
>>> ten minutes apart.
>> Can you please post the syslog messages from around this time? There
>> should be lines with errors related to 'match XXXXX' (in this example,
>> match 19154486 should correspond to something about request x19154486).
>
> I've upgraded the MDS to 1.6.7.1, and so far there have been no issues. I
> will probably upgrade to 1.8 very soon and will write back if there are
> still problems.
>
> Mike
>
>
> --
> Michael D. Seymour                 Phone: 416-978-8497
> Scientific Computing Support       Fax: 416-978-3921
> Canadian Institute for Theoretical Astrophysics, University of Toronto
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>


