[Lustre-discuss] Nodes claim error with files, then say everything is fine.

Chris Worley worleys at gmail.com
Wed Aug 6 08:29:47 PDT 2008


On Wed, Aug 6, 2008 at 9:15 AM, Brian J. Murrell <Brian.Murrell at sun.com> wrote:
>
> So, now what does the MDS serving lfs-MDT0000 say about this?  Why did
> it evict?  What version of Lustre is this?  Perhaps you said so already
> and I have just forgotten.

1.6.5.1 clients with 1.6.4.3 OSSes.

The MDS is very verbose.  I get these all the time, even prior to the error:

Lustre: lfs-OST0000: haven't heard from client
12f00621-096c-b331-8774-abfc72dfd822 (at 36.102.36.15 at o2ib) in 92
seconds. I think it's dead, and I am evicting it.

Would these messages be correlated?

LustreError: 6820:0:(handler.c:1499:mds_handle()) operation 101 on
unconnected MDS from 12345-36.102.36.11 at o2ib
LustreError: 6820:0:(handler.c:1499:mds_handle()) Skipped 6 previous
similar messages
LustreError: 6820:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@
processing error (-107)  req at 000001022ed4fa00 x2285850/t0
o101-><?>@<?>:-1 lens 432/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 6820:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped
27 previous similar messages
Lustre: lfs-MDT0000: haven't heard from client
f8aab44b-4829-c626-8056-8b80d38c960e (at 36.102.36.11 at o2ib) in 92
seconds. I think it's dead, and I am evicting it.
Lustre: Skipped 27 previous similar messages
Lustre: 5696:0:(ldlm_lib.c:519:target_handle_reconnect()) MGS:
e04def6d-186d-2d01-1beb-2e00418cfaca reconnecting
Lustre: 5696:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 12
previous similar messages
LustreError: 6877:0:(handler.c:1499:mds_handle()) operation 101 on
unconnected MDS from 12345-36.102.36.11 at o2ib
LustreError: 6877:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@
processing error (-107)  req at 0000010247c30050 x2286894/t0
o101-><?>@<?>:-1 lens 432/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 6877:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped
1 previous similar message
Lustre: lfs-MDT0000: haven't heard from client
f8aab44b-4829-c626-8056-8b80d38c960e (at 36.102.36.11 at o2ib) in 92
seconds. I think it's dead, and I am evicting it.
LustreError: 6829:0:(handler.c:1499:mds_handle()) operation 101 on
unconnected MDS from 12345-36.102.36.11 at o2ib
LustreError: 6829:0:(handler.c:1499:mds_handle()) Skipped 1 previous
similar message
LustreError: 6829:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@
processing error (-107)  req at 000001022aa7bc00 x2288557/t0
o101-><?>@<?>:-1 lens 432/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 5898:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@
processing error (-107)  req at 000001022f07da00 x2290923/t0
o101-><?>@<?>:-1 lens 232/0 ref 0 fl Interpret:/0/0 rc -107/0
Lustre: lfs-OST0000: haven't heard from client
f8aab44b-4829-c626-8056-8b80d38c960e (at 36.102.36.11 at o2ib) in 92
seconds. I think it's dead, and I am evicting it.
Lustre: 6713:0:(ldlm_lib.c:519:target_handle_reconnect()) MGS:
e04def6d-186d-2d01-1beb-2e00418cfaca reconnecting
Lustre: 6713:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 12
previous similar messages
Lustre: lfs-MDT0000: haven't heard from client
f8aab44b-4829-c626-8056-8b80d38c960e (at 36.102.36.11 at o2ib) in 92
seconds. I think it's dead, and I am evicting it.
Lustre: Skipped 5 previous similar messages
LustreError: 6464:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@
processing error (-107)  req at 0000010224c18200 x2239379/t0
o101-><?>@<?>:-1 lens 232/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 6464:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped
5 previous similar messages
LustreError: 7324:0:(handler.c:1499:mds_handle()) operation 37 on
unconnected MDS from 12345-36.102.36.11 at o2ib
LustreError: 6846:0:(handler.c:1499:mds_handle()) operation 101 on
unconnected MDS from 12345-36.102.36.6 at o2ib
Lustre: lfs-OST0000: haven't heard from client
f99ffcca-2168-b2ab-8648-e40f4725acd1 (at 36.102.36.9 at o2ib) in 92
seconds. I think it's dead, and I am evicting it.
Lustre: lfs-OST0000: haven't heard from client
df74b077-e833-5fce-2744-9c0fa51427c9 (at 36.102.36.6 at o2ib) in 92
seconds. I think it's dead, and I am evicting it.
Lustre: Skipped 23 previous similar messages
Lustre: 5697:0:(ldlm_lib.c:519:target_handle_reconnect()) MGS:
e04def6d-186d-2d01-1beb-2e00418cfaca reconnecting
Lustre: 5697:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 12
previous similar messages
LustreError: 5860:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@
processing error (-107)  req at 0000010251a06400 x2284477/t0
o101-><?>@<?>:-1 lens 232/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 5860:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped
54 previous similar messages
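
To check whether the evictions and the "unconnected MDS" errors above
involve the same clients, the messages can be grouped by client address.
What follows is only a rough sketch, not a Lustre tool: the function
names (unwrap, correlate) and the regular expressions are made up for
illustration and are based solely on the message layout pasted above.
It assumes each record starts with "Lustre:" or "LustreError:", that
wrapped continuation lines belong to the preceding record, and that the
address separator may appear either as "@" or as the " at " form shown
in this archive.

```python
import re
from collections import Counter

# Eviction notice: "<target>: haven't heard from client <uuid> (at <nid>) ..."
# The UUID is skipped with a lazy match because it may have been wrapped.
EVICT = re.compile(
    r"Lustre: (?P<target>\S+): haven't heard from client .+? "
    r"\(at (?P<nid>[\d.]+)(?: at |@)\w+\)"
)
# Request rejected because the sender is not connected to the MDS.
UNCONN = re.compile(
    r"operation (?P<op>\d+) on unconnected MDS from "
    r"\d+-(?P<nid>[\d.]+)(?: at |@)\w+"
)


def unwrap(text):
    """Rejoin log records that were wrapped across several lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(("Lustre:", "LustreError:")) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def correlate(text):
    """Return {client address: (eviction count, unconnected-error count)}."""
    evicted, rejected = Counter(), Counter()
    for record in unwrap(text):
        match = EVICT.search(record)
        if match:
            evicted[match.group("nid")] += 1
        match = UNCONN.search(record)
        if match:
            rejected[match.group("nid")] += 1
    nids = sorted(set(evicted) | set(rejected))
    return {nid: (evicted[nid], rejected[nid]) for nid in nids}
```

Calling correlate() on the saved MDS log text gives one pair of counts
per client address. An address with non-zero counts in both columns is
a candidate for correlation. The counts are lower bounds, since the
"Skipped N previous similar messages" lines hide repeats. The rc -107 in
the reply lines is ENOTCONN on Linux, which would be consistent with
requests arriving from a client that has already been evicted.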


Chris


