[Lustre-discuss] lock completion timeouts?

daledude dale.dewd at gmail.com
Thu Nov 13 01:12:35 PST 2008


I'm trying to determine the cause of the log entries below on our
backup Lustre fs. They seem to occur once or twice an hour during an
rsync of another 20TB Lustre fs. I don't see any errors on the 20TB
Lustre fs. I've read that it's not a good idea to run the MDT, MGS, and
OST all on the same server, so maybe this is the reason for the errors,
but I'd like to understand it better.

The setup is:
* 20TB Lustre fs using CentOS 5.2 64-bit + Lustre 1.6.5. It runs on
completely separate nodes and functions without errors.

* 8TB backup Lustre fs using CentOS 5.2 64-bit + Lustre 1.6.6. The MDT,
MGS, and OST are all on a single server with 4GB of memory. I mount the
20TB Lustre fs on this machine and also run the rsync on it.


Nov 13 00:42:42 mds kernel: Lustre: Request x148006922 sent from mybackup-OST0000 to NID 0@lo 7s ago has timed out (limit 6s).
Nov 13 00:42:42 mds kernel: LustreError: 138-a: mybackup-OST0000: A client on nid 0@lo was evicted due to a lock completion callback to 0@lo timed out: rc -107
Nov 13 00:42:42 mds kernel: LustreError: 2263:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-107) req@eaffe400 x148007203/t0 o4-><?>@<?>:0/0 lens 384/0 e 0 to 0 dl 1226565862 ref 1 fl Interpret:/0/0 rc -107/0
Nov 13 00:42:42 mds kernel: LustreError: 11-0: an error occurred while communicating with 0@lo. The ost_write operation failed with -107
Nov 13 00:42:42 mds kernel: Lustre: mybackup-OST0000-osc-f27eac00: Connection to service mybackup-OST0000 via nid 0@lo was lost; in progress operations using this service will wait for recovery to complete.
Nov 13 00:42:42 mds kernel: LustreError: 167-0: This client was evicted by mybackup-OST0000; in progress operations using this service will fail.
Nov 13 00:42:42 mds kernel: LustreError: 2163:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -5 from cancel RPC: canceling anyway
Nov 13 00:42:42 mds kernel: LustreError: 2163:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -5
Nov 13 00:42:42 mds kernel: LustreError: 2104:0:(client.c:722:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@f22d2a00 x148007205/t0 o4->mybackup-OST0000_UUID@192.168.10.14@tcp:6/4 lens 384/480 e 0 to 100 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
Nov 13 00:42:42 mds kernel: LustreError: 2104:0:(client.c:722:ptlrpc_import_delay_req()) Skipped 8 previous similar messages
Nov 13 00:42:42 mds kernel: Lustre: mybackup-OST0000-osc-f27eac00: Connection restored to service mybackup-OST0000 using nid 0@lo.

Thanks for any advice.


