[Lustre-discuss] Lustre clients getting evicted
Tom.Wang
Tom.Wang at Sun.COM
Mon Feb 11 12:04:05 PST 2008
Aaron Knister wrote:
> I'm having a similar issue with lustre 1.6.4.2 and infiniband. Under
> load, the clients hang about every 10 minutes which is really bad for
> a production machine. The only way to fix the hang is to reboot the
> server. My users are getting extremely impatient :-/
>
> I see this on the clients-
>
> LustreError: 2814:0:(client.c:975:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1202756629, 301s ago) req@ffff8100af233600
> x1796079/t0 o6->data-OST0000_UUID@192.168.64.71@o2ib:28 lens 336/336
> ref 1 fl Rpc:/0/0 rc 0/-22
It means the OST could not respond to the request (unlink, o6) within 300
seconds, so the client disconnected its import to the OST and is trying to
reconnect. Does this disconnection always happen during an unlink? Could you
please post the process trace and console messages from the OST at that time?
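
To check whether the timeouts always hit the same opcode, something like
the following rough sketch could be run over the client console log. It is
only an illustration: parse_timeout() and errno_name() are hypothetical
helper names, not part of Lustre, and the regular expression is inferred
from the single message quoted here, so other Lustre versions may need a
different pattern. It accepts both "@" and the archive's " at " between the
target and the NID, expects the wrapped message rejoined into one line
without the "> " prefixes, and assumes the rc values are negated Linux
errno numbers.

```python
import errno
import re
from datetime import datetime, timezone

# Pattern inferred from one ptlrpc_expire_one_request() message; an assumption,
# not an official description of the Lustre log format.
TIMEOUT_RE = re.compile(
    r"sent at (?P<sent>\d+), (?P<ago>\d+)s ago\).*?"
    r"x(?P<xid>\d+)/t(?P<transno>\d+) "
    r"o(?P<opc>\d+)->(?P<target>\S+?)(?:@| at )(?P<nid>[\d.]+@\w+):\d+"
)


def parse_timeout(line):
    """Return selected fields of a request-timeout message, or None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    sent = int(m.group("sent"))
    return {
        "sent_utc": datetime.fromtimestamp(sent, tz=timezone.utc).isoformat(),
        "waited_s": int(m.group("ago")),
        "opcode": int(m.group("opc")),
        "target": m.group("target"),
        "nid": m.group("nid"),
    }


def errno_name(rc):
    """Map a negative return code such as -16 to its errno symbol."""
    return errno.errorcode.get(abs(rc), "unknown")


if __name__ == "__main__":
    msg = (
        "LustreError: 2814:0:(client.c:975:ptlrpc_expire_one_request()) @@@ "
        "timeout (sent at 1202756629, 301s ago) req at ffff8100af233600 "
        "x1796079/t0 o6->data-OST0000_UUID at 192.168.64.71@o2ib:28 "
        "lens 336/336 ref 1 fl Rpc:/0/0 rc 0/-22"
    )
    print(parse_timeout(msg))
    print(errno_name(-16), errno_name(-22))
```

Counting the "opcode" values over all timeout lines would show whether only
o6 requests expire or whether other request types are affected as well.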
Thanks
WangDi
> Lustre: data-OST0000-osc-ffff810139ce4800: Connection to service
> data-OST0000 via nid 192.168.64.71@o2ib was lost; in progress
> operations using this service will wait for recovery to complete.
> LustreError: 11-0: an error occurred while communicating with
> 192.168.64.71@o2ib. The ost_connect operation failed with -16
> LustreError: 11-0: an error occurred while communicating with
> 192.168.64.71@o2ib. The ost_connect operation failed with -16
>
> I've increased the timeout to 300 seconds and it has helped marginally.
>
> -Aaron