[Lustre-discuss] Lustre clients getting evicted

Tom.Wang Tom.Wang at Sun.COM
Mon Feb 11 12:04:05 PST 2008


Aaron Knister wrote:
> I'm having a similar issue with Lustre 1.6.4.2 and InfiniBand. Under 
> load, the clients hang about every 10 minutes, which is really bad for 
> a production machine. The only way to fix the hang is to reboot the 
> server. My users are getting extremely impatient :-/
>
> I see this on the clients-
>
> LustreError: 2814:0:(client.c:975:ptlrpc_expire_one_request()) @@@ 
> timeout (sent at 1202756629, 301s ago)  req@ffff8100af233600 
> x1796079/t0 o6->data-OST0000_UUID@192.168.64.71@o2ib:28 lens 336/336 
> ref 1 fl Rpc:/0/0 rc 0/-22
This means the OST could not respond to the request (unlink, o6) within 
300 seconds, so the client disconnected its import to the OST and is 
trying to reconnect.
Does this disconnection always happen during an unlink? Could you 
please post the process trace and console messages from the OST at that 
time?
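
When collecting these messages, it may help to pull the fields out of the
client timeout line mechanically. The following is only a rough sketch: the
regular expression is a hypothetical pattern written against the single
message quoted above (with the archive's " at " restored to "@"), not an
official Lustre log grammar, and the helper names parse_timeout and
errno_name are made up for illustration.

```python
import errno
import re
from datetime import datetime, timezone

# Client message quoted in this thread, joined onto one line, with the
# list archive's " at " restored to "@".
LINE = (
    "LustreError: 2814:0:(client.c:975:ptlrpc_expire_one_request()) @@@ "
    "timeout (sent at 1202756629, 301s ago)  req@ffff8100af233600 "
    "x1796079/t0 o6->data-OST0000_UUID@192.168.64.71@o2ib:28 lens 336/336 "
    "ref 1 fl Rpc:/0/0 rc 0/-22"
)

# Hypothetical pattern: an assumption based on this one message only.
PATTERN = re.compile(
    r"sent at (?P<sent>\d+), (?P<age>\d+)s ago\).*?"
    r"\bo(?P<opcode>\d+)->(?P<target>[^@\s]+)@(?P<nid>[^:\s]+):"
)


def parse_timeout(line):
    """Return the fields of a timeout message, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        # "sent at" is a Unix timestamp; convert it to UTC for correlation
        # with the server-side console log.
        "sent_utc": datetime.fromtimestamp(int(m.group("sent")), tz=timezone.utc),
        "age_seconds": int(m.group("age")),
        # Numeric opcode after the "o"; o6 is the unlink request per the reply.
        "opcode": int(m.group("opcode")),
        "target": m.group("target"),
        "nid": m.group("nid"),
    }


def errno_name(rc):
    """Map a negative kernel-style return code such as -16 to its errno symbol."""
    return errno.errorcode.get(-rc, "unknown")


if __name__ == "__main__":
    print(parse_timeout(LINE))
    print(errno_name(-16))  # the code reported for the failed ost_connect
```

On Linux, errno 16 is EBUSY, so the repeated ost_connect failures suggest
the server considered itself busy when the client tried to reconnect. The
converted send time gives a window to look for in the OST console output.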

Thanks
WangDi
> Lustre: data-OST0000-osc-ffff810139ce4800: Connection to service 
> data-OST0000 via nid 192.168.64.71@o2ib was lost; in progress 
> operations using this service will wait for recovery to complete.
> LustreError: 11-0: an error occurred while communicating with 
> 192.168.64.71@o2ib. The ost_connect operation failed with -16
> LustreError: 11-0: an error occurred while communicating with 
> 192.168.64.71@o2ib. The ost_connect operation failed with -16
>
> I've increased the timeout to 300 seconds and it has helped marginally.
>
> -Aaron
>




