[Lustre-discuss] Typical IB timeout? Or something more?

Cliff White Cliff.White at Sun.COM
Tue Sep 9 14:36:58 PDT 2008


Alex Lee wrote:
> I been seeing something that looks like IB timeout errors lately after 
> upgrading to 1.6.5.1 using the supplied ofed kernel drivers.
> 
>  From what I can tell there hasnt been any real network issues that was 
> apparent. Are these errors just typical if the network is busy?

Could be. If you are having actual IB issues, there is generally an 
error from IB prior to any Lustre errors. (And generally lots of 
LustreErrors - we really get unhappy when your network breaks) What we 
see here are two requests timing out, one with a 6sec limit, one 20 sec. 
No indication of a related error, just a timeout exceeded.

These timeouts can happen on a busy network. If they are frequent, you 
should increase obd_timeout.
cliffw

> 
> 
> Heres from MDS:
> Sep  8 16:04:19 lustre-mds-0-0 kernel: LustreError: 
> 10001:0:(events.c:55:request_out_callback()) @@@ ty
> pe 4, status -5  req at ffff81010ffb7200 x15310157/t0 
> o104-><F0>'<84><88><FF><FF><FF><FF>^P at NET_0x500000a0
> c1d08_UUID:15/16 lens 232/256 e 0 to 6 dl 1220857463 ref 2 fl Rpc:N/0/0 
> rc 0/0
> Sep  8 16:04:19 lustre-mds-0-0 kernel: Lustre: Request x15310157 sent 
> from lfs-MDT0000 to NID 10.12.29.
> 8 at o2ib 2s ago has timed out (limit 6s).
> Sep  8 16:04:23 lustre-mds-0-0 kernel: Lustre: Request x15310047 sent 
> from lfs-MDT0000 to NID 10.12.28.
> 235 at o2ib 6s ago has timed out (limit 6s).
> Sep  8 16:04:43 lustre-mds-0-0 kernel: LustreError: 
> 10003:0:(events.c:55:request_out_callback()) @@@ ty
> pe 4, status -5  req at ffff8100638cd400 x15310096/t0 
> o104-><F0>'<84><88><FF><FF><FF><FF>^P at NET_0x500000a0
> c1d0e_UUID:15/16 lens 232/256 e 0 to 6 dl 1220857463 ref 1 fl 
> Complete:XN/0/0 rc 0/0
> Sep  8 16:05:07 lustre-mds-0-0 kernel: LustreError: 
> 3930:0:(events.c:55:request_out_callback()) @@@ typ
> e 4, status -113  req at ffff8101672a8c00 x15310047/t0 
> o104-><F0>'<84><88><FF><FF><FF><FF>^P at NET_0x500000a
> 0c1ceb_UUID:15/16 lens 232/256 e 0 to 6 dl 1220857463 ref 1 fl 
> Complete:XN/0/0 rc 0/0
> Sep  8 16:08:44 lustre-mds-0-0 kernel: Lustre: Skipped 1 previous 
> similar message
> Sep  8 16:13:24 lustre-mds-0-0 kernel: Lustre: Skipped 4 previous 
> similar messages
> 
> On the OSS:
> Sep  9 00:24:55 lustre-oss-4-1 kernel: Lustre: Skipped 3 previous 
> similar messages
> Sep  9 00:25:01 lustre-oss-4-1 kernel: Lustre: Request x784766 sent from 
> lfs-OST0039 to NID 10.12.29.7@
> o2ib 20s ago has timed out (limit 20s).
> Sep  9 00:25:31 lustre-oss-4-1 kernel: LustreError: 
> 13228:0:(o2iblnd_cb.c:2874:kiblnd_check_conns()) Ti
> med out RDMA with 10.12.29.7 at o2ib
> Sep  9 00:25:31 lustre-oss-4-1 kernel: LustreError: 
> 13228:0:(events.c:55:request_out_callback()) @@@ ty
> pe 4, status -103  req at ffff8102d9b49c00 x784766/t0 o104->@:15/16 lens 
> 232/256 e 0 to 20 dl 1220887501 r
> ef 1 fl Complete:XN/0/0 rc 0/0
> 
> -Alex
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss




More information about the lustre-discuss mailing list