[Lustre-discuss] OST I/O problems

Heiko Schröter schroete at iup.physik.uni-bremen.de
Mon Dec 7 02:09:41 PST 2009


On Saturday, 05 December 2009 04:59:55, Andreas Dilger wrote:
> 
> We've had problems with 3ware controllers at other sites in the  
> past: the performance is not as good as expected, since they  
> rely heavily on readahead to get good performance.

True. But we had serious problems (including data loss) with Adaptec controllers before.
Their performance is superb, but usability and maintenance are a nightmare in a Linux environment.

> 
> That said:
> 
> > Dec  4 12:42:56 sadosrd24 LustreError: 4744:0:(ost_handler.c:882:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff81007efa7e00 x7869690/t0 o3->eb2e7e64-c1d9-d1f6-8f9d-1ba9629ff4c0@NET_0x20000c0a8106f_UUID:0/0 lens 384/336 e 0 to 0 dl 1259926976 ref 1 fl Interpret:/0/0 rc 0/0
> 
> This means that the IO didn't complete before the timeout.  This could  
> be because the OST IO is so slow that no RPC can complete before the  
> timeout, or because there is packet loss.

In our case, a misconfigured D-Link switch caused the problems, so for us it was the 'packet loss' option.
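As a rough aid for reading errors like the one quoted above, here is a minimal Python sketch that extracts the bulk timeout and the `dl` field from such a log line. It assumes that `dl` is the request deadline as a Unix timestamp and that the list archive's " at " stands for "@" in the original log; the helper `parse_bulk_timeout` is hypothetical and not part of any Lustre tool.

```python
import re
from datetime import datetime, timezone

# Log line from the post; "@" restored where the archive wrote " at "
# (assumption: this matches the original syslog text).
LINE = ("LustreError: 4744:0:(ost_handler.c:882:ost_brw_read()) @@@ timeout on "
        "bulk PUT after 100+0s req@ffff81007efa7e00 x7869690/t0 "
        "o3->eb2e7e64-c1d9-d1f6-8f9d-1ba9629ff4c0@NET_0x20000c0a8106f_UUID:0/0 "
        "lens 384/336 e 0 to 0 dl 1259926976 ref 1 fl Interpret:/0/0 rc 0/0")


def parse_bulk_timeout(line):
    """Hypothetical helper: return (timeout_s, extra_s, deadline_utc) or None.

    extra_s is the second number in "after N+Ms"; it is captured but not
    interpreted here. The "dl" field is assumed to be a Unix timestamp.
    """
    m = re.search(r"timeout on bulk (?:PUT|GET) after (\d+)\+(\d+)s", line)
    d = re.search(r"\bdl (\d+)\b", line)
    if m is None or d is None:
        return None
    deadline = datetime.fromtimestamp(int(d.group(1)), tz=timezone.utc)
    return int(m.group(1)), int(m.group(2)), deadline


if __name__ == "__main__":
    timeout_s, extra_s, deadline = parse_bulk_timeout(LINE)
    print(timeout_s, extra_s, deadline.isoformat())
```

Under that assumption, the decoded deadline is 2009-12-04 11:42:56 UTC, i.e. 12:42:56 CET, which is the syslog timestamp of the error itself. That fits the reading that the bulk transfer was abandoned once the request's 100 s window had expired.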

> 
> Some things to try:
> - reduce the number of OSS threads via module parameter:
>    options ost oss_num_threads=N
> - increase the Lustre timeout (see the manual for details)

Thank you very much, and Bernd Schubert as well, for your help in pointing us in the right direction.

Best Regards
Heiko