[Lustre-discuss] OST I/O problems
schroete at iup.physik.uni-bremen.de
Mon Dec 7 02:09:41 PST 2009
On Saturday 05 December 2009 04:59:55, Andreas Dilger wrote:
> We've had problems with 3ware controllers at other sites in the past -
> the performance is not as good as expected, since they rely heavily
> on readahead to get good performance.
True. But we had serious problems (including data loss) with Adaptec controllers before.
The performance is superb, but usability and maintenance are a nightmare in a Linux environment.
> That said:
> > Dec 4 12:42:56 sadosrd24 LustreError: 4744:0:(ost_handler.c:882:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff81007efa7e00 x7869690/t0 o3->eb2e7e64-c1d9-d1f6-8f9d-1ba9629ff4c0@NET_0x20000c0a8106f_UUID:0/0 lens 384/336 e 0 to 0 dl 1259926976 ref 1 fl Interpret:/0/0 rc 0/0
> This means that the IO didn't complete before the timeout. This could
> be because the OST IO is so slow that no RPC can complete before the
> timeout, or because there is packet loss.
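One way to tell the two causes apart is to count bulk timeouts per client: timeouts concentrated on the clients behind one switch may point toward the network, while timeouts spread evenly across all clients point more toward slow OST I/O. Below is a minimal Python sketch, assuming one syslog message per line and that the peer appears as `NET_0x<hex>_UUID` as in the log above. The helper names (`decode_nid`, `count_bulk_timeouts`) are hypothetical, and the NID decoding (low 32 bits = IPv4 address, high 32 bits = LND type and network number, with LND 2 = tcp and 5 = o2ib) is an assumption based on the LNET NID layout.

```python
import re
from collections import Counter

# Bulk-timeout message as printed by the OSS, e.g. "timeout on bulk PUT after 100+0s".
BULK_RE = re.compile(r"timeout on bulk (PUT|GET) after (\d+)\+(\d+)s")
# Peer NID embedded in the export UUID, e.g. "NET_0x20000c0a8106f_UUID".
NID_RE = re.compile(r"NET_0x([0-9a-fA-F]+)_UUID")

# Assumption: only a small subset of LND type numbers, enough for IPv4 networks.
LND_NAMES = {2: "tcp", 5: "o2ib"}


def decode_nid(hex_nid: str) -> str:
    """Turn the hex NID from NET_0x..._UUID into "a.b.c.d@net" form (IPv4 LNDs only)."""
    nid = int(hex_nid, 16)
    addr = nid & 0xFFFFFFFF          # low 32 bits: IPv4 address
    net = nid >> 32                  # high 32 bits: network id
    lnd, netnum = (net >> 16) & 0xFFFF, net & 0xFFFF
    ip = ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))
    name = LND_NAMES.get(lnd, f"lnd{lnd}")
    return f"{ip}@{name}" + (str(netnum) if netnum else "")


def count_bulk_timeouts(lines) -> Counter:
    """Count bulk PUT/GET timeouts per client NID in OSS syslog lines."""
    counts = Counter()
    for line in lines:
        if BULK_RE.search(line):
            m = NID_RE.search(line)
            counts[decode_nid(m.group(1)) if m else "unknown"] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python bulk_timeouts.py < /var/log/messages
    for peer, n in count_bulk_timeouts(sys.stdin).most_common():
        print(f"{n:6d}  {peer}")
```

Fed the log line quoted above, this reports one timeout for `192.168.16.111@tcp`.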
In our case, a misconfigured D-Link switch caused the problems, so the 'packet loss' explanation was the right one.
> Some things to try:
> - reduce the number of OSS threads via a module parameter:
> options ost oss_num_threads=N
> - increase the Lustre timeout (details in the manual)
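Why fewer threads can help: if every OSS service thread has a bulk RPC in flight against the same disks, each RPC roughly waits for all the others. Below is a crude fair-share model, a sketch only, using hypothetical figures (1 MB bulk RPCs, an effective throughput of 5 MB/s under seek-heavy load) and the 100 s timeout from the log above. The function names are made up for illustration, and the model ignores caching, readahead and network time, so it shows the trend rather than real numbers.

```python
def modeled_rpc_seconds(num_threads: int, rpc_mb: float, disk_mb_per_s: float) -> float:
    """Crude fair-share model: num_threads concurrent bulk RPCs split the
    effective disk throughput evenly, so one RPC completes after roughly
    num_threads * rpc_mb / disk_mb_per_s seconds."""
    return num_threads * rpc_mb / disk_mb_per_s


def max_threads_within(timeout_s: float, rpc_mb: float, disk_mb_per_s: float,
                       margin: float = 0.5) -> int:
    """Largest thread count whose modeled RPC time does not exceed margin * timeout."""
    return int(margin * timeout_s * disk_mb_per_s / rpc_mb)


if __name__ == "__main__":
    # Hypothetical figures, not measurements from this thread.
    print(modeled_rpc_seconds(512, 1.0, 5.0))   # 102.4 s, over a 100 s timeout
    print(max_threads_within(100.0, 1.0, 5.0))  # 250 threads with a 50% margin
```

The margin leaves headroom for network delays; with these made-up numbers, halving the thread count halves the modeled wait per RPC, which is the effect lowering `oss_num_threads` aims for.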
Thank you very much, and Bernd Schubert as well, for your help in pointing us in the right direction.