[Lustre-discuss] osc_brw_redo_request error on clients

James Robnett jrobnett at aoc.nrao.edu
Wed Feb 9 13:35:30 PST 2011


I have a fairly simple Lustre environment that consists of a single MDS and
2 OSSs, each with 4 OSTs.  The servers and clients are all running Lustre
1.8.5 under RHEL 5.5, using RPMs downloaded from lustre.

Normally I've had no problems, but recently multiple clients have been
reporting the following error:

LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo
for recoverable error  req@ffff8101ae084000 x1358858531428366/t60136289752
o4->lustre-OST0004_UUID@192.168.1.12@o2ib:6/4 lens 448/608 e 0 to 1 dl
1297285890 ref 2 fl Interpret:R/0/0 rc 0/0

which in turn appears to cause a premature EOF in our user software.

There are no corresponding errors on the servers.

I seem to see this error only on clients connected via QDR InfiniBand,
though that may be a false lead.  In addition, the problem seems more
prevalent under load.  Lastly, it seems to be getting worse, almost as
if there's some garbage collection issue on the clients.

I've done some searching and don't see any reports involving that routine.
It looks like a timeout of some sort.  Any hints as to what problem this
error indicates?
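
As a reading aid, the fields of the message above can be pulled apart with a
short script.  This is only a sketch: the field names (xid, transno, early,
timedout, deadline and so on) and the opcode table are assumptions based on
the usual layout of Lustre 1.8 request debug lines, not something confirmed
in this thread, and parse_req() is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# The client message joined onto one line, with the "@" separators
# written literally.
LINE = (
    "LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo "
    "for recoverable error  req@ffff8101ae084000 x1358858531428366/t60136289752 "
    "o4->lustre-OST0004_UUID@192.168.1.12@o2ib:6/4 lens 448/608 e 0 to 1 dl "
    "1297285890 ref 2 fl Interpret:R/0/0 rc 0/0"
)

# ASSUMPTION: group names reflect the presumed meaning of each field.
PATTERN = re.compile(
    r"req@(?P<req>[0-9a-f]+) "
    r"x(?P<xid>\d+)/t(?P<transno>\d+) "
    r"o(?P<opcode>\d+)->(?P<target>[^@\s]+)@(?P<nid>\S+?)"
    r":(?P<req_portal>\d+)/(?P<rep_portal>\d+) "
    r"lens (?P<reqlen>\d+)/(?P<replen>\d+) "
    r"e (?P<early>\d+) to (?P<timedout>\d+) "
    r"dl (?P<deadline>\d+) "
    r"ref (?P<ref>\d+) "
    r"fl (?P<flags>\S+) "
    r"rc (?P<rc>-?\d+)/(?P<rc2>-?\d+)"
)

# ASSUMPTION: partial opcode table, as I understand the OST opcodes.
OPCODES = {3: "OST_READ", 4: "OST_WRITE"}


def parse_req(line):
    """Return a dict of the request fields, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the purely numeric fields to integers.
    for key in ("xid", "transno", "opcode", "req_portal", "rep_portal",
                "reqlen", "replen", "early", "timedout", "deadline",
                "ref", "rc", "rc2"):
        fields[key] = int(fields[key])
    fields["opname"] = OPCODES.get(fields["opcode"], "unknown")
    # The deadline appears to be a Unix timestamp in seconds.
    fields["deadline_utc"] = datetime.fromtimestamp(
        fields["deadline"], tz=timezone.utc)
    return fields


if __name__ == "__main__":
    for key, value in parse_req(LINE).items():
        print(f"{key:13} {value}")
```

Read this way, the request would be a bulk write (o4) to lustre-OST0004 at
192.168.1.12@o2ib with the timed-out flag set ("to 1") and a deadline of
21:11:30 UTC on Feb 9 2011, which would be consistent with the guess that a
timeout is involved.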

james



