[Lustre-discuss] osc_brw_redo_request error on clients

James Robnett jrobnett at aoc.nrao.edu
Wed Feb 9 14:20:41 PST 2011


>> LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@
>> redo for recoverable error  req@ffff8101ae084000
>> x1358858531428366/t60136289752
>> o4->lustre-OST0004_UUID@192.168.1.12@o2ib:6/4 lens 448/608 e 0 to 1 dl
>> 1297285890 ref 2 fl Interpret:R/0/0 rc 0/0
>
> One line before that there should be the actual RPC error, which we
> need in order to know what happened.

   Nope, just that error repeated:
Feb  9 03:19:26 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101aaa3dc00 x1358858525376456/t60135184183
o4->lustre-OST0007_UUID@192.168.1.12@o2ib:6/4 lens 464/608 e 0 to 1 dl
1297246810 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 03:29:56 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101aaa3d000 x1358858525468762/t60135184397
o4->lustre-OST0007_UUID@192.168.1.12@o2ib:6/4 lens 464/608 e 0 to 1 dl
1297247403 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 03:40:22 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101aaa3c400 x1358858525557912/t60135184598
o4->lustre-OST0007_UUID@192.168.1.12@o2ib:6/4 lens 464/608 e 0 to 1 dl
1297248029 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 03:51:18 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101afef2400 x1358858525655268/t60135392181
o4->lustre-OST0002_UUID@192.168.1.11@o2ib:6/4 lens 464/608 e 0 to 1 dl
1297248685 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 04:01:40 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101aaa3dc00 x1358858525738536/t60135185019
o4->lustre-OST0007_UUID@192.168.1.12@o2ib:6/4 lens 464/608 e 0 to 1 dl
1297249307 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 04:12:04 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101aaa3c400 x1358858525822214/t60135185246
o4->lustre-OST0007_UUID@192.168.1.12@o2ib:6/4 lens 464/608 e 0 to 1 dl
1297249931 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 10:48:28 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101addda800 x1358858527540672/t60134973305
o4->lustre-OST0000_UUID@192.168.1.11@o2ib:6/4 lens 448/608 e 0 to 1 dl
1297273752 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 10:49:51 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101adf3c800 x1358858527567804/t60134976801
o4->lustre-OST0000_UUID@192.168.1.11@o2ib:6/4 lens 448/608 e 0 to 1 dl
1297273835 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 10:52:22 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101adddb000 x1358858527619100/t60134983332
o4->lustre-OST0000_UUID@192.168.1.11@o2ib:6/4 lens 448/608 e 0 to 1 dl
1297273986 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 10:57:23 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101addda000 x1358858527728677/t60134998617
o4->lustre-OST0000_UUID@192.168.1.11@o2ib:6/4 lens 448/608 e 0 to 1 dl
1297274250 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 11:07:23 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101adddbc00 x1358858527926588/t60135043030
o4->lustre-OST0000_UUID@192.168.1.11@o2ib:6/4 lens 448/608 e 0 to 1 dl
1297274887 ref 2 fl Interpret:R/0/0 rc 0/0

   The above is from 'less /var/log/messages', so nothing was filtered out
by grepping the logs for osc_brw, lustre, etc.
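
   To see which OSTs and servers the redo messages point at, something
like the following could tally them. This is only a rough sketch: the
function name count_redos_by_target and the regex are my own, and it
assumes the message layout shown above. It accepts both "@" and the
" at " spelling the list archive produces, and it tolerates messages
wrapped across lines. "Skipped N previous similar messages" lines are
not counted, so the totals are a lower bound.

```python
import re
from collections import Counter

# Matches the RPC target in a redo message, e.g.
#   "o4->lustre-OST0007_UUID@192.168.1.12@o2ib:6/4"
# The archive renders "@" as " at ", so both forms are accepted.
TARGET_RE = re.compile(r"\bo\d+->(?P<target>\S+?)(?:@| at )(?P<nid>\S+?):")


def count_redos_by_target(log_text):
    """Count osc_brw_redo_request messages per (target, server NID)."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    # Each kernel message starts after a "LustreError:" tag.
    for chunk in flat.split("LustreError:"):
        if "osc_brw_redo_request()" not in chunk:
            continue
        match = TARGET_RE.search(chunk)
        if match:  # "Skipped N previous similar messages" has no target
            counts[(match.group("target"), match.group("nid"))] += 1
    return counts


# Example use (path as in this thread):
#   with open("/var/log/messages") as f:
#       for key, n in count_redos_by_target(f.read()).most_common():
#           print(n, key)
```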

  In addition to the above I also see this sequence:

Feb  9 11:57:41 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo for
recoverable error  req@ffff8101add34000 x1358858528880660/t64430183909
o4->lustre-OST0001_UUID@192.168.1.11@o2ib:6/4 lens 448/608 e 0 to 1 dl
1297277905 ref 2 fl Interpret:R/0/0 rc 0/0
 to 1 dl 1297278471 ref 2 fl Interpret:R/0/0 rc 0/0
Feb  9 12:07:42 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1629:osc_brw_redo_request()) Skipped 1000 previous
similar messages
Feb  9 12:15:12 nm-post-2 kernel: LustreError:
400:0:(osc_request.c:1143:can_merge_pages()) is it ok to have flags 0xc20
and 0x420 in the same brw?
Feb  9 12:15:12 nm-post-2 kernel: LustreError:
400:0:(osc_request.c:1143:can_merge_pages()) Skipped 43 previous similar
messages
Feb  9 12:15:50 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1143:can_merge_pages()) is it ok to have flags 0xc20
and 0x420 in the same brw?
Feb  9 12:15:50 nm-post-2 kernel: LustreError:
3935:0:(osc_request.c:1143:can_merge_pages()) Skipped 1 previous similar
message

>> which in turn appears to generate a premature EOF on our user software.
>
> Actually, this message does what it says: it resends the request, so
> userspace should not notice any problems. On the other hand, if any
> requests other than brw requests fail, they might not get the benefit
> of resending and could cause userspace-visible errors.

   I glanced at the source and my initial impression matched what you just
said: this is an internal retry. On the other hand, there seems to be a
tight correlation between these messages and the user-space EOF
occurrences.
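
   To put a number on that correlation rather than eyeballing it, a
sketch along these lines could check how many EOF events fall close to a
redo message. The helper names (redo_times, eofs_near_redo), the
60-second window, and the idea that the application records EOF
timestamps somewhere are all assumptions on my part. Syslog lines carry
no year, so the caller has to supply one, and month names are parsed
with the English abbreviations.

```python
import re
from datetime import datetime, timedelta

# Syslog timestamp of any message logged from osc_brw_redo_request(),
# including the "Skipped N previous similar messages" summaries.
STAMP_RE = re.compile(
    r"(\w{3} \d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"LustreError: \S*osc_brw_redo_request"
)


def redo_times(log_text, year):
    """Return datetimes of redo messages; syslog omits the year."""
    # Collapse whitespace so wrapped messages and "Feb  9" both match.
    flat = " ".join(log_text.split())
    return [
        datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for stamp in STAMP_RE.findall(flat)
    ]


def eofs_near_redo(eof_times, redo, window=timedelta(seconds=60)):
    """Return the EOF times that lie within `window` of any redo message."""
    return [t for t in eof_times if any(abs(t - r) <= window for r in redo)]
```

   If most of the EOF times come back from eofs_near_redo() and few
redo messages occur without a nearby EOF, that would support the
correlation; it would not by itself show which one causes the other.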

   Thanks for the quick response.

james

> Bye,
>     Oleg
>






