[lustre-discuss] Added OSTs, now lnet errors

Steve Barnet barnet at icecube.wisc.edu
Sun Dec 11 14:37:47 PST 2016


Hi all,

   I'm seeing something very strange. I recently added two OSSes
and 10 OSTs to one of our filesystems. Things look OK under
light load, but under heavy load we start seeing lots of
LNet errors.

OS: Scientific Linux 6.7
Lustre - Server: 2.8.0 Community version
Lustre - Client: 2.5.3

The errors are below. Do these narrow the range of possible
problems?


Dec 11 11:17:39 lfs-ex-oss-20 kernel: LNetError: 7732:0:(socklnd_cb.c:2509:ksocknal_check_peer_timeouts()) Total 4 stale ZC_REQs for peer 10.128.10.29@tcp1 detected; the oldest(ffff880f6a90e000) timed out 7 secs ago, resid: 0, wmem: 0
Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError: 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5, desc ffff8805379f8000
Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError: 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5, desc ffff880f375dc000
Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError: 8234:0:(ldlm_lib.c:3175:target_bulk_io()) @@@ network error on bulk READ req@ffff880e506263c0 x1551187318090340/t0(0) o3->092e941d-272a-09e3-502b-9338dbf387d3@10.128.10.29@tcp1:587/0 lens 488/432 e 3 to 0 dl 1481476687 ref 1 fl Interpret:/0/0 rc 0/0
Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError: 8234:0:(ldlm_lib.c:3175:target_bulk_io()) Skipped 1 previous similar message
Dec 11 11:17:39 lfs-ex-oss-20 kernel: Lustre: lfs2-OST0024: Bulk IO read error with 092e941d-272a-09e3-502b-9338dbf387d3 (at 10.128.10.29@tcp1), client will retry: rc -110
Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError: 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5, desc ffff8804db0ce000
Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError: 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5, desc ffff880aa4374000
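
For reading an excerpt like this one, it may help to tally the negative return codes and the peer NIDs involved. The negative values are Linux errno codes: -5 is EIO and -110 is ETIMEDOUT. Below is a minimal sketch, not a Lustre tool; the names summarize, CODE_RE, NID_RE and LINUX_ERRNO are made up for illustration, and the regular expressions assume messages shaped like the ones above.

```python
import re
from collections import Counter

# Linux errno values seen in the excerpt, hardcoded so the sketch
# behaves the same on any platform.
LINUX_ERRNO = {5: "EIO (I/O error)", 110: "ETIMEDOUT (connection timed out)"}

# Matches fields such as "status -5" and "rc -110" (but not "rc 0/0").
CODE_RE = re.compile(r"\b(?:status|rc) -(\d+)")

# Matches a tcp NID such as 10.128.10.29@tcp1. Some list archives
# rewrite "@" as " at ", so both spellings are accepted.
NID_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})(?:@| at )(tcp\d*)")


def summarize(log_text):
    """Count negative return codes and peer NIDs found in log_text."""
    codes = Counter(int(value) for value in CODE_RE.findall(log_text))
    nids = Counter(f"{addr}@{net}" for addr, net in NID_RE.findall(log_text))
    return codes, nids


# Two messages copied from the excerpt above.
sample = (
    "(events.c:447:server_bulk_callback()) event type 5, status -5, "
    "desc ffff8805379f8000\n"
    "lfs2-OST0024: Bulk IO read error with "
    "092e941d-272a-09e3-502b-9338dbf387d3 (at 10.128.10.29@tcp1), "
    "client will retry: rc -110\n"
)

codes, nids = summarize(sample)
for code, count in sorted(codes.items()):
    print(f"-{code}: {LINUX_ERRNO.get(code, 'unknown')} x{count}")
for nid, count in nids.items():
    print(f"{nid}: {count} message(s)")
```

Running the same function over a full syslog would show whether the errors cluster on a single client NID or are spread across many peers.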


Thanks much!

Best,

---Steve


