[lustre-discuss] Added OSTs, now lnet errors
Brett Lee
brettlee.lustre at gmail.com
Sun Dec 11 20:45:07 PST 2016
Hi Steve,

You're welcome for the suggestion. I offered it because you mentioned
adding a couple of new OSS servers and then noticing the entries in the
logs. It would be helpful to know where you are seeing the errors - on
the new nodes only, or elsewhere as well? Generally, a network with
existing problems seems to work OK at low bandwidth, but the problems
start to appear as the load increases - hence the suggestion to check
the network for problems. A quick check could be made with LNet
self-test between two different sets of nodes - set 1 being nodes that
show the problem, and set 2 being nodes that do not.

Best,
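For reference, a minimal LNet self-test session along these lines might look like the sketch below. This is only an illustration: the NIDs shown are placeholders for your own "good" and "problem" node NIDs, and the lnet_selftest module must be loaded on the console node and on every node under test.

```shell
# Load the LNet self-test module (needed on the console and all test nodes)
modprobe lnet_selftest

# lst identifies its session via the LST_SESSION environment variable
export LST_SESSION=$$
lst new_session bulk_check

# Group 1: nodes showing the errors (placeholder NID - substitute your own)
lst add_group problem_nodes 10.128.10.29@tcp1
# Group 2: nodes that behave normally (placeholder NID - substitute your own)
lst add_group good_nodes 10.128.10.20@tcp1

# Define a bulk read test between the two groups with 1 MB transfers
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from good_nodes --to problem_nodes brw read size=1M

# Run the batch, sample statistics for 30 seconds, then tear down
lst run bulk_rw
lst stat problem_nodes good_nodes & sleep 30; kill $!
lst stop bulk_rw
lst end_session
```

Comparing the bandwidth and error counters from a problem pair against a known-good pair should show whether the fabric itself, rather than Lustre, is dropping traffic under load.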
On Dec 11, 2016 6:05 PM, "Steve Barnet" <barnet at icecube.wisc.edu> wrote:
> Hi Brett,
>
>
> On 12/11/16 4:46 PM, Brett Lee wrote:
>
>> Steve, It might be the network that LNet is running on. Have you run
>> some bandwidth tests without LNet to check for network problems?
>>
>
>
> It's running over a 10Gb/s Ethernet network that is carrying
> other OSS traffic successfully. No routers or other fancy LNET
> features in play. However, it is quite possible that there are
> issues with the networking on the host side. Definitely on my
> list of things to test out.
>
> At this point, I'm just trying to narrow the search space.
> I didn't find anything particularly revealing when I searched
> around, so I'm hoping some expert eyes can shine a bit of
> light on the situation.
>
> Thanks for the tip!
>
> Best,
>
> ---Steve
>
>
>> On Dec 11, 2016 3:37 PM, "Steve Barnet" <barnet at icecube.wisc.edu> wrote:
>>
>> Hi all,
>>
>> Seeing something very strange. I recently added two OSSes
>> and 10 OSTs to one of our filesystems. Things look OK under
>> light loads, but when we load them up, we start seeing lots
>> of LNet errors.
>>
>> OS: Scientific Linux 6.7
>> Lustre - Server: 2.8.0 Community version
>> Lustre - Client: 2.5.3
>>
>> The errors are below. Do these narrow the range of possible
>> problems?
>>
>>
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LNetError:
>> 7732:0:(socklnd_cb.c:2509:ksocknal_check_peer_timeouts()) Total 4
>> stale ZC_REQs for peer 10.128.10.29 at tcp1 detected; the
>> oldest(ffff880f6a90e000) timed out 7 secs ago, resid: 0, wmem: 0
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status
>> -5, desc ffff8805379f8000
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status
>> -5, desc ffff880f375dc000
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 8234:0:(ldlm_lib.c:3175:target_bulk_io()) @@@ network error on bulk
>> READ req at ffff880e506263c0 x1551187318090340/t0(0)
>> o3->092e941d-272a-09e3-502b-9338dbf387d3 at 10.128.10.29@tcp1:587/0
>> lens 488/432 e 3 to 0 dl 1481476687 ref 1 fl Interpret:/0/0 rc 0/0
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 8234:0:(ldlm_lib.c:3175:target_bulk_io()) Skipped 1 previous similar
>> message
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: Lustre: lfs2-OST0024: Bulk IO
>> read error with 092e941d-272a-09e3-502b-9338dbf387d3 (at
>> 10.128.10.29 at tcp1), client will retry: rc -110
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status
>> -5, desc ffff8804db0ce000
>> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
>> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status
>> -5, desc ffff880aa4374000
>>
>>
>> Thanks much!
>>
>> Best,
>>
>> ---Steve
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>