[lustre-discuss] RDMA too many fragments/timed out - clients slowing entire filesystem performance

Brian W. Johanson bjohanso at psc.edu
Fri Nov 11 13:37:02 PST 2016


Is this patch dependent on any others for 2.8.0?  It didn't apply cleanly, but I 
believe I got it in without problems.  It still fails outright with the same errors, 
and wrq_sge=2 is set.
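
For the record, this is roughly how the option was set on our nodes (a minimal 
sketch; the modprobe.d file name is just what we use here, and the sysfs check 
assumes the parameter is exported read-only):

    # /etc/modprobe.d/ko2iblnd.conf
    options ko2iblnd wrq_sge=2

    # the parameter only takes effect when ko2iblnd is reloaded, so unload
    # the Lustre/LNet modules and bring LNet back up (or just reboot):
    lustre_rmmod
    modprobe lnet
    lctl network up

    # sanity check that the running module picked it up:
    cat /sys/module/ko2iblnd/parameters/wrq_sge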

I'm pretty sure I had this one applied when I was trying OPA-to-mlx5 routing and 
experienced the same behavior, but I haven't gotten back to that effort yet.


On 11/01/2016 08:18 PM, Brian W. Johanson wrote:
>
> Great, thanks Doug!
>
> Quotas are not enabled.
>
> There are a few nodes that were exhibiting the issue fairly consistently.  We 
> recently added 70 clients (~900 total), which seems to have caused this 
> to happen more frequently.
>
> -b
>
>
> On 11/01/2016 07:57 PM, Oucharek, Doug S wrote:
>> Hi Brian,
>>
>> You need this patch: http://review.whamcloud.com/#/c/12451.  It has not 
>> landed to master yet and is off by default.  To activate it, add this module 
>> parameter line to your nodes (all of them):
>>
>> options ko2iblnd wrq_sge=2
>>
>> The issue is that something is causing an offset to be introduced to the bulk 
>> transfers.  That causes a misalignment of the source and destination 
>> fragments.  Due to how the algorithm works, you require twice as many 
>> descriptors as fragments to do the RDMA operation.  So, you are running 
>> out of descriptors when you are only halfway done configuring the transfer. 
>>  The above patch creates two sets of descriptors so the second set can be 
>> utilized in situations like this.  The fix operates on the nodes which are 
>> doing the bulk transfers.  Since you can both read and write bulk data, you 
>> need the fix on servers, clients, and LNet routers (basically, everywhere).
>>
>> Question: are you using the quotas feature and could it be at or approaching 
>> a limit?  There has been some evidence that the quotas feature could be 
>> introducing the offset to bulk transfers.
>>
>> Doug
>>
>>> On Nov 1, 2016, at 4:08 PM, Brian W. Johanson <bjohanso at psc.edu 
>>> <mailto:bjohanso at psc.edu>> wrote:
>>>
>>> Centos 7.2
>>> Lustre 2.8.0
>>> ZFS 0.6.5.5
>>> OPA 10.2.0.0.158
>>>
>>>
>>> The clients and servers are on the same OPA network, no routing.  Once a 
>>> client gets into this state, the filesystem performance drops to a fraction of 
>>> what it is capable of.
>>> The client must be rebooted to clear the issue.
>>>
>>> I imagine I am missing a bug in Jira for this; does this look like a 
>>> known issue?
>>>
>>>
>>> Pertinent debug messages from the server:
>>>
>>> 00000800:00020000:34.0:1478026118.277782:0:29892:0:(o2iblnd_cb.c:3109:kiblnd_check_txs_locked()) 
>>> Timed out tx: active_txs, 4 seconds
>>> 00000800:00020000:34.0:1478026118.277785:0:29892:0:(o2iblnd_cb.c:3172:kiblnd_check_conns()) 
>>> Timed out RDMA with 10.4.119.112 at o2ib (3): c: 112, oc: 0, rc: 66
>>> 00000800:00000100:34.0:1478026118.277787:0:29892:0:(o2iblnd_cb.c:1913:kiblnd_close_conn_locked()) 
>>> Closing conn to 10.4.119.112 at o2ib: error -110(waiting)
>>> 00000100:00020000:34.0:1478026118.277844:0:29892:0:(events.c:447:server_bulk_callback()) 
>>> event type 5, status -103, desc ffff883e8e8bcc00
>>> 00000100:00020000:34.0:1478026118.288714:0:29892:0:(events.c:447:server_bulk_callback()) 
>>> event type 3, status -103, desc ffff883e8e8bcc00
>>> 00000100:00020000:34.0:1478026118.299574:0:29892:0:(events.c:447:server_bulk_callback()) 
>>> event type 5, status -103, desc ffff8810e92e9c00
>>> 00000100:00020000:34.0:1478026118.310434:0:29892:0:(events.c:447:server_bulk_callback()) 
>>> event type 3, status -103, desc ffff8810e92e9c00
>>>
>>>
>>> And from the client:
>>>
>>> 00000400:00000100:8.0:1477949860.565777:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000400:00000100:8.0:1477949860.565782:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000800:00020000:8.0:1477949860.702666:0:3629:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) 
>>> RDMA has too many fragments for peer 10.4.108.81 at o2ib (32), src idx/frags: 
>>> 16/27 dst idx/frags: 16/27
>>> 00000800:00020000:8.0:1477949860.702667:0:3629:0:(o2iblnd_cb.c:1689:kiblnd_reply()) 
>>> Can't setup rdma for GET from 10.4.108.81 at o2ib: -90
>>> 00000100:00020000:8.0:1477949860.702669:0:3629:0:(events.c:201:client_bulk_callback()) 
>>> event type 1, status -5, desc ffff880fd5d9bc00
>>> 00000800:00020000:8.0:1477949860.816666:0:3629:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) 
>>> RDMA has too many fragments for peer 10.4.108.81 at o2ib (32), src idx/frags: 
>>> 16/27 dst idx/frags: 16/27
>>> 00000800:00020000:8.0:1477949860.816668:0:3629:0:(o2iblnd_cb.c:1689:kiblnd_reply()) 
>>> Can't setup rdma for GET from 10.4.108.81 at o2ib: -90
>>> 00000100:00020000:8.0:1477949860.816669:0:3629:0:(events.c:201:client_bulk_callback()) 
>>> event type 1, status -5, desc ffff880fd5d9bc00
>>> 00000400:00000100:8.0:1477949861.573660:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000400:00000100:8.0:1477949861.573664:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000400:00000100:8.0:1477949861.573667:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000400:00000100:8.0:1477949861.573669:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000400:00000100:8.0:1477949861.573671:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000400:00000100:8.0:1477949861.573673:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000400:00000100:8.0:1477949861.573675:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000400:00000100:8.0:1477949861.573677:0:3629:0:(lib-move.c:1489:lnet_parse_put()) 
>>> Dropping PUT from 12345-10.4.108.81 at o2ib portal 4 match 1549728742532740 
>>> offset 0 length 192: 4
>>> 00000800:00020000:8.0:1477949861.721668:0:3629:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) 
>>> RDMA has too many fragments for peer 10.4.108.81 at o2ib (32), src idx/frags: 
>>> 16/27 dst idx/frags: 16/27
>>> 00000800:00020000:8.0:1477949861.721669:0:3629:0:(o2iblnd_cb.c:1689:kiblnd_reply()) 
>>> Can't setup rdma for GET from 10.4.108.81 at o2ib: -90
>>> 00000100:00020000:8.0:1477949861.721670:0:3629:0:(events.c:201:client_bulk_callback()) 
>>> event type 1, status -5, desc ffff880fd5d9bc00
>>> 00000800:00020000:8.0:1477949861.836668:0:3629:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) 
>>> RDMA has too many fragments for peer 10.4.108.81 at o2ib (32), src idx/frags: 
>>> 16/27 dst idx/frags: 16/27
>>> 00000800:00020000:8.0:1477949861.836669:0:3629:0:(o2iblnd_cb.c:1689:kiblnd_reply()) 
>>> Can't setup rdma for GET from 10.4.108.81 at o2ib: -90
>>> 00000100:00020000:8.0:1477949861.836670:0:3629:0:(events.c:201:client_bulk_callback()) 
>>> event type 1, status -5, desc ffff880fd5d9bc00
>>> 00000800:00020000:8.0:1477949862.061668:0:3629:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) 
>>> RDMA has too many fragments for peer 10.4.108.81 at o2ib (32), src idx/frags: 
>>> 16/27 dst idx/frags: 16/27
>>> 00000800:00020000:8.0:1477949862.061669:0:3629:0:(o2iblnd_cb.c:1689:kiblnd_reply()) 
>>> Can't setup rdma for GET from 10.4.108.81 at o2ib: -90
>>> 00000100:00020000:8.0:1477949862.061670:0:3629:0:(events.c:201:client_bulk_callback()) 
>>> event type 1, status -5, desc ffff880fd5d9bc00
>>>
