<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hello.</p>
<p>Here I am again, trying to get Multi-Rail to work.</p>
<p>I configured Multi-Rail on both the OSS and the client side.</p>
<p>I have one OSS, one MDS and one client, all running RHEL 7.4 and Lustre 2.10.1:<br>
</p>
<ul>
<li>psdrp-tst-mds10 MDS<br>
</li>
<li>drp-tst-oss10 OSS (172.21.52.86@o2ib 172.21.52.118@o2ib)<br>
</li>
<li>drp-tst-lu10 Lustre client (172.21.52.124@o2ib
172.21.52.125@o2ib)</li>
</ul>
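<p>The node list above maps onto a static Multi-Rail setup of the kind the Lustre 2.10 manual describes: one <code>lnetctl net add</code> per node listing both IB interfaces, and one <code>lnetctl peer add</code> per remote node giving its primary NID plus the extra one. The Python sketch below only builds those command strings. The helper names and the <code>NODES</code> table are hypothetical, the interface names are taken from the <code>lnetctl export</code> output later in this message, and the exact option syntax should be checked against the lnetctl documentation for your version.</p>
<pre>
```python
# Sketch: build lnetctl commands for a static Multi-Rail setup, following the
# "net add" / "peer add" pattern described in the Lustre 2.10 manual.
# NODES and the helper names are hypothetical; NIDs and interface names are
# copied from this message. Check the option syntax for your lnetctl version.

NODES = {
    "drp-tst-oss10": {
        "net": "o2ib",
        "ifaces": ["ib0", "ib1"],
        "nids": ["172.21.52.86@o2ib", "172.21.52.118@o2ib"],
    },
    "drp-tst-lu10": {
        "net": "o2ib",
        "ifaces": ["ib0", "ib1"],
        "nids": ["172.21.52.124@o2ib", "172.21.52.125@o2ib"],
    },
}


def net_add_cmd(node):
    """One LNet network on `node` spanning all of its IB interfaces."""
    cfg = NODES[node]
    return "lnetctl net add --net {} --if {}".format(cfg["net"], ",".join(cfg["ifaces"]))


def peer_add_cmd(peer):
    """Declare `peer` as Multi-Rail: first NID is primary, the rest are extra NIDs."""
    nids = NODES[peer]["nids"]
    return "lnetctl peer add --prim_nid {} --nid {}".format(nids[0], ",".join(nids[1:]))


def commands_for(node, peer):
    """Commands to run on `node` so that it knows both rails of `peer`."""
    return [net_add_cmd(node), peer_add_cmd(peer)]


if __name__ == "__main__":
    # Client side: local rails first, then the OSS as a Multi-Rail peer.
    for cmd in commands_for("drp-tst-lu10", "drp-tst-oss10"):
        print(cmd)
```
</pre>
<p>Run as is, it prints the two commands for the client (drp-tst-lu10) with the OSS as its peer; swapping the two node names gives the OSS-side equivalents.</p>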
<p>Without Multi-Rail, everything works fine.</p>
<p>What I am doing is aggregating two IB interfaces to get more
performance. However, when I mount the Lustre file system on the
Lustre client, I get these errors and the file system does not
mount:</p>
<p>Oct 9 16:23:50 drp-tst-lu10 kernel: [248177.914832] LNetError:
1895:0:(o2iblnd_cb.c:2726:kiblnd_rejected()) 172.21.52.118@o2ib
rejected: consumer defined fatal error<br>
Oct 9 16:23:50 drp-tst-lu10 kernel: [248177.917290] Lustre:
Mounted drplu-client<br>
Oct 9 16:23:50 drp-tst-lu10 kernel: [248177.920832] Lustre:
31785:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request
sent has failed due to network error: [sent 1507591430/real
1507591430] req@ffff8807f56a0300 x1580812428378832/t0(0)
o8-&gt;drplu-OST0001-osc-ffff88084738d800@172.21.52.86@o2ib:28/4
lens 520/544 e 0 to 1 dl 1507591435 ref 1 fl Rpc:eXN/0/ffffffff rc
0/-1<br>
Oct 9 16:23:52 drp-tst-lu10 kernel: [248179.936156] LustreError:
673:0:(llite_lib.c:1748:ll_statfs_internal()) obd_statfs fails: rc
= -5<br>
Oct 9 16:23:57 drp-tst-lu10 kernel: [248184.645463] LustreError:
674:0:(llite_lib.c:1748:ll_statfs_internal()) obd_statfs fails: rc
= -5<br>
Oct 9 16:23:58 drp-tst-lu10 kernel: [248186.117364] LustreError:
678:0:(llite_lib.c:1748:ll_statfs_internal()) obd_statfs fails: rc
= -5<br>
Oct 9 16:23:58 drp-tst-lu10 kernel: [248186.117411] LustreError:
678:0:(llite_lib.c:1748:ll_statfs_internal()) Skipped 1 previous
similar message<br>
Oct 9 16:24:15 drp-tst-lu10 kernel: [248202.912554] LNetError:
1895:0:(o2iblnd_cb.c:2726:kiblnd_rejected()) 172.21.52.118@o2ib
rejected: consumer defined fatal error<br>
Oct 9 16:24:15 drp-tst-lu10 kernel: [248202.912610] LNetError:
1895:0:(o2iblnd_cb.c:2726:kiblnd_rejected()) Skipped 3 previous
similar messages<br>
Oct 9 16:24:15 drp-tst-lu10 kernel: [248202.918903] Lustre:
31785:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request
sent has failed due to network error: [sent 1507591455/real
1507591455] req@ffff88075d2ee700 x1580812428378960/t0(0)
o8-&gt;drplu-OST0001-osc-ffff88084738d800@172.21.52.86@o2ib:28/4
lens 520/544 e 0 to 1 dl 1507591465 ref 1 fl Rpc:eXN/0/ffffffff rc
0/-1<br>
<br>
</p>
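<p>The pattern in this log is easier to see when it is reduced to one count per NID. A minimal Python sketch, assuming the messages are in the one-line form the kernel writes to the system log (the mail client wrapped them here) and that their wording matches the 2.10.1 messages above; the script name <code>lnet_log_summary.py</code> is hypothetical. "Skipped N previous similar messages" lines are not counted, since they do not name a NID.</p>
<pre>
```python
import re
import sys
from collections import Counter

# Sketch: count which NIDs show up in LNet/Lustre errors of a kernel log.
# The regexes match only the 2.10.1 message wording seen in this message.
REJECTED = re.compile(r"kiblnd_rejected\(\)\) (\S+) rejected: (.+)")
FAILED_REQ = re.compile(r"network error.*osc-[0-9a-f]+@([^:\s]+):")


def summarize(log_lines):
    """Return (rejections per remote NID, failed requests per target NID)."""
    rejected, failed = Counter(), Counter()
    for line in log_lines:
        m = REJECTED.search(line)
        if m:
            rejected[m.group(1)] += 1
        m = FAILED_REQ.search(line)
        if m:
            failed[m.group(1)] += 1
    return rejected, failed


if __name__ == "__main__":
    # Usage: python3 lnet_log_summary.py /var/log/messages
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as log:
            rejected, failed = summarize(log)
        print("rejected by:", dict(rejected))
        print("failed requests to:", dict(failed))
```
</pre>
<p>On the log above, every kiblnd_rejected() message that names a NID names 172.21.52.118@o2ib, the OSS's second NI, while the failed requests are addressed to OST0001 at its primary NID 172.21.52.86@o2ib.</p>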
<p>fstab entry: 172.21.42.213@tcp:/drplu /drplu lustre
noauto,lazystatfs,flock, 0 0<br>
</p>
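<p>The first field of that entry is the Lustre mount spec: the MGS NID(s), then <code>:/</code> and the file system name. Here the MGS is reached over tcp (172.21.42.213@tcp), not over o2ib. A small Python sketch that splits such a line, assuming exactly the six whitespace-separated fstab fields; the function name is hypothetical. The trailing comma after <code>flock</code> leaves an empty option, which the sketch simply drops.</p>
<pre>
```python
# Sketch: split a Lustre client fstab line into its parts.
# Assumes the six usual fstab fields and a spec of the form
# MGSNID[:MGSNID...]:/fsname (colons separate failover MGS nodes).


def parse_lustre_fstab(line):
    spec, mountpoint, fstype, options, dump, passno = line.split()
    mgs_part, fsname = spec.split(":/", 1)
    return {
        "mgs_nodes": mgs_part.split(":"),
        "fsname": fsname,
        "mountpoint": mountpoint,
        "fstype": fstype,
        # A trailing comma leaves an empty item in the option list; drop it.
        "options": [opt for opt in options.split(",") if opt],
        "dump": int(dump),
        "pass": int(passno),
    }


if __name__ == "__main__":
    entry = "172.21.42.213@tcp:/drplu /drplu lustre noauto,lazystatfs,flock, 0 0"
    print(parse_lustre_fstab(entry))
```
</pre>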
<p>I can see the peers in the LNet status on both nodes:</p>
<pre>[root@drp-tst-oss10:~]# cat /proc/sys/lnet/peers
nid                 refs state last  max  rtr  min   tx  min queue
172.21.52.124@o2ib     1    NA   -1  128  128  128  128  128     0
172.21.52.125@o2ib     1    NA   -1  128  128  128  128  128     0
172.21.42.213@tcp      1    NA   -1    8    8    8    8    6     0

[root@drp-tst-lu10:etc]# cat /proc/sys/lnet/peers
nid                 refs state last  max  rtr  min   tx  min queue
172.21.52.118@o2ib     1    NA   -1  128  128  128  128  127     0
172.21.52.86@o2ib      1    NA   -1  128  128  128  128  102     0
172.21.42.213@tcp      1    NA   -1    8    8    8    8    6     0
</pre>
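<p>In /proc/sys/lnet/peers the columns after <code>max</code> are, as the matching <code>lnetctl export</code> values suggest, the available and minimum router credits, then the available and minimum tx credits, then the queue. The sketch below parses that text and reports <code>max</code> minus the minimum tx credits, a rough low-water indicator of how many tx credits were ever in use at once toward each peer NID. The column labels (<code>rtr_min</code>, <code>tx_min</code>) are my own names for the repeated <code>min</code> headers, and the parser only assumes the 2.10 layout shown here.</p>
<pre>
```python
# Sketch: parse /proc/sys/lnet/peers text (Lustre 2.10 layout, as pasted in
# this message). The header repeats "min", so these column names are labels.
COLUMNS = ["nid", "refs", "state", "last", "max",
           "rtr", "rtr_min", "tx", "tx_min", "queue"]


def parse_peers(text):
    """Return {nid: row}, with every column except nid and state as int."""
    peers = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, shell prompts and the header row.
        if len(fields) != len(COLUMNS) or fields[0] == "nid":
            continue
        row = dict(zip(COLUMNS, fields))
        for key in COLUMNS:
            if key not in ("nid", "state"):
                row[key] = int(row[key])
        peers[row["nid"]] = row
    return peers


def peak_tx_credits_used(peers):
    """max - tx_min: rough count of tx credits ever in use at once per NID."""
    return {nid: row["max"] - row["tx_min"] for nid, row in peers.items()}


if __name__ == "__main__":
    client_view = """nid refs state last max rtr min tx min queue
172.21.52.118@o2ib 1 NA -1 128 128 128 128 127 0
172.21.52.86@o2ib 1 NA -1 128 128 128 128 102 0
172.21.42.213@tcp 1 NA -1 8 8 8 8 6 0"""
    print(peak_tx_credits_used(parse_peers(client_view)))
```
</pre>
<p>On the client table above this gives 26 for 172.21.52.86@o2ib but only 1 for 172.21.52.118@o2ib.</p>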
<p><br>
</p>
<p>Here is my LNet configuration with Multi-Rail on the OSS side:<br>
</p>
<p><br>
</p>
<p>[root@drp-tst-oss10:veraldi]# lnetctl export<br>
net:<br>
- net type: lo<br>
local NI(s):<br>
- nid: 0@lo<br>
status: up<br>
statistics:<br>
send_count: 0<br>
recv_count: 0<br>
drop_count: 0<br>
tunables:<br>
peer_timeout: 0<br>
peer_credits: 0<br>
peer_buffer_credits: 0<br>
credits: 0<br>
lnd tunables:<br>
tcp bonding: 0<br>
dev cpt: 0<br>
CPT: "[0,1]"<br>
- net type: o2ib<br>
local NI(s):<br>
- nid: 172.21.52.86@o2ib<br>
status: up<br>
interfaces:<br>
0: ib0<br>
statistics:<br>
send_count: 0<br>
recv_count: 0<br>
drop_count: 0<br>
tunables:<br>
peer_timeout: 180<br>
peer_credits: 128<br>
peer_buffer_credits: 0<br>
credits: 1024<br>
lnd tunables:<br>
peercredits_hiw: 64<br>
map_on_demand: 32<br>
concurrent_sends: 256<br>
fmr_pool_size: 2048<br>
fmr_flush_trigger: 512<br>
fmr_cache: 1<br>
ntx: 2048<br>
conns_per_peer: 4<br>
tcp bonding: 0<br>
dev cpt: 1<br>
CPT: "[0,1]"<br>
- nid: 172.21.52.118@o2ib<br>
status: up<br>
interfaces:<br>
0: ib1<br>
statistics:<br>
send_count: 0<br>
recv_count: 0<br>
drop_count: 0<br>
tunables:<br>
peer_timeout: 180<br>
peer_credits: 128<br>
peer_buffer_credits: 0<br>
credits: 1024<br>
lnd tunables:<br>
peercredits_hiw: 64<br>
map_on_demand: 32<br>
concurrent_sends: 256<br>
fmr_pool_size: 2048<br>
fmr_flush_trigger: 512<br>
fmr_cache: 1<br>
ntx: 2048<br>
conns_per_peer: 4<br>
tcp bonding: 0<br>
dev cpt: 1<br>
CPT: "[0,1]"<br>
- net type: tcp<br>
local NI(s):<br>
- nid: 172.21.42.211@tcp<br>
status: up<br>
interfaces:<br>
0: enp1s0f0<br>
statistics:<br>
send_count: 198<br>
recv_count: 198<br>
drop_count: 0<br>
tunables:<br>
peer_timeout: 180<br>
peer_credits: 8<br>
peer_buffer_credits: 0<br>
credits: 256<br>
lnd tunables:<br>
tcp bonding: 0<br>
dev cpt: 0<br>
CPT: "[0,1]"<br>
peer:<br>
- primary nid: 172.21.42.213@tcp<br>
Multi-Rail: True<br>
peer ni:<br>
- nid: 172.21.42.213@tcp<br>
state: NA<br>
max_ni_tx_credits: 8<br>
available_tx_credits: 8<br>
min_tx_credits: 6<br>
tx_q_num_of_buf: 0<br>
available_rtr_credits: 8<br>
min_rtr_credits: 8<br>
send_count: 198<br>
recv_count: 198<br>
drop_count: 0<br>
refcount: 1<br>
- primary nid: 172.21.52.124@o2ib<br>
Multi-Rail: True<br>
peer ni:<br>
- nid: 172.21.52.124@o2ib<br>
state: NA<br>
max_ni_tx_credits: 128<br>
available_tx_credits: 128<br>
min_tx_credits: 128<br>
tx_q_num_of_buf: 0<br>
available_rtr_credits: 128<br>
min_rtr_credits: 128<br>
send_count: 0<br>
recv_count: 0<br>
drop_count: 0<br>
refcount: 1<br>
- nid: 172.21.52.125@o2ib<br>
state: NA<br>
max_ni_tx_credits: 128<br>
available_tx_credits: 128<br>
min_tx_credits: 128<br>
tx_q_num_of_buf: 0<br>
available_rtr_credits: 128<br>
min_rtr_credits: 128<br>
send_count: 0<br>
recv_count: 0<br>
drop_count: 0<br>
refcount: 1<br>
numa:<br>
range: 0</p>
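<p>To compare the two exports without reading every line, the per-NID message counters can be pulled out with a few lines of Python. This is a line-based sketch, not a YAML parser: it assumes the key names shown above (<code>net:</code>, <code>peer:</code>, <code>- nid:</code>, <code>send_count:</code> and so on) and ignores indentation, so it accepts both the real indented output and the flattened copy in this message. The function names are hypothetical.</p>
<pre>
```python
# Sketch: pull per-NID message counters out of `lnetctl export` text.
# Line-based, not a YAML parser: it only looks at stripped "key: value" lines.
COUNTER_KEYS = ("send_count", "recv_count", "drop_count")


def parse_export_counters(text):
    """Return {"net": {nid: counters}, "peer": {nid: counters}}."""
    result = {"net": {}, "peer": {}}
    section = None
    nid = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("net:", "peer:", "numa:"):
            # Top-level section header; no NID is current until "- nid:".
            section, nid = line[:-1], None
        elif line.startswith("- nid:"):
            nid = line.split(":", 1)[1].strip()
            if section in result:
                result[section][nid] = {}
        elif nid and section in result:
            key, _, value = line.partition(":")
            if key in COUNTER_KEYS:
                result[section][nid][key] = int(value)
    return result


def idle_nids(counters):
    """NIDs whose send and receive counts are both zero, sorted."""
    return sorted(nid for nid, c in counters.items()
                  if c.get("send_count", 0) == 0 and c.get("recv_count", 0) == 0)
```
</pre>
<p>Fed the OSS export above, <code>idle_nids(result["net"])</code> returns 0@lo and both o2ib NIDs, 172.21.52.86@o2ib and 172.21.52.118@o2ib, since only the tcp NI shows traffic there.</p>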
<p><br>
</p>
<p><br>
</p>
<p>Here is the LNet configuration on the client side:</p>
<p><br>
</p>
<p>[root@drp-tst-lu10:veraldi]# lnetctl export<br>
net:<br>
- net type: lo<br>
local NI(s):<br>
- nid: 0@lo<br>
status: up<br>
statistics:<br>
send_count: 0<br>
recv_count: 0<br>
drop_count: 0<br>
tunables:<br>
peer_timeout: 0<br>
peer_credits: 0<br>
peer_buffer_credits: 0<br>
credits: 0<br>
lnd tunables:<br>
tcp bonding: 0<br>
dev cpt: 0<br>
CPT: "[0]"<br>
- net type: o2ib<br>
local NI(s):<br>
- nid: 172.21.52.124@o2ib<br>
status: up<br>
interfaces:<br>
0: ib0<br>
statistics:<br>
send_count: 403742<br>
recv_count: 807391<br>
drop_count: 0<br>
tunables:<br>
peer_timeout: 180<br>
peer_credits: 128<br>
peer_buffer_credits: 0<br>
credits: 1024<br>
lnd tunables:<br>
peercredits_hiw: 64<br>
map_on_demand: 32<br>
concurrent_sends: 256<br>
fmr_pool_size: 2048<br>
fmr_flush_trigger: 512<br>
fmr_cache: 1<br>
ntx: 2048<br>
conns_per_peer: 4<br>
tcp bonding: 0<br>
dev cpt: -1<br>
CPT: "[0]"<br>
- nid: 172.21.52.125@o2ib<br>
status: up<br>
interfaces:<br>
0: ib1<br>
statistics:<br>
send_count: 0<br>
recv_count: 0<br>
drop_count: 0<br>
tunables:<br>
peer_timeout: 180<br>
peer_credits: 128<br>
peer_buffer_credits: 0<br>
credits: 1024<br>
lnd tunables:<br>
peercredits_hiw: 64<br>
map_on_demand: 32<br>
concurrent_sends: 256<br>
fmr_pool_size: 2048<br>
fmr_flush_trigger: 512<br>
fmr_cache: 1<br>
ntx: 2048<br>
conns_per_peer: 4<br>
tcp bonding: 0<br>
dev cpt: -1<br>
CPT: "[0]"<br>
- net type: tcp<br>
local NI(s):<br>
- nid: 172.21.42.195@tcp<br>
status: up<br>
interfaces:<br>
0: enp7s0f0<br>
statistics:<br>
send_count: 99<br>
recv_count: 99<br>
drop_count: 0<br>
tunables:<br>
peer_timeout: 180<br>
peer_credits: 8<br>
peer_buffer_credits: 0<br>
credits: 256<br>
lnd tunables:<br>
tcp bonding: 0<br>
dev cpt: -1<br>
CPT: "[0]"<br>
peer:<br>
- primary nid: 172.21.42.213@tcp<br>
Multi-Rail: True<br>
peer ni:<br>
- nid: 172.21.42.213@tcp<br>
state: NA<br>
max_ni_tx_credits: 8<br>
available_tx_credits: 8<br>
min_tx_credits: 6<br>
tx_q_num_of_buf: 0<br>
available_rtr_credits: 8<br>
min_rtr_credits: 8<br>
send_count: 99<br>
recv_count: 99<br>
drop_count: 0<br>
refcount: 1<br>
- primary nid: 172.21.52.86@o2ib<br>
Multi-Rail: True<br>
peer ni:<br>
- nid: 172.21.52.86@o2ib<br>
state: NA<br>
max_ni_tx_credits: 128<br>
available_tx_credits: 128<br>
min_tx_credits: 102<br>
tx_q_num_of_buf: 0<br>
available_rtr_credits: 128<br>
min_rtr_credits: 128<br>
send_count: 403742<br>
recv_count: 807391<br>
drop_count: 0<br>
refcount: 1<br>
- nid: 172.21.52.118@o2ib<br>
state: NA<br>
max_ni_tx_credits: 128<br>
available_tx_credits: 128<br>
min_tx_credits: 127<br>
tx_q_num_of_buf: 0<br>
available_rtr_credits: 128<br>
min_rtr_credits: 128<br>
send_count: 0<br>
recv_count: 0<br>
drop_count: 0<br>
refcount: 1<br>
numa:<br>
range: 0</p>
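<p>On the client, the two local o2ib NIs show very different counters: 403742 sends and 807391 receives on 172.21.52.124@o2ib (ib0), none on 172.21.52.125@o2ib (ib1). A short Python sketch that turns such counters into each rail's share of messages; the values are copied from the export above, the function name is hypothetical, and since the counters were read once they say nothing about when that traffic happened.</p>
<pre>
```python
# Sketch: share of LNet messages (send + recv) carried by each local NI.
# Counter values are copied from the client's `lnetctl export` above.
CLIENT_LOCAL_NIS = {
    "172.21.52.124@o2ib": {"send_count": 403742, "recv_count": 807391},  # ib0
    "172.21.52.125@o2ib": {"send_count": 0, "recv_count": 0},            # ib1
}


def rail_shares(counters):
    """Fraction of all sent + received messages handled by each NID."""
    totals = {nid: c["send_count"] + c["recv_count"] for nid, c in counters.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {nid: 0.0 for nid in totals}
    return {nid: total / grand_total for nid, total in totals.items()}


if __name__ == "__main__":
    for nid, share in rail_shares(CLIENT_LOCAL_NIS).items():
        print("{}: {:.1%}".format(nid, share))
```
</pre>
<p>With these numbers ib0 carries all of the messages (share 1.0) and ib1 none (0.0); with both rails in use, the shares would normally be expected to be closer together.</p>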
<p><br>
</p>
<p>Anyway, Lustre does not work. This is really weird; it should.</p>
<p>Any hints?</p>
<p>Thank you,</p>
<p><br>
</p>
<p>Rick</p>
<p><br>
</p>
</body>
</html>