<div dir="ltr"><div>no problem</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 6 Mar 2019 at 12:15, Riccardo Veraldi <<a href="mailto:Riccardo.Veraldi@cnaf.infn.it">Riccardo.Veraldi@cnaf.infn.it</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail-m_5983301634755437508moz-cite-prefix">On 3/6/19 11:29 AM, Amir Shehata wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>The load is split across tcp and o2ib0 on the 2.12 client
because the Multi-Rail (MR) code sees both interfaces,
realizes it can use both of them, and so it
does.</div>
<div>To disable this behavior you can disable discovery on the
2.12 client. I think that should just get the client to only
use the single interface it's told to.</div>
</div>
</blockquote>
thank you very much, this worked out well.<br>
<blockquote type="cite">
<div dir="ltr">
<div>We're currently working on a feature (UDSP) which will
allow the specification of a "preferred" network. In your case
you can set the o2ib to be the preferred network. It'll always
be used unless it becomes unavailable. You get two benefits
this way: 1) your preference is adhered to. 2) reliability,
since the tcp network will be used if the o2ib network becomes
unavailable.<br>
</div>
</div>
</blockquote>
this feature (UDSP) would be really great.<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Let me know if disabling discovery on your 2.12 clients
works.</div>
</div>
</blockquote>
<p>Yes, after disabling discovery on the client side, the situation
is much better.</p>
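<p>For reference, a minimal sketch of disabling discovery on a
2.12 client and confirming the setting. The first command is the
one suggested earlier in this thread; the modprobe.d file name and
the use of the lnet_peer_discovery_disabled module option to keep
the change across reboots are assumptions about a typical setup,
not something confirmed here.</p>
<pre>
# Disable LNet dynamic peer discovery on the running client.
lnetctl set discovery 0

# Check the global settings; "discovery" should now read 0.
lnetctl global show

# Assumption: to keep discovery off across reboots, set the lnet
# module option (for example in /etc/modprobe.d/lustre.conf):
#   options lnet lnet_peer_discovery_disabled=1
</pre>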
<p><br>
</p>
<p>thank you very much</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>thanks</div>
<div>amir<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 5 Mar 2019 at 18:49,
Riccardo Veraldi <<a href="mailto:Riccardo.Veraldi@cnaf.infn.it" target="_blank">Riccardo.Veraldi@cnaf.infn.it</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail-m_5983301634755437508gmail-m_5281618942611277454moz-cite-prefix">Hello
Amir, I answer in-line.</div>
<div class="gmail-m_5983301634755437508gmail-m_5281618942611277454moz-cite-prefix"><br>
</div>
<div class="gmail-m_5983301634755437508gmail-m_5281618942611277454moz-cite-prefix">On
3/5/19 3:42 PM, Amir Shehata wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>It looks like the ping is passing. Did you try it
several times to make sure it always pings
successfully?</div>
<div><br>
</div>
<div>The way it works is the MDS (2.12) discovers all
the interfaces on the peer. There is a concept of the
primary NID for the peer. That's the first interface
configured on the peer. In your case it's the o2ib
NID. So when you run lnetctl peer show you'll see
primary nid: <nid>@o2ib.</div>
<div><br>
</div>
<div> - primary nid: 172.21.52.88@o2ib<br>
Multi-Rail: True<br>
peer ni:<br>
- nid: 172.21.48.250@tcp<br>
state: NA<br>
- nid: 172.21.52.88@o2ib<br>
state: NA<br>
- nid: 172.21.48.250@tcp1<br>
state: NA<br>
- nid: 172.21.48.250@tcp2<br>
state: NA</div>
<div><br>
</div>
<div>On the MDS it uses the primary_nid to identify the
peer. So you can ping using the Primary NID. LNet will
resolve the Primary NID to the tcp NID. As you can see
in the logs, it never actually talks over o2ib. It
ends up talking to the peer on its TCP NID, which is
what you want to do.</div>
<div><br>
</div>
<div>I think the problem you're seeing is caused by the
combination of 2.12 and 2.10.x.</div>
<div>From what I understand your servers are 2.12 and
your clients are 2.10.x. <br>
</div>
</div>
</blockquote>
My clients are 2.10.5, but this problem also arises with one
2.12.0 client. In any case, the combination of 2.10.0 clients and
2.12.0 is not working right.<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Can you try disabling dynamic discovery on your
servers:</div>
<div>lnetctl set discovery 0</div>
</div>
</blockquote>
<p>I did this on the MDS and OSS. I did not disable
discovery on the client side.</p>
<p>Now on the MDS side, lnetctl peer show looks right.</p>
<p>However, on the client side, where I have both IB and tcp,
when I write to the Lustre filesystem (OSS) the write
operation is split/load balanced between IB and tcp
(Ethernet), and I do not want this. I would like only IB
to be used when the client writes data to the OSS. But both
peer NIs (o2ib, tcp) are seen from the 2.12.0 client and
traffic goes to both of them, which reduces performance
because IB is not fully used. This does not happen with a
2.10.5 client writing to the same 2.12.0 OSS.<br>
</p>
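<p>A rough sketch of how the split can be observed from the
client, assuming the verbose lnetctl output on the installed
version includes the per-interface statistics block: compare the
counters before and after a test write.</p>
<pre>
# Per-peer-NI counters: shows which of the server's NIDs
# (o2ib or tcp) the client is actually sending to.
lnetctl peer show -v

# Per-local-NI counters on the client's own interfaces.
lnetctl net show -v
</pre>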
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote></div>