[lustre-discuss] Lustre 2.12.0 and locking problems

Riccardo Veraldi Riccardo.Veraldi at cnaf.infn.it
Wed Mar 6 12:15:10 PST 2019


On 3/6/19 11:29 AM, Amir Shehata wrote:
> The load is split across tcp and o2ib0 on the 2.12 client because the
> Multi-Rail (MR) code sees both interfaces, realizes it can use both,
> and does so.
> To disable this behavior you can disable discovery on the 2.12 client. 
> I think that should just get the client to only use the single 
> interface it's told to.
Thank you very much, this worked out well.
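For the archive, a minimal sketch of the client-side change. It assumes a 2.12 client with the lnet module loaded and root access. The function name disable_lnet_discovery is only illustrative and not part of Lustre. The set command is the one suggested for the servers further down in this thread; reading the value back with "lnetctl global show" is an extra check added here.

```shell
#!/bin/sh
# Sketch: turn off LNet dynamic peer discovery on a 2.12 client.
# disable_lnet_discovery is a hypothetical helper name, not a Lustre tool.
disable_lnet_discovery() {
    # Runtime switch (needs root and a loaded lnet module).
    lnetctl set discovery 0 || return 1
    # Read the setting back: "lnetctl global show" prints a "discovery:" field.
    lnetctl global show | awk '$1 == "discovery:" { print $2 }'
}

# Usage (as root):
#   disable_lnet_discovery
```

This is a runtime setting only. As far as I know the lnet module option lnet_peer_discovery_disabled=1 (for example in a file under /etc/modprobe.d/) is the persistent equivalent, but please check that against the documentation for your version.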
> We're currently working on a feature (UDSP) which will allow the 
> specification of a "preferred" network. In your case you can set the 
> o2ib to be the preferred network. It'll always be used unless it 
> becomes unavailable. You get two benefits this way: 1) your preference 
> is adhered to. 2) reliability, since the tcp network will be used if 
> the o2ib network becomes unavailable.
This feature (UDSP) would be really great.
>
> Let me know if disabling discovery on your 2.12 clients works.

Yes, after disabling discovery on the client side, the situation is much
better.


thank you very much


>
> thanks
> amir
>
> On Tue, 5 Mar 2019 at 18:49, Riccardo Veraldi
> <Riccardo.Veraldi at cnaf.infn.it> wrote:
>
>     Hello Amir, I answer in-line.
>
>     On 3/5/19 3:42 PM, Amir Shehata wrote:
>>     It looks like the ping is passing. Did you try it several times
>>     to make sure it always pings successfully?
>>
>>     The way it works is the MDS (2.12) discovers all the interfaces
>>     on the peer. There is a concept of the primary NID for the peer.
>>     That's the first interface configured on the peer. In your case
>>     it's the o2ib NID. So when you do lnetctl peer show you'll see
>>     Primary NID: <nid>@o2ib.
>>
>>          - primary nid: 172.21.52.88@o2ib
>>            Multi-Rail: True
>>            peer ni:
>>              - nid: 172.21.48.250@tcp
>>                state: NA
>>              - nid: 172.21.52.88@o2ib
>>                state: NA
>>              - nid: 172.21.48.250@tcp1
>>                state: NA
>>              - nid: 172.21.48.250@tcp2
>>                state: NA
>>
>>     On the MDS it uses the primary_nid to identify the peer. So you
>>     can ping using the Primary NID. LNet will resolve the Primary NID
>>     to the tcp NID. As you can see in the logs, it never actually
>>     talks over o2ib. It ends up talking to the peer on its TCP NID,
>>     which is what you want to do.
>>
>>     I think the problem you're seeing is caused by the combination of
>>     2.12 and 2.10.x.
>>     From what I understand your servers are 2.12 and your clients are
>>     2.10.x.
>     My clients are 2.10.5, but this problem also arises with one
>     2.12.0 client. In any case, the combination of 2.10.0 clients and
>     2.12.0 servers is not working right.
>>
>>     Can you try disabling dynamic discovery on your servers:
>>     lnetctl set discovery 0
>
>     I did this on the MDS and OSS. I did not disable discovery on the
>     client side.
>
>     Now on the MDS side lnetctl peer show looks right.
>
>     However, on the client side, where I have both IB and tcp, when I
>     write to the Lustre filesystem (OSS) the write operation is
>     split/load-balanced between IB and tcp (Ethernet), and I do not
>     want this. I would like only IB to be used when the client writes
>     data to the OSS. But both peer NIs (o2ib, tcp) are seen from the
>     2.12.0 client and traffic goes to both of them, reducing
>     performance because IB is not fully used. This does not happen
>     with a 2.10.5 client writing to the same 2.12.0 OSS.
>
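
A quick way to see which networks a node has recorded for its peers is to count the peer NI entries per network in the "lnetctl peer show" output. The following is a rough sketch that only does text processing on that output, assuming NIDs are printed as address@network. The helper name summarize_peer_nets is made up for this example.

```shell
#!/bin/sh
# Sketch: count peer NIs per LNet network from "lnetctl peer show" output on stdin.
# summarize_peer_nets is a hypothetical helper name, not a Lustre tool.
summarize_peer_nets() {
    awk '
        # Peer NI entries look like "- nid: 172.21.52.88@o2ib".
        # Lines such as "- primary nid: ..." are skipped because $2 is "primary".
        $1 == "-" && $2 == "nid:" {
            n = split($3, parts, "@")   # parts[2] is the network, e.g. o2ib
            if (n == 2) count[parts[2]]++
        }
        END { for (net in count) print net, count[net] }
    ' | sort
}

# Usage:
#   lnetctl peer show | summarize_peer_nets
```

For the peer listing quoted earlier in this thread it would report one NI each on o2ib, tcp, tcp1 and tcp2. If a client should only ever reach a server over IB, any tcp line in the summary for that server is worth a closer look.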
