[lustre-discuss] Experience with DDN AI400X

Ms. Megan Larko dobsonunit at gmail.com
Tue Apr 6 08:28:00 PDT 2021


Hello Folks,

To clarify my own issues working with both Lustre servers at 2.12.5 and
LNet routers at 2.10.4: I have in my notes from October 2020 that I
received many, many lines in /var/log/messages reading:
  LNet: 8759:0 (o2iblnd_cb.c:3401:kiblnd_check_conns()) Timed out tx for <IP IB addr> 56 seconds
which was followed by
  Skipped 97 previous similar messages.

The behavior of the Lustre file system storage was noticeably slower when
traversing LNet routers and clients at 2.10.4.  I will note that the 2.10.4
Lustre clients were built with Mellanox OFED version 4.3-1.0.1 and the
Lustre 2.12.5 servers are using Mellanox OFED version 4.7-1.0.0.  These
were the versions of Mellanox software applied when the boxes were built.
I did not investigate the "Timed out tx for <IP IB addr>" message further;
I only noticed that it was consistent for me with 2.12.6 Lustre servers and
LNet routers at 2.10.4 (with the corresponding Mellanox OFED).  I
eliminated the obvious performance issue and the messages by not using LNet
routers at Lustre 2.10.4/MOFED, and instead going with an LNet router at
Lustre client 2.12.2 or newer, where the Lustre client 2.12.2 is using
Mellanox OFED 4.5-1.0.1.0.
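For anyone chasing the same symptom, here is a minimal sketch of the
per-node version checks I am describing, to be run on each client, router,
and server (the fallback messages are my own wording; lctl, ofed_info, and
lnetctl are the standard tools where installed):

```shell
#!/bin/sh
# Report the Lustre and Mellanox OFED versions on this node, so that a
# mismatched combination across clients, routers, and servers stands out.
# Each check is guarded so the script still runs where a tool is absent.

lustre_version() {
    if [ -r /sys/fs/lustre/version ]; then
        cat /sys/fs/lustre/version
    elif command -v lctl >/dev/null 2>&1; then
        lctl get_param -n version 2>/dev/null
    else
        echo "unknown (Lustre modules not loaded)"
    fi
}

mofed_version() {
    # ofed_info ships with Mellanox OFED; absent means inbox/distro OFED
    if command -v ofed_info >/dev/null 2>&1; then
        ofed_info -s
    else
        echo "not installed (inbox OFED?)"
    fi
}

echo "lustre: $(lustre_version)"
echo "mofed:  $(mofed_version)"

# On routers, also dump the LNet network interface state
command -v lnetctl >/dev/null 2>&1 && lnetctl net show
```

Collecting this from every node on the LNet makes it easy to spot, for
example, a 2.10.4 router on MOFED 4.3 sitting between 2.12.x peers.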

That is why I made the comment that Lustre 2.10.x may not play well with
newer 2.12.x.  It could well be that the differences in the MOFED stack are
more the reason than the Lustre software itself.  Apologies if I offended.
I'm glad other people have had better luck mixing Lustre 2.10.x and Lustre
2.12.x versions.

Cheers,
megan

P.S.  Sorry for delay in response; I was off for a few days.

On Fri, Apr 2, 2021 at 5:02 AM Andreas Dilger <adilger at whamcloud.com> wrote:

> On Mar 30, 2021, at 11:54, Spitz, Cory James via lustre-discuss <
> lustre-discuss at lists.lustre.org> wrote:
>
>
> Hello, Megan.
>
> I was curious why you made this comment:
> > A general example is a box with lustre-client 2.10.4 is not going to be
> completely happy with a new 2.12.x on the lustre network
> In general, I think that the two LTS releases are very interoperable.  What
> incompatibility are you referring to?  Do you have a well-known LU or two
> to share?
>
>
> This could potentially relate to changes with configuring Multi-Rail LNet
> between those releases?
>
>
> On 3/30/21, 12:14 PM, "lustre-discuss on behalf of Ms. Megan Larko via
> lustre-discuss" <lustre-discuss-bounces at lists.lustre.org on behalf of
> lustre-discuss at lists.lustre.org> wrote:
>
> Hello!
>
> I have no direct experience with the DDN AI400X, but as a vendor DDN has
> some nice value-add to the Lustre systems they build.  Having worked with
> other DDN Lustre hardware in my career, interoperability with other Lustre mounts
> is usually not an issue unless the current lustre-client software on the
> client boxes is a very different software version or network stack.  A
> general example is a box with lustre-client 2.10.4 is not going to be
> completely happy with a new 2.12.x on the lustre network.  As far as vendor
> lock-in, DDN support in my past experience does have its own value-add to
> their Lustre storage product so it is not completely vanilla.  I have found
> the enhancements useful.  As far as your total admin control of the DDN
> storage product, that is probably up to the terms of the service agreement
> made with purchase.  In my one experience with DDN on that front, DDN
> contractually maintained the box version level and patches, while standard
> Lustre tunables were fine for local admins.  In one case we stumbled upon a
> bug; I was permitted to dig around freely but not to change anything, and I
> shared my findings with the DDN team.  It worked out well for us.
>
> P.S.  I am not in any way employed or compensated by DDN.    I'm just
> sharing my own experience.   Smile.
>
> Cheers,
> megan
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>