[lustre-discuss] mlx4 and mlx5 mix environment

Ms. Megan Larko dobsonunit at gmail.com
Wed Jul 1 07:39:27 PDT 2020


Awesome, thanks!  Unfortunately the password reset site is not finding my
UID.  Maybe I never had access to the Lustre wiki.  (I have so many
accounts that sometimes my head spins.)  I'm still willing to help.  Is
there a site where I can request an account or password?

Cheers,
megan

On Fri, Jun 26, 2020 at 8:54 PM Spitz, Cory James <cory.spitz at hpe.com>
wrote:

> Megan,
>
>
>
> You wrote:
>
> PS. [I am willing to add/contribute to the
> http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
> account for wiki editing has expired (at least the one I thought I had did
> not work).
>
>
>
> Thank you for your offer!  Did you try
> http://wiki.lustre.org/Special:PasswordReset?  If that didn’t work then I
> think that you could email lustre.org at lists.opensfs.org.
>
>
>
> -Cory
>
> On 6/24/20, 3:33 PM, "lustre-discuss on behalf of Ms. Megan Larko" <
> lustre-discuss-bounces at lists.lustre.org on behalf of dobsonunit at gmail.com>
> wrote:
>
>
>
> On 22 Jun 2020 "guru.novice" wrote:
>
> Hi, all
> We set up a cluster using a mix of mlx4 and mlx5 drivers, and everything
> works well. Later I found something in the wiki
> http://wiki.lustre.org/Infiniband_Configuration_Howto and in
> http://lists.onebuilding.org/pipermail/lustre-devel-lustre.org/2016-May/003842.html
> which was last edited in 2016.
> So do I need to change the LNet configuration as described on this page?
> Or has the problem been resolved in newer versions (like 2.12.x)?
> Also, where can I find more details?
>
> Any suggestions would be appreciated.
> Thanks!
>
>
>
> Hello guru.novice,
>
> Lustre 2.12.x has some nice LNet configuration abilities.  The old
> /etc/modprobe.d/ config files have been superseded by /etc/lnet.conf.  An
> install of Lustre 2.12.x provides a sample of this file (with the lines
> commented out).  Our experience has shown that not all lines are necessary;
> edit to suit.
>
>
>
> Lustre 2.12.x has Multi-Rail (MR) on by default, so Lustre will attempt
> to automatically find active and viable LNet paths to use.  This should
> have no issue with your mixed mlx4/mlx5 environment; we have some mixed IB
> and Ethernet setups that work.  To explicitly use MR one may set
> "Multi-Rail: true" in the "peer" NID section of the /etc/lnet.conf file.
> But that was not necessary for us.  We used a simple /etc/lnet.conf for MR
> systems:
>
> File stub: /etc/lnet.conf
>
> net:
>     - net type: o2ib0
>       local NI(s):
>         - interfaces:
>               0: ib0
>     - net type: o2ib777
>       local NI(s):
>         - interfaces:
>               0: ib0:1
>
> This allowed LNet to use any NID on o2ib0 and o2ib777.
>
>
>
> Whatever is placed in /etc/lnet.conf is loaded into the LNet kernel
> modules by the Lustre startup mechanism (on CentOS, the systemd units under
> /usr/lib/systemd/system).  Because we are choosing _not_ to use MR on a
> different box, we explicitly defined the available routes in /etc/lnet.conf
> using the lines:
>
> route:
>     - net: tcp
>       gateway: 10.10.10.101@o2ib11111
>     - net: tcp
>       gateway: 10.10.10.102@o2ib1111
>
> And so on up to 10.10.10.116@o2ib1111
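
Rather than typing sixteen near-identical entries by hand, the route list above could be generated. The following is only a sketch: the function name gen_routes and the GW_NET variable are made up for illustration, and because the quoted entries show both o2ib11111 and o2ib1111, the default network name below is an assumption that should be replaced with whatever your site actually uses.

```shell
#!/bin/sh
# Print lnet.conf route entries for gateways 10.10.10.101 to 10.10.10.116.
# GW_NET is an assumption: the quoted message shows both o2ib11111 and
# o2ib1111, so set it to the network name your site really uses.
GW_NET="${GW_NET:-o2ib1111}"

gen_routes() {
    echo "route:"
    for host in $(seq 101 116); do
        # One "net"/"gateway" pair per gateway NID, same layout as above.
        printf '    - net: tcp\n      gateway: 10.10.10.%s@%s\n' "$host" "$GW_NET"
    done
}

gen_routes
```

The output is meant to be reviewed and pasted into /etc/lnet.conf by hand, not written there directly.
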
>
>
>
> On CentOS 7, the /usr/lib/systemd/system/lnet.service file is reproduced
> below.  (Details: lustre-2.12.4-1 with Mellanox OFED version 4.7-1.0.0.1
> and kernel 3.10.957.27.2.el7)
>
> File lnet.service:
>
> [Unit]
> Description=lnet management
> Requires=network-online.target
> After=network-online.target openibd.service rdma.service opa.service
> ConditionPathExists=!/proc/sys/lnet/
>
> [Service]
> Type=oneshot
> RemainAfterExit=true
> ExecStart=/sbin/modprobe lnet
> ExecStart=/usr/sbin/lnetctl lnet configure
> ExecStart=/usr/sbin/lnetctl set discover 0   <--Do NOT use this line if you want MR function
> ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf  <--The file with router, credit and similar info
> ExecStart=/usr/sbin/lnetctl peer add --nid 10.10.10.[101-116]@o2ib11111 --non_mr  <--Omit --non_mr if you want to use MR
> ExecStop=/usr/sbin/lustre_rmmod ptlrpc
> ExecStop=/usr/sbin/lnetctl lnet unconfigure
> ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs
>
> [Install]
> WantedBy=multi-user.target
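
Once the lnet service has started, it may help to check what LNet actually loaded from /etc/lnet.conf. The sketch below wraps three lnetctl "show" subcommands in a small helper. The function name show_lnet_state and the LNETCTL override variable are hypothetical conveniences, not part of Lustre.

```shell
#!/bin/sh
# Dump the LNet networks, peers and routes currently configured.
# LNETCTL is a hypothetical override so that another path can be supplied;
# by default the lnetctl found on PATH is used.
show_lnet_state() {
    tool="${LNETCTL:-lnetctl}"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool not found; is Lustre installed?" >&2
        return 1
    fi
    "$tool" net show     # local network interfaces (e.g. o2ib0, o2ib777)
    "$tool" peer show    # known peers and whether they are Multi-Rail
    "$tool" route show   # routes imported from the "route:" section
}
```

Run show_lnet_state as root after "systemctl start lnet" and compare the output against what you put in /etc/lnet.conf.
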
>
>
>
> I hope this info can help you in the right direction.
>
>
>
> Cheers,
>
> megan
>
> PS. [I am willing to add/contribute to the
> http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
> account for wiki editing has expired (at least the one I thought I had did
> not work).
>
> Our site had issues with Multi-Rail "not socially distancing
> appropriately" from other LNet networks so in our particular case we
> disabled MR.  (An entirely different experience.) ]
>