[lustre-discuss] Lustre mds/ods Server with IB/omnipath and Ethernet clients (dual homed?)

Horn, Chris chris.horn at hpe.com
Thu Nov 30 08:29:46 PST 2023


Right, when you format a Lustre target, it registers itself with the MGS. Part of that registration is telling the MGS what NIDs the target can be reached at (the MGS, in turn, passes this information to the clients). If you add or delete NIDs then you need to ensure that information is updated with the MGS. This is the procedure I linked in the Ops manual.

lctl list_nids does not tell you which NIDs are registered with the MGS. It only tells you what NIDs are currently defined on the local host. There is some way to inspect the config log to see what NIDs are in there, but I can’t recall the specifics off the top of my head.
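To make the distinction concrete, here is a minimal Python sketch of the check a client effectively has to pass: at least one NID recorded for the target must be on an LNet network the client itself has. The function names are hypothetical, and the input values are read off the quoted message below (the `cmd=cf003` record lists only the o2ib NID for scratch-MDT0000, and the test client has only `lo` and `tcp`). NIDs are written in their real `address@network` form; the list archive renders `@` as ` at `.

```python
def nid_network(nid: str) -> str:
    """Return the LNet network part of a NID such as '10.149.0.183@o2ib'."""
    address, sep, network = nid.rpartition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {nid!r}")
    return network


def usable_nids(registered_nids, client_networks):
    """Keep only the registered NIDs that sit on a network the client has.

    This is a naive string comparison: LNet treats 'tcp' and 'tcp0' as the
    same network, but this sketch does not normalize such aliases.
    """
    return [nid for nid in registered_nids if nid_network(nid) in client_networks]


# NID seen in the failing config record for scratch-MDT0000 (from the dmesg output).
registered = ["10.149.0.183@o2ib"]
# Networks configured on the Ethernet-only test client (from 'lnetctl net show').
client_networks = {"lo", "tcp"}

print(usable_nids(registered, client_networks))  # []
```

An empty result is consistent with the "can't add initial connection" error: the client was handed only an o2ib NID for the target, even though the server also has a tcp NID defined locally.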

Chris Horn

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Laura Hild via lustre-discuss <lustre-discuss at lists.lustre.org>
Date: Thursday, November 30, 2023 at 8:22 AM
To: Philipp Grau <phgrau at zedat.fu-berlin.de>
Cc: Lustre User Discussion Mailing List <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Lustre mds/ods Server with IB/omnipath and Ethernet clients (dual homed?)
Hi Philipp-

I don't do this a ton so I'm hazy, but do you set nids or nets when you mkfs.lustre?  So then maybe you have to tunefs those in when you add more?

-Laura


________________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Philipp Grau <phgrau at zedat.fu-berlin.de>
Sent: Wednesday, November 29, 2023 06:37
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Lustre mds/ods Server with IB/omnipath and Ethernet clients (dual homed?)

Hello,

some questions regarding network connection setup for ethernet based
clients.

We have a working Lustre installation with two MDS servers and seven
ODS systems connected to our cluster via omnipath/ib. This part is
working fine.

Now we want to add some clients that have only an Ethernet connection
to the Lustre servers (using the Ethernet cards in the servers).

Our MDS and ODS servers have the following lnet setup:

net:
    - net type: lo
      local NI(s):
        - nid: 0 at lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.149.0.XXX at o2ib # IP of the local ib interface
          status: up
          interfaces:
              0: ib0
    - net type: tcp
      local NI(s):
        - nid: xxx.xxx.5.XXX at tcp # IP of the local ethernet interface
          status: up
          interfaces:
              0: eno1


Our test ethernet node:

lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0 at lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: xxx.xxx.4.XXX at tcp # same subnet as above, it is a /23
          status: up
          interfaces:
              0: enp225s0f0
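The "same subnet, it is a /23" remark can be checked with a short Python sketch using the standard `ipaddress` module. The addresses below are hypothetical placeholders (the real ones are masked in this message); only the third octets, 4 and 5, and the /23 prefix length are taken from the output above.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if addr_b lies in the subnet that addr_a belongs to."""
    network = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    return ipaddress.ip_address(addr_b) in network


# Hypothetical client (third octet 4) and server (third octet 5) addresses.
print(same_subnet("10.0.4.17", "10.0.5.42", 23))  # True
print(same_subnet("10.0.4.17", "10.0.5.42", 24))  # False
```

With a /23 the two third octets fall into one network, so no LNet routing should be needed between the tcp interfaces; with a /24 they would not.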

So far so good.

I'm able to lnetctl ping in both directions:

Ping the client:

lnetctl ping xxx.xxx.4.xxx at tcp
ping:
    - primary nid: xxx.xxx.4.xxx at tcp
      Multi-Rail: True
      peer ni:
        - nid: xxx.xxx.4.xxx at tcp

Ping the server:

lnetctl ping xxx.xxx.5.xxx at tcp
ping:
    - primary nid: xxx.xxx.5.xxx at tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.149.0.183 at o2ib
        - nid: xxx.xxx.5.xxx at tcp

But the mount fails, output from dmesg (are there other sources of
debug information?):

LustreError: 25758:0:(ldlm_lib.c:494:client_obd_setup()) can't add initial connection
LustreError: 25758:0:(obd_config.c:559:class_setup()) setup scratch-MDT0000-mdc-ffff8b63003d4000 failed (-2)
LustreError: 25758:0:(obd_config.c:1835:class_config_llog_handler()) MGCxxx.xxx.5.xxx at tcp: cfg command failed: rc = -2
Lustre:    cmd=cf003 0:scratch-MDT0000-mdc  1:scratch-MDT0000_UUID  2:10.149.0.183 at o2ib
LustreError: 15c-8: MGC160.45.5.246 at tcp: The configuration from log 'scratch-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 25734:0:(obd_config.c:610:class_cleanup()) Device 3 not setup
Lustre: Unmounted scratch-client
LustreError: 25734:0:(obd_mount.c:1604:lustre_fill_super()) Unable to mount  (-2)
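The `cmd=cf003` line is worth a closer look, because it shows the arguments of the configuration record that failed. The following Python sketch (the function name and regular expression are illustrative, not part of any Lustre tool) splits such a line into its numbered arguments. It assumes the layout seen above, and it first undoes the list archive's rewriting of `@` as ` at `.

```python
import re

# Matches the record dump printed after a failed config command.
RECORD = re.compile(r"cmd=(?P<cmd>\w+)\s+(?P<args>.*)")


def parse_cfg_record(line: str):
    """Return (command, {index: value}) for a 'cmd=...' line, or None."""
    # The mailing-list archive replaces '@' with ' at '; restore the NID syntax.
    line = line.replace(" at ", "@")
    match = RECORD.search(line)
    if match is None:
        return None
    args = {}
    for field in match.group("args").split():
        index, _, value = field.partition(":")
        args[int(index)] = value
    return match.group("cmd"), args


line = ("Lustre:    cmd=cf003 0:scratch-MDT0000-mdc  "
        "1:scratch-MDT0000_UUID  2:10.149.0.183 at o2ib")
cmd, args = parse_cfg_record(line)
print(cmd)      # cf003
print(args[2])  # 10.149.0.183@o2ib
```

The last argument is an o2ib NID, while this client only has a tcp network, which points at the NIDs recorded for the target rather than at basic LNet connectivity.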

Does anyone have ideas or reference documentation on this topic?

Do I need some "lnetctl route" stuff?

Do I need some "lnetctl peer add ..." to make the Lustre servers and
clients known to each other?

Any hints are welcome!

Kind regards,

Philipp

--
 Philipp Grau               | Freie Universitaet Berlin
 phgrau at ZEDAT.FU-Berlin.DE  | FU-IT - Infrastruktur
 Tel: +49 (30) 838 56583    | Fabeckstr. 32
 Fax: +49 (30) 838 56721    | 14195 Berlin

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org