[lustre-discuss] Lustre mds/ods Server with IB/omnipath and Ethernet clients (dual homed?)

Philipp Grau phgrau at zedat.fu-berlin.de
Wed Nov 29 03:37:31 PST 2023


Hello,

I have some questions about the network setup for Ethernet-based
clients.

We have a working Lustre installation with two MDS servers and seven
ODS systems connected to our cluster via Omni-Path/IB. This part is
working fine.

Now we want to add some clients that have only an Ethernet connection
to the Lustre servers (via the Ethernet cards in the servers).

Our MDS and ODS servers have the following lnet setup:

net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.149.0.XXX@o2ib # IP of the local ib interface
          status: up
          interfaces:
              0: ib0
    - net type: tcp
      local NI(s):
        - nid: xxx.xxx.5.XXX@tcp # IP of the local ethernet interface
          status: up
          interfaces:
              0: eno1


Our test ethernet node:

lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: xxx.xxx.4.XXX@tcp # same subnet as above, it is a /23
          status: up
          interfaces:
              0: enp225s0f0

So far so good. 

I'm able to lnetctl ping in both directions:

Ping the client:

lnetctl ping xxx.xxx.4.xxx@tcp
ping:
    - primary nid: xxx.xxx.4.xxx@tcp
      Multi-Rail: True
      peer ni:
        - nid: xxx.xxx.4.xxx@tcp

Ping the server:

lnetctl ping xxx.xxx.5.xxx@tcp
ping:
    - primary nid: xxx.xxx.5.xxx@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.149.0.183@o2ib
        - nid: xxx.xxx.5.xxx@tcp

But the mount fails. Here is the output from dmesg (are there other
sources of debug information?):

LustreError: 25758:0:(ldlm_lib.c:494:client_obd_setup()) can't add initial connection
LustreError: 25758:0:(obd_config.c:559:class_setup()) setup scratch-MDT0000-mdc-ffff8b63003d4000 failed (-2)
LustreError: 25758:0:(obd_config.c:1835:class_config_llog_handler()) MGCxxx.xxx.5.xxx@tcp: cfg command failed: rc = -2
Lustre:    cmd=cf003 0:scratch-MDT0000-mdc  1:scratch-MDT0000_UUID  2:10.149.0.183@o2ib
LustreError: 15c-8: MGC160.45.5.246@tcp: The configuration from log 'scratch-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 25734:0:(obd_config.c:610:class_cleanup()) Device 3 not setup
Lustre: Unmounted scratch-client
LustreError: 25734:0:(obd_mount.c:1604:lustre_fill_super()) Unable to mount  (-2)
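
One detail in the log above may be relevant: the failing cfg command
names a NID on the o2ib network, while the test node only has lo and tcp
configured. Whether this is the actual cause of the failed mount is not
confirmed here. The following is only a minimal sketch for spotting this
kind of mismatch in a log. It assumes NIDs are written as address@net
(also accepting the " at " form that list archives produce); the
function name unreachable_nids and the regular expression are
hypothetical helpers, not part of Lustre or lnetctl.

```python
import re

# Assumption: a NID is "<address>@<net>"; mail archives may show "@" as " at ".
NID_RE = re.compile(r"([\w.]+)(?: at |@)(\w+)")


def unreachable_nids(log_lines, local_nets):
    """Return NIDs named in 'cmd=' log lines whose LNet network
    is not among the networks configured on this node."""
    missing = []
    for line in log_lines:
        if "cmd=" not in line:
            continue  # only look at the configuration commands
        for addr, net in NID_RE.findall(line):
            if net not in local_nets:
                missing.append(f"{addr}@{net}")
    return missing


# The cfg command line from the dmesg output, and the client's networks
# as listed by "lnetctl net show" on the test node.
log = [
    "Lustre:    cmd=cf003 0:scratch-MDT0000-mdc  "
    "1:scratch-MDT0000_UUID  2:10.149.0.183 at o2ib",
]
client_nets = {"lo", "tcp"}

print(unreachable_nids(log, client_nets))  # prints ['10.149.0.183@o2ib']
```

The sketch only compares network names; it does not test reachability
and says nothing about how the servers were registered with the MGS.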

Does anyone have ideas or reference documentation on this topic?

Do I need some "lnetctl route" configuration?

Do I need some "lnetctl peer add ..." to make the Lustre servers and
clients known to each other?

Any hints are welcome!

Kind regards,

Philipp

-- 
 Philipp Grau               | Freie Universitaet Berlin   
 phgrau at ZEDAT.FU-Berlin.DE  | FU-IT - Infrastruktur
 Tel: +49 (30) 838 56583    | Fabeckstr. 32   
 Fax: +49 (30) 838 56721    | 14195 Berlin   

