[lustre-discuss] Lustre mds/ods Server with IB/omnipath and Ethernet clients (dual homed?)
Laura Hild
lsh at jlab.org
Thu Nov 30 06:21:23 PST 2023
Hi Philipp-
I don't do this often, so I'm hazy on the details, but do you set NIDs or networks when you run mkfs.lustre? If so, maybe you have to add the new ones with tunefs.lustre when you add more networks?
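The NIDs a target records at format time are passed as a comma-separated list, one entry per network the node is on, to options such as --mgsnode or --servicenode. Below is a minimal Python sketch, assuming a hypothetical helper `nid_list` and placeholder addresses (the real ones are masked in this thread), that builds such a list so a server advertises both its o2ib and its tcp NID. It only prints a string. The actual procedure for changing NIDs on an existing file system should be taken from the Lustre Operations Manual before touching a production system.

```python
# Hypothetical sketch: build the comma-separated NID list for ONE server node
# so that it advertises both its o2ib and its tcp NID.
# The addresses are placeholders, not the poster's real NIDs.

def nid_list(*nids: str) -> str:
    """Join every NID of a single node into a comma-separated list."""
    if not nids:
        raise ValueError("at least one NID is required")
    for nid in nids:
        # Every NID has the form address@network.
        if "@" not in nid:
            raise ValueError(f"not a NID (missing '@'): {nid!r}")
    return ",".join(nids)


mds_nids = nid_list("10.149.0.10@o2ib", "192.0.2.10@tcp")
print(f"--servicenode={mds_nids}")
# prints: --servicenode=10.149.0.10@o2ib,192.0.2.10@tcp
```

The helper validates only the address@network shape; it does not check that the addresses are reachable.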
-Laura
________________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Philipp Grau <phgrau at zedat.fu-berlin.de>
Sent: Wednesday, November 29, 2023 06:37
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Lustre mds/ods Server with IB/omnipath and Ethernet clients (dual homed?)
Hello,
I have some questions about setting up the network connection for
Ethernet-based clients.
We have a working Lustre installation with two MDS servers and seven
OSS servers connected to our cluster via Omni-Path/InfiniBand. This part
works fine.
Now we want to add some clients that have only an Ethernet connection
to the Lustre servers (via the Ethernet cards in the servers).
Our MDS and OSS servers have the following LNet setup:
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.149.0.XXX@o2ib    # IP of the local IB interface
          status: up
          interfaces:
              0: ib0
    - net type: tcp
      local NI(s):
        - nid: xxx.xxx.5.XXX@tcp    # IP of the local Ethernet interface
          status: up
          interfaces:
              0: eno1
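Each NID has the form address@network, and a bare network type such as tcp is shorthand for network number 0 (tcp0). A minimal Python sketch, assuming the hypothetical names `Nid` and `parse_nid` and a placeholder tcp address in place of the masked one, that splits the server NIDs above and lists the non-loopback LNet networks the servers are on:

```python
from typing import NamedTuple


class Nid(NamedTuple):
    """An LNet NID split into address, network type and network number."""
    addr: str
    lnd: str  # network type, e.g. "tcp" or "o2ib"
    num: int  # network number; a bare "tcp" means network number 0

    @property
    def net(self) -> str:
        """Explicit network name, e.g. 'tcp0' or 'o2ib0'."""
        return f"{self.lnd}{self.num}"


def parse_nid(text: str) -> Nid:
    """Parse 'address@net', e.g. '10.149.0.183@o2ib' or '192.0.2.7@tcp1'."""
    addr, sep, net = text.strip().rpartition("@")
    if not sep or not addr or not net:
        raise ValueError(f"not a NID: {text!r}")
    # Strip only TRAILING digits, so the '2' inside 'o2ib' is kept.
    lnd = net.rstrip("0123456789")
    if not lnd:
        raise ValueError(f"missing network type in {text!r}")
    num = net[len(lnd):]
    return Nid(addr, lnd, int(num) if num else 0)


# Server NIDs from 'lnetctl net show'; the tcp address is a placeholder.
server_nids = ["0@lo", "10.149.0.183@o2ib", "192.0.2.10@tcp"]
nets = {nid.net for nid in map(parse_nid, server_nids) if nid.lnd != "lo"}
print(sorted(nets))  # ['o2ib0', 'tcp0']
```

Normalizing to the explicit form makes "tcp" and "tcp0" compare equal, which matters when comparing the networks of two nodes.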
Our test Ethernet node:
lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: xxx.xxx.4.XXX@tcp    # same subnet as the servers; it is a /23
          status: up
          interfaces:
              0: enp225s0f0
So far so good.
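A /23 prefix spans two consecutive /24-sized blocks, so a client at x.x.4.* and a server at x.x.5.* can sit in the same IP subnet. A quick check with Python's standard `ipaddress` module, using placeholder addresses (only the third octets and the /23 come from the post):

```python
import ipaddress

# Placeholder addresses; the real ones are masked in the post.
# Only the third octets (.4 client, .5 server) and the /23 are from the post.
client = ipaddress.ip_interface("10.20.4.17/23")
server = ipaddress.ip_interface("10.20.5.30/23")

print(client.network)                # 10.20.4.0/23
print(server.ip in client.network)   # True: .4.x and .5.x share the /23
print(ipaddress.ip_address("10.20.6.1") in client.network)  # False
```

If this holds for the real addresses, the tcp traffic between client and servers needs no IP router, which matches the successful pings.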
I'm able to run lnetctl ping in both directions:
Ping the client:
lnetctl ping xxx.xxx.4.xxx@tcp
ping:
    - primary nid: xxx.xxx.4.xxx@tcp
      Multi-Rail: True
      peer ni:
        - nid: xxx.xxx.4.xxx@tcp
Ping the server:
lnetctl ping xxx.xxx.5.xxx@tcp
ping:
    - primary nid: xxx.xxx.5.xxx@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.149.0.183@o2ib
        - nid: xxx.xxx.5.xxx@tcp
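The server reports two peer NIs, but a tcp-only client can talk directly only to the one on a network it has configured itself. An LNet route is needed only when no peer NID shares a network with the local node. A minimal Python sketch of that selection, assuming the hypothetical helpers `canonical_net` and `usable_peer_nids` and a placeholder tcp address:

```python
def canonical_net(net: str) -> str:
    """'tcp' and 'tcp0' name the same LNet network; return the explicit form."""
    return net if net[-1:].isdigit() else net + "0"


def usable_peer_nids(local_nets: list[str], peer_nids: list[str]) -> list[str]:
    """Return the peer NIDs that sit on a network configured locally."""
    local = {canonical_net(n) for n in local_nets}
    return [nid for nid in peer_nids
            if canonical_net(nid.rpartition("@")[2]) in local]


client_nets = ["lo", "tcp"]  # from 'lnetctl net show' on the client
# Peer NIs reported by 'lnetctl ping'; the tcp address is a placeholder.
server_peer_nis = ["10.149.0.183@o2ib", "192.0.2.10@tcp"]

print(usable_peer_nids(client_nets, server_peer_nis))       # ['192.0.2.10@tcp']
print(usable_peer_nids(client_nets, ["10.149.0.183@o2ib"]))  # []
```

An empty result is the situation where a route through an LNet router would be required; here the shared tcp network should make one unnecessary for LNet itself.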
But the mount fails. Here is the output from dmesg (are there other
sources of debug information?):
LustreError: 25758:0:(ldlm_lib.c:494:client_obd_setup()) can't add initial connection
LustreError: 25758:0:(obd_config.c:559:class_setup()) setup scratch-MDT0000-mdc-ffff8b63003d4000 failed (-2)
LustreError: 25758:0:(obd_config.c:1835:class_config_llog_handler()) MGCxxx.xxx.5.xxx@tcp: cfg command failed: rc = -2
Lustre: cmd=cf003 0:scratch-MDT0000-mdc 1:scratch-MDT0000_UUID 2:10.149.0.183@o2ib
LustreError: 15c-8: MGC160.45.5.246@tcp: The configuration from log 'scratch-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 25734:0:(obd_config.c:610:class_cleanup()) Device 3 not setup
Lustre: Unmounted scratch-client
LustreError: 25734:0:(obd_mount.c:1604:lustre_fill_super()) Unable to mount (-2)
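The cf003 line shows the NID the client configuration log hands out for the MDT, and (-2) is the negated errno ENOENT. One plausible reading, consistent with the reply above about NIDs set at mkfs.lustre time, is that the log lists only the MDT's o2ib NID, which a tcp-only client cannot reach. A minimal Python sketch, assuming a hypothetical helper `config_nids`, that extracts the NIDs from that log line and checks them against the client's networks:

```python
import errno
import re

# Line copied from the dmesg output above (with '@' restored in the NID).
LOG_LINE = ("Lustre: cmd=cf003 0:scratch-MDT0000-mdc "
            "1:scratch-MDT0000_UUID 2:10.149.0.183@o2ib")
# Networks from the client's 'lnetctl net show'. This naive set comparison
# treats 'tcp' and 'tcp0' as different names, which is fine for this line.
CLIENT_NETS = {"lo", "tcp"}


def config_nids(line: str) -> list[str]:
    """Return every 'address@net' token among the numbered arguments."""
    return re.findall(r"\d+:(\S+@\S+)", line)


for nid in config_nids(LOG_LINE):
    net = nid.rpartition("@")[2]
    status = "reachable" if net in CLIENT_NETS else "NOT on a client network"
    print(f"{nid}: {status}")
# prints: 10.149.0.183@o2ib: NOT on a client network

print(errno.errorcode[2])  # ENOENT, the "(-2)" in the log
```

If the real configuration log contains only o2ib NIDs for the MDT and OSTs, the servers' tcp NIDs would have to be registered with the file system before a tcp-only client can mount it.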
Does anyone have ideas or reference documentation on this topic?
Do I need to configure routes with "lnetctl route"?
Do I need "lnetctl peer add ..." to make the Lustre servers and
clients known to each other?
Any hints are welcome!
Kind regards,
Philipp
--
Philipp Grau | Freie Universitaet Berlin
phgrau at ZEDAT.FU-Berlin.DE | FU-IT - Infrastruktur
Tel: +49 (30) 838 56583 | Fabeckstr. 32
Fax: +49 (30) 838 56721 | 14195 Berlin