[lustre-discuss] status: down for network interface
Ulrich Sibiller
ulrich.sibiller at eviden.com
Wed Jun 26 07:43:00 PDT 2024
Hello,
On one of our MDSs (MDS1), running Lustre 2.15.4, we see one NI with status "down":
[ mds1 ]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: AA.BB.CC.34@o2ib
          status: down <-----------------
          interfaces:
              0: ib0
        - nid: AA.BB.CC.35@o2ib
          status: up
          interfaces:
              0: ib1
    - net type: tcp
      local NI(s):
        - nid: DD.EE.FF.42@tcp
          status: up
          interfaces:
              0: bond0
However, we are not sure what this means, as the interface looks fine on both the InfiniBand side and the kernel side (see the output at the end of this mail). Running lnetctl net show -v or lnetctl export several times, with pauses in between, shows the send and receive counters increasing (for both ib0 and ib1):
    - net type: o2ib
      local NI(s):
        - nid: AA.BB.CC.34@o2ib
          status: down
          interfaces:
              0: ib0
          statistics:
              send_count: 286859369
              recv_count: 291921704
              drop_count: 1969
<20s pause>
    - net type: o2ib
      local NI(s):
        - nid: AA.BB.CC.34@o2ib
          status: down
          interfaces:
              0: ib0
          statistics:
              send_count: 286861252
              recv_count: 291923587
              drop_count: 1969
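So the interface is clearly in use: within about 20 seconds, send_count and recv_count both grew by 1883, while drop_count stayed at 1969.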
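The snapshot comparison above can be scripted. What follows is a minimal Python sketch, not an existing Lustre tool. It assumes the indented YAML layout that lnetctl net show -v prints: each NI starts with a "- nid:" line, and its counters appear beneath it as send_count, recv_count and drop_count. It uses a simple line-based scan rather than a full YAML parser. The names ni_counters, counter_deltas and EXAMPLE_SNAPSHOT are hypothetical and chosen for illustration. The sample values are the two snapshots quoted above.

```python
"""Compare two snapshots of `lnetctl net show -v` output for one NID.

Hypothetical helper, a sketch only: it assumes each NI block starts with
'- nid: <nid>' and lists its counters as 'send_count:', 'recv_count:' and
'drop_count:' lines before the next '- nid:' or '- net type:' line.
"""
from __future__ import annotations

COUNTERS = ("send_count", "recv_count", "drop_count")

# Hypothetical sample in the assumed layout, filled in via str.format().
EXAMPLE_SNAPSHOT = """\
net:
    - net type: o2ib
      local NI(s):
        - nid: AA.BB.CC.34@o2ib
          status: down
          interfaces:
              0: ib0
          statistics:
              send_count: {send}
              recv_count: {recv}
              drop_count: {drop}
"""


def ni_counters(snapshot: str, nid: str) -> dict[str, int]:
    """Return the statistics counters of the NI with the given NID."""
    counters: dict[str, int] = {}
    in_block = False
    for raw in snapshot.splitlines():
        line = raw.strip()
        if line.startswith("- nid:") or line.startswith("- net type:"):
            # A new NI (or net) begins; we are inside the wanted block
            # only if this line names exactly the NID we are looking for.
            in_block = line == f"- nid: {nid}"
            continue
        if in_block:
            key, sep, value = line.partition(":")
            if sep and key in COUNTERS:
                counters[key] = int(value.strip())
    return counters


def counter_deltas(before: str, after: str, nid: str) -> dict[str, int]:
    """Return after - before for every counter present in both snapshots."""
    old = ni_counters(before, nid)
    new = ni_counters(after, nid)
    return {k: new[k] - old[k] for k in COUNTERS if k in old and k in new}


if __name__ == "__main__":
    first = EXAMPLE_SNAPSHOT.format(send=286859369, recv=291921704, drop=1969)
    second = EXAMPLE_SNAPSHOT.format(send=286861252, recv=291923587, drop=1969)
    print(counter_deltas(first, second, "AA.BB.CC.34@o2ib"))
    # -> {'send_count': 1883, 'recv_count': 1883, 'drop_count': 0}
```

In practice, the two inputs would be the captured text of two lnetctl net show -v runs taken some seconds apart. A rising send_count and recv_count with a constant drop_count, as in the numbers above, is what shows the NI passing traffic despite its "down" status.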
All this leads to the following questions:
- What does "down" mean here? What are the consequences?
- What could be the reason?
- What can we do to examine this further?
- How can we bring the NI back to status "up"?
I can provide further information if required.
Here is the InfiniBand and kernel state of the interface on MDS1:
[ mds1 ]# ibdev2netdev
mlx5_0 port 1 ==> ib0 (Up)
...
[ mds1 ]# ibstatus
Infiniband device 'mlx5_0' port 1 status:
        default gid:     <censored>
        base lid:        0x21
        sm lid:          0x2
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            200 Gb/sec (4X HDR)
        link_layer:      InfiniBand
...
[ mds1 ]# ibportstate -L 33 1
CA/RT PortInfo:
# Port info: Lid 33 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................33
SMLid:...........................2
LMC:.............................0
LinkWidthSupported:..............1X or 4X or 2X
LinkWidthEnabled:................1X or 4X or 2X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
LinkSpeedExtSupported:...........14.0625 Gbps or 25.78125 Gbps or 53.125 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps or 25.78125 Gbps or 53.125 Gbps
LinkSpeedExtActive:..............53.125 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
# MLNX ext Port info: Lid 33 port 1
StateChangeEnable:...............0x00
LinkSpeedSupported:..............0x00
LinkSpeedEnabled:................0x00
LinkSpeedActive:.................0x00
The IPoIB interface looks fine as well:
[ mds1 ]# ip a s ib0
5: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband <censored> brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet AA.BB.CC.34/21 brd AA.BB.CC.255 scope global noprefixroute ib0
       valid_lft forever preferred_lft forever
Kind regards,
Ulrich Sibiller
--
Dipl.-Inf. Ulrich Sibiller
Senior IT Consultant
eviden.com
an atos business
science+computing ag
Management Board: Dr. Martin Matzke (Chairman), Sabine Hohenstein, Matthias Schempp; Chairman of the Supervisory Board: Emmanuel Le Roux; Registered office: Tübingen; Commercial register of the local court of Stuttgart, HRB 382196