[lustre-discuss] Issue of mounting lustre through specified interface
Tung-Han Hsieh
thhsieh at twcp1.phys.ntu.edu.tw
Sat Jul 4 00:26:05 PDT 2020
Dear All,
We have Lustre servers (MDS, OSS) with Lustre-2.10.7 installed, with
both tcp and o2ib interfaces:
[ 193.016516] Lustre: Lustre: Build Version: 2.10.7
[ 193.486408] LNet: Added LNI 192.168.62.151@o2ib [8/256/0/180]
[ 193.538200] LNet: Added LNI 192.168.60.151@tcp [8/256/0/180]
[ 193.538372] LNet: Accept secure, port 988
We have several clients, all running Lustre-2.12.4. Some have both tcp
and o2ib interfaces. These clients can mount the Lustre file system
through the o2ib interface without any problem, i.e.,
mount -t lustre -o flock 192.168.62.151@o2ib:/chome /home
(this is OK)
However, we have another client, also running Lustre-2.12.4, which has
only a tcp interface. It cannot mount the file system through the tcp
interface:
mount -t lustre -o flock 192.168.60.151@tcp:/chome /home
(this fails with "Input/output error, Is the MGS running ?")
The dmesg output on this client reads:
=========================================================================
[3106477.006512] LNetError: 15970:0:(lib-move.c:1999:lnet_handle_find_routed_path()) no route to 192.168.62.151@o2ib from <?>
[3106483.142436] LustreError: 122230:0:(mgc_request.c:249:do_config_log_add()) MGC192.168.60.151@tcp: failed processing log, type 1: rc = -5
[3106492.293968] LustreError: 122238:0:(mgc_request.c:599:do_requeue()) failed processing log: -5
[3106513.861586] LustreError: 15c-8: MGC192.168.60.151@tcp: The configuration from log 'chome-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[3106513.862052] Lustre: Unmounted chome-client
[3106513.862281] LustreError: 122230:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-5)
=========================================================================
Surprisingly, although I specified the tcp interface for the mount,
Lustre still tries to reach the server through the o2ib interface.
I also tested whether LNet works or not.
(Server NID: 192.168.60.151@tcp, Client NID: 192.168.60.30@tcp)
From the server side:
# /opt/lustre/sbin/lctl ping 192.168.60.30
12345-0@lo
12345-192.168.60.30@tcp
From the client side:
# /opt/lustre/sbin/lctl ping 192.168.60.151
12345-0@lo
12345-192.168.62.151@o2ib
12345-192.168.60.151@tcp
Hence it looks fine.
The module options (/etc/modprobe.d/lustre.conf) for server and client are:
- Server:
options lnet networks="o2ib0(ib0),tcp0(eth0)"
- Client:
options lnet networks="tcp0(eth0)"
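To make the mismatch concrete, here is a minimal sketch that checks whether a networks= string like the ones above includes the LNet network named in a NID (written in the usual address@network form). The functions norm_net and has_net are hypothetical helpers, not part of Lustre. The sketch assumes entries are separated by commas with no spaces, one interface per network, and that a bare name such as tcp refers to network 0 (tcp0).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lustre), for illustration only.

# Normalize an LNet network name: "tcp" and "tcp0" name the same network.
norm_net() {
    case "$1" in
        *[0-9]) printf '%s\n' "$1" ;;   # already carries a network number
        *)      printf '%s0\n' "$1" ;;  # bare name: assume network 0
    esac
}

# has_net NETWORKS NID
# Succeeds if the networks= string NETWORKS lists the network of NID.
has_net() {
    networks=$1
    want=$(norm_net "${2##*@}")         # part of the NID after '@'
    old_ifs=$IFS
    IFS=,
    for entry in $networks; do          # split the list on commas
        IFS=$old_ifs
        # Drop the "(iface)" suffix, then compare network names.
        if [ "$(norm_net "${entry%%"("*}")" = "$want" ]; then
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Client setting from lustre.conf above, checked against both server NIDs.
client_networks='tcp0(eth0)'
for nid in 192.168.60.151@tcp 192.168.62.151@o2ib; do
    if has_net "$client_networks" "$nid"; then
        echo "$nid: client has this network configured"
    else
        echo "$nid: client has no interface on this network"
    fi
done
```

With the client setting above, only the tcp NID of the server is on a network the client has configured, which matches the "no route to" message for the o2ib NID in dmesg.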
The build options for server and client are:
- Server (Lustre-2.10.7):
./configure --prefix=/opt/lustre \
--with-linux=<linux_kernel_path> \
--with-o2ib=<compat-rdma-path>
- Client (Lustre-2.12.4):
./configure --prefix=/opt/lustre \
--with-linux=<linux_kernel_path> \
--with-o2ib=no \
--disable-server
Could anyone suggest how to solve this problem?
Thanks very much.
T.H.Hsieh