[lustre-discuss] lustre mount in heterogeneous net environment

Ms. Megan Larko dobsonunit at gmail.com
Tue Feb 27 12:08:46 PST 2018


Hello List!

We have some Lustre 2.7.18 servers using TCP.  We would like to connect
some Mellanox (mlx4) InfiniBand Lustre 2.7.0 clients to them through some
dual-homed Lustre LNet routes.
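
For concreteness, the topology can be sketched in a few lines of Python.
This is only an illustration, not a Lustre tool: the helper names
lnet_network and needs_route are hypothetical, and the sketch assumes the
usual LNet convention that a network name without a number (tcp, o2ib)
means network 0 (tcp0, o2ib0). It shows why a client whose only NID is on
o2ib needs an LNet route to reach an MGS on tcp0.

```python
import re


def lnet_network(nid):
    """Return the normalized LNet network of a NID, e.g. 'x@tcp' -> 'tcp0'."""
    addr, sep, net = nid.rpartition("@")
    if not sep or not addr:
        raise ValueError(f"not a NID: {nid!r}")
    # Split the network into its type (tcp, o2ib, ...) and optional number.
    match = re.fullmatch(r"([a-z0-9]*[a-z])(\d*)", net)
    if match is None:
        raise ValueError(f"unrecognized LNet network: {net!r}")
    net_type, number = match.groups()
    # Assumption: a missing network number is treated as 0.
    return f"{net_type}{int(number or 0)}"


def needs_route(local_nids, target_nid):
    """True if no local NID shares the target's LNet network."""
    local_networks = {lnet_network(nid) for nid in local_nids}
    return lnet_network(target_nid) not in local_networks


# Placeholder addresses as used in this post.
client_nids = ["A.B.C.D@o2ib"]
mgs_nid = "C.D.E.F@tcp0"
print(needs_route(client_nids, mgs_nid))
```

A client that also had a NID on tcp0 would not need a route for this
target; ours has only the o2ib NID.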

The "lctl ping" command works from both the server co-located MGS/MDS and
from the client.
The mount of the TCP lustre server share from the IB client starts and then
shortly thereafter fails with "Input/output error    Is the MGS running?"

At approximately 20-minute intervals after the client mount request,
/var/log/messages on the Lustre MDS reports:
Lustre: MGS: Client <string> (at A.B.C.D@o2ib) reconnecting

The IB client mount command:
mount -t lustre C.D.E.F@tcp0:/lustre /mnt/lustre

Waits about a minute then returns:
mount.lustre C.D.E.F@tcp0:/lustre at /mnt/lustre failed:  Input/output error
Is the MGS running?.

The IB client /var/log/messages file contains:
Lustre: client.c:19349:ptlrpc_expire_one_request(()) @@@ Request sent has
timed out for slow reply ...... -->MGCC.D.E.F@tcp was lost; in progress
operations using this service will fail
LustreError: 15c-8: MGCC.D.E.F@tcp: The configuration from log
'lustre-client' failed (-5)  This may be the result of communication errors
between this node and the MGS, a bad configuration, or other errors.  See
the syslog for more information.
Lustre: MGCC.D.E.F@tcp: Connection restored to MGS (at C.D.E.F@tcp)
Lustre: Unmounted lustre-client
LustreError: 22939:0:(obd_mount.c:lustre_fill_super()) Unable to mount (-5)
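
The (-5) in these messages is a negated Linux errno value. A small Python
sketch (the helper name decode_rc is hypothetical) shows that it
corresponds to EIO, the same "Input/output error" that mount.lustre
prints, so it does not by itself narrow down the cause:

```python
import errno
import os


def decode_rc(rc):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = abs(rc)
    # errno.errorcode maps numbers to names; fall back for unknown codes.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


name, message = decode_rc(-5)
print(name, "-", message)
```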

We have not (yet) set any non-default values on the Lustre File System.
*  Server: Lustre 2.7.18  CentOS Linux release 7.3.1611 (Core)  kernel
3.10.0-514.2.2.el7_lustre.x86_64   The server is ethernet; no IB.

*  Client: Lustre-2.7.0  RHEL 6.8  kernel 2.6.32-696.3.2.el6.x86_64    The
client uses Mellanox InfiniBand mlx4.

The mount point does exist on the client.   The firewall is not an issue;
checked.  SELinux is disabled.

NOTE: The server does serve the same /lustre file system to other TCP
Lustre clients.
The client does mount other /lustre_mnt from other IB servers.

The info on
http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
describes a situation exceedingly similar to ours.   I'm not sure which
Lustre settings to check, since I have not explicitly set any to be
different from the default value.

Any hints would be genuinely appreciated.
Cheers,
megan