[lustre-discuss] lustre mount in heterogeneous net environment
andreas.dilger at intel.com
Tue Feb 27 13:55:56 PST 2018
On Feb 27, 2018, at 13:08, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
> Hello List!
> We have some Lustre 2.7.18 servers using TCP. We want to connect some Mellanox (mlx4) InfiniBand Lustre 2.7.0 clients to them through dual-homed LNet routes.
Is there any reason to be running 2.7.0 clients? Those are missing a huge number of fixes compared to newer clients. Better to run matching 2.7.18 clients, or 2.10.3.
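A minimal sketch of checking for that kind of version gap, assuming GNU sort (for its -V version-sort option) is available. The helper name version_lt is hypothetical, and the version strings are simply the ones quoted in this thread, typed in by hand rather than read from a running node:

```shell
#!/bin/sh
# Return success (0) when version $1 is strictly older than version $2.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

client_version="2.7.0"    # client version reported in the thread
server_version="2.7.18"   # server version reported in the thread

if version_lt "$client_version" "$server_version"; then
    echo "client $client_version is older than server $server_version"
else
    echo "client $client_version is not older than server $server_version"
fi
```

A plain string comparison would get 2.7.18 versus 2.10.3 wrong, which is why the sketch uses a version-aware sort.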
> The "lctl ping" command works from both the server co-located MGS/MDS and from the client.
> The mount of the TCP lustre server share from the IB client starts and then shortly thereafter fails with "Input/output error Is the MGS running?"
> At approximately 20-minute intervals after the client mount request, /var/log/messages on the Lustre MDS reports:
> Lustre: MGS: Client <string> (at A.B.C.D@o2ib) reconnecting
> The IB client mount command:
> mount -t lustre C.D.E.F@tcp0:/lustre /mnt/lustre
> Waits about a minute then returns:
> mount.lustre C.D.E.F@tcp0:/lustre at /mnt/lustre failed: Input/output error
> Is the MGS running?.
> The IB client /var/log/messages file contains:
> Lustre: client.c:19349:ptlrpc_expire_one_request(()) @@@ Request sent has timed out for slow reply ...... -->MGCC.D.E.F@tcp was lost; in progress operations using this service will fail
> LustreError: 15c-8: MGCC.D.E.F@tcp: The configuration from log 'lustre-client' failed (-5) This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> Lustre: MGCC.D.E.F@tcp: Connection restored to MGS (at C.D.E.F@tcp)
> Lustre: Unmounted lustre-client
> LustreError: 22939:0:(obd_mount.c:lustre_fill_super()) Unable to mount (-5)
> We have not (yet) set any non-default values on the Lustre File System.
> * Server: Lustre 2.7.18, CentOS Linux release 7.3.1611 (Core), kernel 3.10.0-514.2.2.el7_lustre.x86_64. The server is Ethernet only; no IB.
> * Client: Lustre 2.7.0, RHEL 6.8, kernel 2.6.32-696.3.2.el6.x86_64. The client uses Mellanox InfiniBand (mlx4).
> The mount point does exist on the client. The firewall is not an issue; checked. SELinux is disabled.
> NOTE: The server does serve the same /lustre file system to other TCP Lustre clients.
> The client does mount other /lustre_mnt from other IB servers.
> The info on http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes describes a situation very similar to ours. I'm not sure which Lustre settings to check, since I have not explicitly set any to differ from the default values.
> Any hints would be genuinely appreciated.
Lustre Principal Architect