[lustre-discuss] lustre mount in heterogeneous net environment

Dilger, Andreas andreas.dilger at intel.com
Tue Feb 27 13:55:56 PST 2018


On Feb 27, 2018, at 13:08, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
> 
> Hello List!
> 
> We have some Lustre 2.7.18 servers using TCP.  Through some dual-homed Lustre LNet routes we want to connect some Mellanox (mlx4) InfiniBand Lustre 2.7.0 clients.

Is there any reason to be running 2.7.0 clients?  Those are missing a huge number of fixes compared to newer clients.  Better to run matching 2.7.18 clients, or 2.10.3.

Cheers, Andreas

> The "lctl ping" command works both from the server's co-located MGS/MDS and from the client.
> The mount of the TCP Lustre server share from the IB client starts and then fails shortly thereafter with "Input/output error    Is the MGS running?"
> 
> At approximately 20-minute intervals after the client mount request, /var/log/messages on the Lustre MDS reports:
> Lustre: MGS: Client <string> (at A.B.C.D@o2ib) reconnecting
> 
> The IB client mount command:
> mount -t lustre C.D.E.F@tcp0:/lustre /mnt/lustre
> 
> Waits about a minute then returns:
> mount.lustre C.D.E.F@tcp0:/lustre at /mnt/lustre failed:  Input/output error
> Is the MGS running?.
> 
> The IB client /var/log/messages file contains:
> Lustre: client.c:19349:ptlrpc_expire_one_request(()) @@@ Request sent has timed out for slow reply ...... -->MGCC.D.E.F@tcp was lost; in progress operations using this service will fail
> LustreError: 15c-8: MGCC.D.E.F@tcp: The configuration from log 'lustre-client' failed (-5)  This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors.  See the syslog for more information.
> Lustre: MGCC.D.E.F@tcp: Connection restored to MGS (at C.D.E.F@tcp)
> Lustre: Unmounted lustre-client
> LustreError: 22939:0:(obd_mount.c:lustre_fill_super()) Unable to mount (-5)
> 
> We have not (yet) set any non-default values on the Lustre File System.
> *  Server: Lustre 2.7.18  CentOS Linux release 7.3.1611 (Core)  kernel 3.10.0-514.2.2.el7_lustre.x86_64   The server is ethernet; no IB.
> 
> *  Client: Lustre-2.7.0  RHEL 6.8  kernel 2.6.32-696.3.2.el6.x86_64    The client uses Mellanox InfiniBand mlx4.
> 
> The mount point does exist on the client.   The firewall is not an issue; checked.  SELinux is disabled.
> 
> NOTE: The server does serve the same /lustre file system to other TCP Lustre clients.
> The client does mount other /lustre_mnt from other IB servers.
> 
> The info on http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes describes a situation very similar to ours.   I'm not sure which Lustre settings to check, since I have not explicitly set any to differ from the default values.
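
For a routed setup like the one described, the settings worth checking are usually the lnet module options on each kind of node. A minimal sketch, assuming modprobe-style LNet configuration; the interface names (ib0, eth0) and the router addresses <router-ib-ip> and <router-tcp-ip> are placeholders, not values taken from this thread:

```
# IB client: reach tcp0 through the router's InfiniBand NID (placeholder address)
options lnet networks="o2ib0(ib0)" routes="tcp0 <router-ib-ip>@o2ib0"

# Dual-homed LNet router: attached to both networks, forwarding enabled
options lnet networks="tcp0(eth0),o2ib0(ib0)" forwarding="enabled"

# TCP servers (MGS/MDS and every OSS): return route to o2ib0 via the router's TCP NID
options lnet networks="tcp0(eth0)" routes="o2ib0 <router-tcp-ip>@tcp0"
```

A successful "lctl ping" to the MGS does not by itself show that every OSS has the return route, so it may be worth confirming the routes on each server as well.
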
> 
> Any hints would be genuinely appreciated.
> Cheers,
> megan

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation