[lustre-discuss] lustre mount in heterogeneous net environment-update

Jeff Johnson jeff.johnson at aeoncomputing.com
Wed Feb 28 14:23:25 PST 2018


Greetings Megan,

One scenario that could cause this: your appliance-style Lustre MDS is a
high-availability server pair, your mount command declares only one of the
two NIDs, *and* the MGS and MDT resources presently reside on the MDS
server you did not declare.

If it is high-availability and the IPs of those servers are A.B.C.D and
A.B.C.E, then make sure your mount command looks something like:

mount -t lustre A.B.C.D@tcp:A.B.C.E@tcp:/somefsname /localmountpoint

That way the client will be looking for the MGS in all of the places it
*could* be located.
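
If it helps, here is a minimal sketch of how the device string can be put
together when there is more than one candidate MGS NID. The helper name
build_lustre_device is made up for illustration, and the addresses, fsname
and mount point are the same placeholders as above. It only prints the
command, it does not mount anything:

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): join the candidate MGS NIDs
# with ':' and append ':/<fsname>' to form the mount device string.
build_lustre_device() {
    fsname=$1
    shift
    spec=""
    for nid in "$@"; do
        if [ -z "$spec" ]; then
            spec=$nid
        else
            spec="$spec:$nid"
        fi
    done
    printf '%s:/%s\n' "$spec" "$fsname"
}

# Placeholder NIDs for the two servers of the HA pair.
device=$(build_lustre_device somefsname A.B.C.D@tcp A.B.C.E@tcp)

# Print the resulting command for review instead of running it.
echo "mount -t lustre $device /localmountpoint"
```

Running "lctl ping" against each NID from the client before mounting is a
quick way to confirm that both servers of the pair are reachable.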

This is just one possible cause, but it is certainly easier and less
painful to fix than a lower-level version compatibility issue.

—Jeff

On Wed, Feb 28, 2018 at 13:36 Ms. Megan Larko <dobsonunit at gmail.com> wrote:

> Greetings List!
>
> We have been continuing to dissect our LNet environment between our
> lustre-2.7.0 clients and the lustre-2.7.18 servers.  We have moved from the
> client node to the LNet server which bridges the InfiniBand (IB) and
> ethernet networks. As a test, we attempted to mount the ethernet Lustre
> storage from the LNet router, hopefully taking the IB out of the equation
> to limit the scope of our debugging.
>
> On the LNet router the attempted mount of Lustre storage fails.   The LNet
> command line error on the test LNet client is exactly the same as the
> original client result:
> mount A.B.C.D@tcp0:/lustre at /mnt/lustre failed: Input/output error  Is
> the MGS running?
>
> On the Lustre servers, both the MGS/MDS and the OSS, we can see the error
> via dmesg:
> LNet: There was an unexpected network error while writing to C.D.E.F:  -110
>
> and we see this periodic message (~ every 10 to 20 minutes) in dmesg on
> the MGS/MDS:
> Lustre: MGS: Client <id string> (at C.D.E.F@tcp) reconnecting
>
> The "lctl pings" in various directions are still successful.
>
> So, forget the end Lustre client; we are not yet getting from the MGS/MDS
> successfully to the LNet router.
> We have been looking at the contents of /sys/module/lustre.conf and we are
> not seeing any differences in set values between the LNet router we are
> using as a test Lustre client and the Lustre MGS/MDS server.
>
> As much as I'd _love_ to go to Lustre-2.10.x, we are dealing with both
> "appliance" style Lustre storage systems and clients tied to specific
> versions of the linux kernel (for reasons other than Lustre).
>
> Is there a key parameter which I could still be overlooking?
>
> Cheers,
> megan
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage