[lustre-discuss] lustre mount in heterogeneous net environment-update

Ms. Megan Larko dobsonunit at gmail.com
Wed Feb 28 13:36:29 PST 2018


Greetings List!

We are continuing to dissect the LNet environment between our
lustre-2.7.0 clients and the lustre-2.7.18 servers.  We have moved from the
client node to the LNet router, which bridges the InfiniBand (IB) and
ethernet networks.  As a test, we attempted to mount the ethernet Lustre
storage directly from the LNet router, taking IB out of the equation to
limit the scope of our debugging.

On the LNet router, the attempted mount of the Lustre storage fails.  The
command-line error on this test client is exactly the same as the one on
the original client:
mount A.B.C.D@tcp0:/lustre at /mnt/lustre failed: Input/output error
Is the MGS running?

On the Lustre servers, both the MGS/MDS and the OSS, we can see the error
via dmesg:
LNet: There was an unexpected network error while writing to C.D.E.F:  -110

and we see a periodic message (roughly every 10 to 20 minutes) in dmesg on
the MGS/MDS:
Lustre: MGS: Client <id string> (at C.D.E.F@tcp) reconnecting
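
For reference, the trailing -110 in the LNet message is a negated Linux
errno, and 110 is ETIMEDOUT on Linux.  A minimal Python sketch for decoding
such lines follows.  It assumes the dmesg line format quoted above and uses
a small hand-written table of Linux errno values; the function name
decode_lnet_errno is hypothetical and not part of any Lustre tool.

```python
import re

# Linux errno numbers (assumption: servers run Linux on a common
# architecture such as x86_64; values differ on other operating systems).
LINUX_ERRNO = {
    5: "EIO",
    103: "ECONNABORTED",
    104: "ECONNRESET",
    107: "ENOTCONN",
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}


def decode_lnet_errno(line):
    """Return (code, name) for a trailing negative errno in a log line, or None."""
    match = re.search(r":\s*-(\d+)\s*$", line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LINUX_ERRNO.get(code, "UNKNOWN")


if __name__ == "__main__":
    sample = ("LNet: There was an unexpected network error "
              "while writing to C.D.E.F:  -110")
    print(decode_lnet_errno(sample))
```

A timeout on write, as opposed to a refused connection, is consistent with
small "lctl ping" messages succeeding while the mount traffic does not
complete, though it does not by itself identify the cause.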

The "lctl pings" in various directions are still successful.

So, setting the end Lustre client aside: we are not yet getting
successfully from the MGS/MDS to the LNet router.
We have been looking at the contents of /sys/module/lustre.conf and we are
not seeing any differences in the configured values between the LNet router
we are using as a test Lustre client and the Lustre MGS/MDS server.
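
A comparison like this is easy to get wrong by eye, so here is a hedged
Python sketch of one way to diff the settings from two nodes.  It assumes
each node's parameters have first been dumped to "name=value" text (for
example by reading the files under /sys/module/<module>/parameters); the
helper names and the sample values are made up for illustration.

```python
def parse_params(text):
    """Parse 'name=value' lines into a dict, skipping blanks and '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def diff_params(a, b):
    """Return {name: (value_in_a, value_in_b)} for every name whose values differ.

    A name missing on one side shows up with None for that side.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    # Illustrative values only; not taken from the systems described here.
    router = parse_params("networks=tcp0(eth0)\nforwarding=enabled\n")
    mgs = parse_params("networks=tcp0(eth0)\nforwarding=disabled\n")
    for name, (left, right) in diff_params(router, mgs).items():
        print(f"{name}: router={left!r} mgs={right!r}")
```

Reporting names that exist on only one node matters here, since a parameter
left at its default on one side may not appear in that node's dump at all.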

As much as I'd _love_ to go to Lustre-2.10.x, we are dealing with both
"appliance" style Lustre storage systems and clients tied to specific
versions of the linux kernel (for reasons other than Lustre).

Is there a key parameter which I could still be overlooking?

Cheers,
megan