[lustre-discuss] lustre-discuss Digest, Vol 128, Issue 11

Ms. Megan Larko dobsonunit at gmail.com
Tue Nov 29 12:43:08 PST 2016


Follow-up to Subject: Lustre client mount fails: Request sent has timed out
for slow reply

Thank you for the suggestions.  I was able to work past this error, although I
am not certain of the exact solution.  I stopped and restarted the opensm
service on my CentOS 7.2 system.  While that did not seem to change anything
immediately, when I returned to the office after Thanksgiving the next two
compute nodes were connected to the InfiniBand fabric, and the Lustre (2.8.0)
file system mounted promptly when I issued the mount command.

So my guess is that I had to restart the opensm service and simply be patient.
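If restarting opensm and then waiting is what fixed it, a client can wait for its InfiniBand port to become Active before it tries to mount. This is a minimal sketch, assuming the `ibstat` tool from infiniband-diags is installed and that its per-port `State:` line reads `Active` once the subnet manager has configured the port. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed if ibstat-style text on stdin shows at least
# one port whose State is Active. A port that no subnet manager has
# configured typically reports a non-Active state instead.
ib_port_active() {
    grep -q 'State:[[:space:]]*Active'
}

# Hypothetical helper: poll ibstat every 5 seconds until some port is Active,
# giving up after TIMEOUT seconds. Usage: wait_for_ib_active TIMEOUT
wait_for_ib_active() {
    local timeout="$1" waited=0
    until ibstat 2>/dev/null | ib_port_active; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "no IB port Active after ${timeout}s" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
}
```

For example, after `systemctl restart opensm` on the subnet manager host (assuming the service is named `opensm`), a client could run `wait_for_ib_active 300` before issuing the mount.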

Cheers,
megan

On Fri, Nov 25, 2016 at 4:06 PM, <lustre-discuss-request at lists.lustre.org>
wrote:

> Send lustre-discuss mailing list submissions to
>         lustre-discuss at lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
>         lustre-discuss-request at lists.lustre.org
>
> You can reach the person managing the list at
>         lustre-discuss-owner at lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>    1. Re: Lustre client mount fails: Request sent has timed out for
>       slow reply (Dilger, Andreas)
>    2. Re: Distributing locally.... (Dilger, Andreas)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 25 Nov 2016 20:25:54 +0000
> From: "Dilger, Andreas" <andreas.dilger at intel.com>
> To: "Ms. Megan Larko" <dobsonunit at gmail.com>
> Cc: Lustre User Discussion Mailing List
>         <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] Lustre client mount fails: Request sent
>         has timed out for slow reply
> Message-ID: <0E44CDE8-D3E4-41C2-84A0-683B398FF846 at intel.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Possible causes in cases like this:
> - duplicate client IP addresses (used only at connect time for o2iblnd)
> - firewall rules (though unlikely to be the case for IB)
> - SELinux (this is supported in Lustre 2.7+ but can still have rules that
> prevent mounting)
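The first item on that list can be checked from a single node. This is a minimal sketch, assuming passwordless ssh to each client, a single IPv4 address on `ib0` (the interface used in this thread), and the standard `ip -4 -o addr show` output format. The helper names and any host names passed to them are hypothetical. It collects each node's `ib0` address and prints every address claimed by more than one host.

```shell
# Hypothetical helper: read "host address" pairs on stdin and print any
# address that appears for more than one host.
find_duplicate_ips() {
    awk '{ print $2 }' | sort | uniq -d
}

# Hypothetical helper: print "host address" for ib0 on each host given.
# Field 4 of `ip -4 -o addr show` is ADDRESS/PREFIX; the prefix is dropped.
# Assumes passwordless ssh and exactly one IPv4 address on ib0.
collect_ib0_addrs() {
    local host addr
    for host in "$@"; do
        addr=$(ssh -n "$host" ip -4 -o addr show dev ib0 |
            awk '{ split($4, a, "/"); print a[1] }')
        echo "$host $addr"
    done
}
```

Usage would look like `collect_ib0_addrs node01 node02 node03 node04 | find_duplicate_ips`, where empty output means no duplicates were found. For the other two items, `getenforce` and `systemctl is-active firewalld` on each client show whether SELinux is enforcing and whether firewalld is running.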
>
> Sorry, I don't know anything about opensm.  Presumably you've restarted
> these clients, and other IB-level communications are working?
>
> Cheers, Andreas
>
> On Nov 25, 2016, at 12:05, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
> >
> > Greetings List!
> >
> > I have a very small HPC cluster running CentOS 7.2.  The lustre servers
> are running lustre kernel-3.10.0-327.3.1.el7_lustre.x86_64.   The clients
> are running kernel-3.10.0-327.3.1.el7.x86_64.
> >
> > I have two compute node clients successfully mounting the Lustre file
> system from the servers.  The next two compute clients will not mount
> lustre.  I have the lustre-client-2.8.0-3.10.0_327.3.1.el7.x86_64 and
> lustre-client-modules-2.8.0-3.10.0_327.3.1.el7.x86_64 RPMs installed on
> all compute clients, including the next two.  My InfiniBand network is up
> and successfully pings the other systems.  I can cleanly "modprobe lustre"
> using /etc/modprobe.d/lustre.conf containing one line: options lnet
> networks="o2ib0(ib0)".  This information is the same on both Lustre client
> and server systems, all of which use ib0.
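The one-line LNet configuration above can be written and checked on a client roughly as follows. This is a minimal sketch: `lctl network up` and `lctl list_nids` are standard lctl commands, while the helper names are hypothetical. With `networks="o2ib0(ib0)"`, the client's NID should be its ib0 IPv4 address followed by `@o2ib`, because network number 0 is not printed.

```shell
# Hypothetical helper: succeed if a NID list on stdin contains an o2ib NID
# (plain "@o2ib" or a numbered network such as "@o2ib1").
has_o2ib_nid() {
    grep -Eq '@o2ib[0-9]*$'
}

# Hypothetical helper, run as root: write the one-line config from this
# thread (replacing any existing file contents), load the modules, bring
# LNet up, and confirm that an o2ib NID is configured. If lnet was already
# loaded with other options, the modules must be unloaded first for the
# new options to apply.
configure_lnet_o2ib() {
    echo 'options lnet networks="o2ib0(ib0)"' > /etc/modprobe.d/lustre.conf
    modprobe lustre || return 1
    lctl network up || return 1
    lctl list_nids | has_o2ib_nid
}
```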
> >
> > On the next two compute clients I can successfully "lctl ping
> mds-ib@o2ib0" and successfully ping the OSS similarly.  I try to mount
> the Lustre file system on the next two compute clients via the command
> "mount -t lustre A.B.C.D@o2ib0:/myLustre /myLustre", where the A.B.C.D
> address exists and works as described above, and the Lustre file system is
> "myLustre" and mounts successfully on the two earlier compute clients.
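Because `lctl ping` succeeded on these clients while the mount timed out, it can help to script the same two steps and retry the mount instead of failing on the first timeout. This is a minimal sketch; `retry`, `mount_lustre`, and the attempt count and delay are hypothetical choices, and `A.B.C.D@o2ib0` and `myLustre` are the placeholders used in the post.

```shell
# Hypothetical helper: run a command up to N times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if all fail.
# Usage: retry N DELAY command [args...]
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
        i=$((i + 1))
    done
    return 1
}

# Hypothetical wrapper: check LNet reachability of the MGS, then try the
# mount up to 3 times, 30 seconds apart.
# Usage: mount_lustre MGS_NID FSNAME MOUNTPOINT
mount_lustre() {
    local mgs_nid="$1" fsname="$2" mountpoint="$3"
    lctl ping "$mgs_nid" || return 1
    retry 3 30 mount -t lustre "${mgs_nid}:/${fsname}" "$mountpoint"
}
```

For the setup in this thread, the call would be `mount_lustre A.B.C.D@o2ib0 myLustre /myLustre`.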
> >
> > This mount fails on both of my next two compute clients with the STDERR:
> >
> > mount.lustre: mount A.B.C.D@o2ib0:/myLustre /myLustre failed:
> Input/output error
> >
> > The compute client /var/log/messages file shows:
> > [date] [hostname] kernel: Lustre: 51814:0:(client.c:2063:ptlrpc_expire_one_request())
> @@@ Request sent has timed out for slow reply: [sent 1480097968/real
> 1480097992]  req@ffff8800aa14000 x1551992831868952/t0(0)
> o250->MGCA.B.C.D@o2ib@A.B.C.D@o2ib:26:25 lens 520/544 e 0 to 1 dl
> 1480997973 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
> >
> > The above appears 2X in a row followed by:
> > [date] [hostname] kernel: LustreError: 15c-8: MGCA.B.C.D@o2ib: The
> configuration from log 'myLustre-client' failed (-5).  This may be the
> result of communication errors between this node and the MGS, a bad
> configuration, or other errors.  See the syslog for more information.
> > [date] [hostname] kernel: Lustre: Unmounted myLustre-client
> > [date] [hostname] kernel: LustreError: 53873:0:(obd_mount.c:1426:lustre_fill_super())
> unable to mount  (-5)
> >
> > As all four compute nodes are built from a single kickstart file, I do
> not understand why two compute clients can mount the /myLustre file system
> and two cannot.    The IB fabric on the in-kernel
> opensm-3.3.10-1.el7.x86_64 looks clean with no entries in the
> /var/log/opensm-unhealthy-ports-dump.   If I go all the way back to the
> last opensm start I do see a single line in /var/log/opensm.log on the
> opensm server for the next compute client stating:
> > subn_validate_neighbor: ERR 7518: neighbor does not point back at us
> (guid: [GUID of my next compute client])
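To list every port GUID that opensm has flagged with this error, the GUIDs can be extracted from the log. This is a minimal sketch, assuming the message format quoted above (`ERR 7518: ... (guid: <GUID>)`) and the log path `/var/log/opensm.log` from the post. The function name is hypothetical. The resulting GUIDs can then be compared with the GUIDs that `ibstat` reports on each client.

```shell
# Hypothetical helper: print each distinct GUID from opensm lines of the form
# "ERR 7518: neighbor does not point back at us (guid: ...)".
# Usage: opensm_7518_guids [LOGFILE]   (default: /var/log/opensm.log)
opensm_7518_guids() {
    sed -n 's/.*ERR 7518:.*(guid: \([^)]*\)).*/\1/p' "${1:-/var/log/opensm.log}" |
        sort -u
}
```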
> >
> > Is this last opensm error completely stopping my Lustre mount when all
> other IP pings are completely successful?
> >
> > TIA,
> > megan
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 25 Nov 2016 20:50:03 +0000
> From: "Dilger, Andreas" <andreas.dilger at intel.com>
> To: Thomas Stibor <t.stibor at gsi.de>
> Cc: "lustre-discuss at lists.lustre.org"
>         <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] Distributing locally....
> Message-ID: <731807CD-AA6E-469C-9593-C27426DF0139 at intel.com>
> Content-Type: text/plain; charset="us-ascii"
>
> On Nov 25, 2016, at 04:27, Thomas Stibor <t.stibor at gsi.de> wrote:
> >
> > Remove the following line from debian/lustre-dev.install
> > -debian/tmp/usr/lib/*.so.*            usr/lib
> > and it will work.
> >
> > @@ -1,6 +1,5 @@
> > lustre/contrib/README                 usr/share/doc/lustre-dev/contrib
> > lustre/contrib/mpich-1.2.6-lustre.patch usr/share/doc/lustre-dev/contrib
> > debian/tmp/usr/include/lustre/*               usr/include/lustre
> > -debian/tmp/usr/lib/*.so.*            usr/lib
> > debian/tmp/usr/lib/*.so               usr/lib
> > debian/tmp/usr/lib/*.a                        usr/lib
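The same one-line change can be applied from the top of the source tree with sed. This is a minimal sketch, assuming GNU sed (for in-place `-i`), as found on Debian and Ubuntu build hosts. The function name is hypothetical. It deletes only the `*.so.*` line and leaves the `*.so` and `*.a` lines in place, whether the columns are separated by spaces or tabs.

```shell
# Hypothetical helper: delete the versioned shared-library line from a
# debian/*.install file (default: debian/lustre-dev.install).
# "\|...|" lets the pattern contain "/" without escaping; "\*" and "\."
# match a literal "*" and ".".
drop_versioned_so_line() {
    sed -i '\|^debian/tmp/usr/lib/\*\.so\.\*[[:space:]]|d' \
        "${1:-debian/lustre-dev.install}"
}
```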
>
> Thomas or Phill,
> could you please submit a patch to Gerrit with this change.
>
> > Note: also make sure to update debian/changelog, e.g. with these
> > commands:
> >
> > export DEBFULLNAME="My Name"
> > export EMAIL="myname@mydomain.cz"
> >
> > # Extract Lustre version, replace "_" by "." and remove leading letter "v".
> > LUSTRE_VERSION=$(git describe | sed -e "s/_/\./g" | cut -c2-)
> > LUSTRE_DEBIAN_REV='1'
> >
> > # Add entry into debian/changelog such that packages have proper version names.
> > dch --newversion "${LUSTRE_VERSION}-${LUSTRE_DEBIAN_REV}" --distribution \
> >     unstable --nomultimaint -t "Build from official master upstream."
> >
> > otherwise the package version names are taken from the top entry in
> > debian/changelog, which usually does not match the Git version you are
> > compiling.
>
> It would be nice to add this as part of the "make debs" target so that the
> build is done with the right version.  Bonus points if it checks the top
> changelog entry to see whether there is already an entry for the current
> version and, if so, doesn't add a new entry.
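The check described here could look roughly like the following. It computes the version the same way as the commands above, reads the version of the top `debian/changelog` entry, and calls `dch` only when the two differ. This is a hypothetical helper set, not part of the Lustre build. It assumes that line 1 of the changelog has the usual `package (version) distribution; urgency=...` form and that `git describe` returns a tag with a leading `v`, as the `cut -c2-` above expects.

```shell
# Hypothetical helper: turn `git describe` output such as "v2_8_0" into
# "2.8.0" (underscores become dots, first character dropped).
lustre_version_from_describe() {
    echo "$1" | sed -e 's/_/./g' | cut -c2-
}

# Hypothetical helper: print the version of the top changelog entry, i.e.
# the text inside the first pair of parentheses on line 1.
top_changelog_version() {
    sed -n '1s/^[^(]*(\([^)]*\)).*/\1/p' "$1"
}

# Hypothetical helper: add a changelog entry only if the top entry does not
# already carry this version. Run from the top of the git checkout.
# Usage: ensure_changelog_entry CHANGELOG DEBIAN_REVISION
ensure_changelog_entry() {
    local changelog="$1" rev="$2" version
    version="$(lustre_version_from_describe "$(git describe)")-$rev"
    if [ "$(top_changelog_version "$changelog")" = "$version" ]; then
        echo "$changelog already has an entry for $version" >&2
        return 0
    fi
    dch --changelog "$changelog" --newversion "$version" \
        --distribution unstable --nomultimaint -t \
        "Build from official master upstream."
}
```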
>
> Cheers, Andreas
>
> > Cheers
> > Thomas
> >
> > On Fri, Nov 25, 2016 at 10:04:06AM +0000, Phill Harvey-Smith wrote:
> >> On 02/11/2016 17:54, Dilger, Andreas wrote:
> >>> There is a "make debs" target, but I don't know how often this is
> >>> tested.  That would be the best thing to use for Ubuntu, and if it isn't
> >>> working then please feel free to report to the list and/or Jira.
> >>
> >> Just got back to this,
> >>
> >> make debs gets further but still seems to crash out....
> >>
> >> Steps :
> >>
> >> Get source from git.
> >> Select 2.8.0 with : git checkout 2.8.0
> >> sh ./autogen.sh
> >> ./configure --disable-server --with-o2ib=no
> >> make
> >>
> >> The make completes without errors.  I have done a make install of this
> >> version on this node in the past, and it is up and running correctly.
> >>
> >> make debs
> >>
> >> bombs out, log below :
> >>
> >> I've uploaded the log to :
> >>
> >> http://penguin.stats.warwick.ac.uk/~stsxab/Lustre/lustre_make_deb_error.txt
> >>
> >> As the list refused to accept it as it was too big :(
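When a build log is too large for the list, it can help to keep the full log and a compressed copy for uploading. This is a minimal sketch that runs the steps above as one function. The function name, default tag, and log file name are hypothetical, and the configure options are the ones from this message. The exit status of `make debs` is kept so that a failure is still reported after the log is saved.

```shell
# Hypothetical helper: build client-only Debian packages from a Lustre git
# checkout, saving "make debs" output to LOGFILE plus a gzipped copy.
# Usage: build_client_debs [TAG] [LOGFILE]
build_client_debs() {
    local tag="${1:-2.8.0}" log="${2:-make_debs.log}" status
    git checkout "$tag" || return 1
    sh ./autogen.sh || return 1
    ./configure --disable-server --with-o2ib=no || return 1
    make || return 1
    make debs > "$log" 2>&1
    status=$?
    # Show the last lines of the log, which is usually where make reports
    # the failing step, then write a compressed copy for attaching.
    tail -n 40 "$log"
    gzip -c "$log" > "$log.gz"
    return "$status"
}
```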
> >>
> >> Cheers.
> >>
> >> Phill.
> >>
>
>
>
> ------------------------------
>
> End of lustre-discuss Digest, Vol 128, Issue 11
> ***********************************************
>