[lustre-discuss] rebooting nodes

Christopher Johnston chjohnst at gmail.com
Thu Aug 10 06:51:46 PDT 2017


On my systems, which use standard Ethernet (I'm in the cloud), I see no
issues rebooting with 2.9.  I did have issues with the lnet driver not being
able to grab the port on boot-up, so I backported the lnet systemd unit file
from 2.10 to get around that.
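
If the stall really is the race described in the quoted message (lnet going
away before the Lustre unmount finishes), one workaround is to unmount
explicitly and only then stop lnet. A minimal sketch, assuming lnet is
controlled with "service lnet stop" (the counterpart of the "service lnet
start" mentioned below) and that Lustre mounts show up in /proc/mounts with
filesystem type "lustre". The function names are hypothetical and this is
untested against a real client:

```python
import subprocess


def lustre_mount_points(mounts_text):
    """Return Lustre mount points from /proc/mounts-style text, deepest first."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "lustre":
            points.append(fields[1])
    # Nested mounts have more path separators; unmount them before their parents.
    return sorted(points, key=lambda p: p.count("/"), reverse=True)


def stop_lustre_client(dry_run=True):
    """Unmount every Lustre filesystem first, then stop lnet (assumed command)."""
    with open("/proc/mounts") as f:
        points = lustre_mount_points(f.read())
    commands = [["umount", p] for p in points]
    # Assumption: lnet is stopped the same way the mount script starts it.
    commands.append(["service", "lnet", "stop"])
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling stop_lustre_client() with the default dry_run=True only prints the
commands it would run, which makes it easy to check the order before hooking
it into a shutdown path. systemd also documents an "x-systemd.requires="
mount option in systemd.mount(5) for ordering a mount against another unit,
which may be the cleaner fix if lnet is managed as a systemd unit.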

On Thu, Aug 10, 2017 at 9:44 AM, Ben Evans <bevans at cray.com> wrote:

> Are the Infiniband drivers disappearing first?  I know that used to be an
> issue.
>
> -Ben
>
> On 8/10/17, 8:59 AM, "lustre-discuss on behalf of Michael Di Domenico"
> <lustre-discuss-bounces at lists.lustre.org on behalf of
> mdidomenico4 at gmail.com> wrote:
>
> >does anyone else have issues when issuing 'reboot' while having a lustre
> >mount?
> >
> >we're running v2.9 clients on our workstations, but when a user goes
> >to reboot the machine (from the gui) the system stalls under systemd
> >while i presume it's attempting to unmount the filesystem.
> >
> >what i see on the console is: systemd kicks in and starts unmounting
> >all the nfs shares we have, works fine.  but then it gets to lustre
> >and starts throwing connection errors on the console.  it's almost as
> >if systemd raced itself stopping lustre, whereby lnet got yanked out
> >from under the mount before the unmount actually finished.
> >
> >after five minutes or so, it looks like systemd threw in the towel and
> >gave up trying to unmount, but the system is stuck still trying to
> >execute more shutdown tasks.
> >
> >when we mount lustre on the workstations, i have a script that figures
> >some stuff out, issues a service lnet start, and then issues a mount
> >command.  this all works fine, but i'm not sure if that's why systemd
> >can't figure out what to do correctly.
> >
> >and since this is during a shutdown phase, debugging this is
> >difficult.  any thoughts?
> >_______________________________________________
> >lustre-discuss mailing list
> >lustre-discuss at lists.lustre.org
> >http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org