[lustre-discuss] rebooting nodes

Ben Evans bevans at cray.com
Thu Aug 10 06:44:59 PDT 2017


Are the Infiniband drivers disappearing first?  I know that used to be an
issue.

-Ben

On 8/10/17, 8:59 AM, "lustre-discuss on behalf of Michael Di Domenico"
<lustre-discuss-bounces at lists.lustre.org on behalf of
mdidomenico4 at gmail.com> wrote:

>does anyone else have issues with issuing 'reboot' while having a lustre
>mount?
>
>we're running v2.9 clients on our workstations, but when a user goes
>to reboot the machine (from the gui) the system stalls under systemd
>while i presume it's attempting to unmount the filesystem.
>
>what i see on the console is: systemd kicks in and starts unmounting
>all the nfs shares we have, which works fine.  but then it gets to lustre
>and starts throwing connection errors on the console.  it's almost as
>if systemd raced itself stopping lustre, whereby lnet got yanked out
>from under the mount before the unmount actually finished.
>
>after five minutes or so, it looks like systemd threw in the towel and
>gave up trying to unmount, but the system is stuck still trying to
>execute more shutdown tasks.
>
>when we mount lustre on the workstations, i have a script that figures
>some stuff out, issues a service lnet start, and then issues a mount
>command.  this all works fine, but i'm not sure if that's why systemd
>can't figure out what to do correctly.
>
>and since this is during a shutdown phase, debugging this is
>difficult.  any thoughts?
>_______________________________________________
>lustre-discuss mailing list
>lustre-discuss at lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
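
The quoted message describes a script that issues "service lnet start" and
then a mount command, and suspects that at shutdown lnet is pulled out before
the unmount finishes. The ordering this implies (lnet up before the mount,
unmount before lnet goes down) can be sketched as below. This is only an
illustration, not the poster's script and not a confirmed fix: the target
"mgs@o2ib:/lustre", the mount point "/mnt/lustre", and the function names are
hypothetical, and "service lnet stop" assumes the lnet init script accepts a
stop action. It runs in dry-run mode by default and only prints the commands.

```python
import subprocess


def startup_commands(target, mountpoint):
    """Commands in the order the quoted script uses: LNet first, then the mount."""
    return [
        ["service", "lnet", "start"],
        ["mount", "-t", "lustre", target, mountpoint],
    ]


def shutdown_commands(mountpoint):
    """Reverse order: unmount while LNet is still up, then stop LNet.

    Assumption: the lnet service supports "stop" on this system.
    """
    return [
        ["umount", mountpoint],
        ["service", "lnet", "stop"],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical target and mount point, for illustration only.
    run_all(startup_commands("mgs@o2ib:/lustre", "/mnt/lustre"))
    run_all(shutdown_commands("/mnt/lustre"))
```

If systemd tears things down in a different order than shutdown_commands()
produces, that would be consistent with the connection errors described above,
though the thread does not confirm the cause.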


