[lustre-discuss] rebooting nodes

Michael Di Domenico mdidomenico4 at gmail.com
Thu Aug 10 05:59:17 PDT 2017


does anyone else have issues issuing 'reboot' while a lustre filesystem is mounted?

we're running v2.9 clients on our workstations, but when a user goes
to reboot the machine (from the gui) the system stalls under systemd,
presumably while it's attempting to unmount the filesystem.

what i see on the console is this: systemd kicks in and starts
unmounting all the nfs shares we have, which works fine.  but then it
gets to lustre and starts throwing connection errors on the console.
it's almost as if systemd raced itself stopping lustre, with lnet
getting yanked out from under the mount before the unmount actually
finished.
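for what it's worth, the teardown order i'd expect the shutdown to
need is roughly the reverse of our mount script, something like this
(the mount point here is just a placeholder, not our real path):

    umount /mnt/lustre     # let the client disconnect from the servers first
    service lnet stop      # only then take lnet down

but judging by the console, lnet seems to be going away before the
umount has finished.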

after five minutes or so, it looks like systemd threw in the towel
and gave up on the unmount, but the system is still stuck trying to
execute the remaining shutdown tasks.

when we mount lustre on the workstations, i have a script that
figures out some site-specific details, issues a 'service lnet
start', and then issues the mount command.  this all works fine, but
i'm not sure whether doing it that way is why systemd can't figure
out the right order at shutdown.
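boiled down, the script does roughly this (filesystem name, mgs nid,
and mount point are just examples, not our actual config):

    #!/bin/bash
    # work out the site-specific bits (interface, server names, etc.)
    service lnet start                             # load lnet and bring up the nid
    mount -t lustre mgs@tcp:/lustre /mnt/lustre    # then mount the client

so there's nothing telling systemd that the mount depends on lnet,
which may be part of the problem.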

and since this all happens during the shutdown phase, debugging it is
difficult.  any thoughts?
