[Lustre-discuss] speedy server shutdown

Robin Humble rjh+lustre at cita.utoronto.ca
Sun Feb 8 20:28:49 PST 2009


Hi,

when shutting down our OSSs and then MDSs we often wait 330s for each
set of umounts to finish, e.g.
  Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
  Feb  2 03:20:11 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 325 secs...
  ...
is there a way to speed this up?
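
To time how long each umount actually waits, the countdown messages can be pulled out of syslog. The following is a minimal sketch, assuming the message wording matches the two sample lines above exactly (other Lustre versions may word it differently); the helper name parse_busy_line is hypothetical and not part of any Lustre tool.

```python
import re

# Pattern built only from the two sample log lines quoted above;
# treat the exact wording as an assumption for other Lustre versions.
BUSY_RE = re.compile(
    r"Lustre: Mount still busy with (?P<refs>\d+) refs, "
    r"waiting for (?P<secs>\d+) secs"
)


def parse_busy_line(line):
    """Return (refs, seconds_remaining), or None if the line does not match."""
    m = BUSY_RE.search(line)
    if m is None:
        return None
    return int(m.group("refs")), int(m.group("secs"))


sample = [
    "Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...",
    "Feb  2 03:20:11 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 325 secs...",
]

for line in sample:
    print(parse_busy_line(line))
```

If the ref count stays constant while the countdown runs down, as in the two lines above, the umount is presumably waiting out the full timer rather than finishing early.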

we're interested in the (perhaps unusual) case where all clients are gone
because the power has failed, and the Lustre servers are running on UPS
and need to be shut down ASAP.

the tangible reward for a quick shutdown is that we can buy a lower
capacity (cheaper) UPS if we can reliably and cleanly shut down all the
Lustre servers in <10 minutes, and preferably <3 minutes. if we're tweaking
timeouts to do this then hopefully we can tweak them just before the
shutdown and avoid running short timeouts in normal operation.
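
The arithmetic behind the UPS budget can be sketched as follows. This is only a rough estimate under stated assumptions: the OSS and MDS umounts run as two sequential phases, all servers within a phase unmount in parallel, each phase waits out the full 330 s seen in the logs, and nothing else in the shutdown takes significant time. The function names are hypothetical.

```python
WAIT_PER_PHASE_S = 330   # observed wait from the log lines, per set of umounts
N_PHASES = 2             # assumption: OSSs first, then MDSs, one after the other


def worst_case_shutdown_s(wait_per_phase_s, n_phases, overhead_s=0):
    """Total time if every phase waits out its full timer."""
    return wait_per_phase_s * n_phases + overhead_s


def max_wait_per_phase_s(budget_s, n_phases, overhead_s=0):
    """Longest per-phase wait that still fits inside the UPS budget."""
    return (budget_s - overhead_s) / n_phases


total = worst_case_shutdown_s(WAIT_PER_PHASE_S, N_PHASES)
print(f"worst case: {total} s ({total / 60:.1f} min)")

for budget_min in (10, 3):
    budget_s = budget_min * 60
    fits = total <= budget_s
    limit = max_wait_per_phase_s(budget_s, N_PHASES)
    print(f"budget {budget_min} min: fits={fits}, "
          f"per-phase wait must be <= {limit:.0f} s")
```

Under these assumptions two full waits add up to 660 s, which already overshoots a 10 minute budget, so the per-phase wait would need to drop to about 300 s for the 10 minute target and about 90 s for the 3 minute one.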

I'm probably missing something obvious, but I have looked through a
bunch of /proc/{fs/lustre,sys/lnet,sys/lustre} entries and the
Operations Manual and I can't actually see where the default 330s comes
from.
it seems to be quite repeatable for both OSSs and MDSs.

we're using Lustre 1.6.6 or 1.6.5.1 on servers and patchless 1.6.4.3 on
clients with x86_64 RHEL 5.2 everywhere.
thanks for any help!

cheers,
robin


