[Lustre-discuss] root on lustre and timeouts
Robin Humble
robin.humble+lustre at anu.edu.au
Wed Apr 29 07:39:20 PDT 2009
we are (happily) using read-only root-on-Lustre in production with
oneSIS, but have noticed something odd...
if a root-on-Lustre client node has been up for more than 10 or 12 hours
then it survives an MDS failure/failover/reboot event(*), but if the
client is newly rebooted and has been up for less than this time, then
it doesn't successfully reconnect after an MDS event and the node is
~dead.
by trial and error I've also found that if I rsync /lib64, /bin, and
/sbin from Lustre to a root ramdisk, 'echo 3 > /proc/sys/vm/drop_caches',
and symlink the rest of the dirs to Lustre, then the node sails through MDS
events. leaving out any one of the dirs/steps leads to a dead node. so
it looks like the Lustre kernel's recovery process is somehow tied to
userspace via apps in /bin and /sbin?
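the workaround above might look something like the following sketch (the /mnt/ramroot mount point, the /lustre/root path, and the symlinked dir list are assumptions; a real oneSIS image would do this at boot, as root):

```shell
# Sketch of the part-on-ramdisk workaround.  Set DRYRUN=1 to print
# the commands instead of running them (the real thing needs root).
run() { if [ "$DRYRUN" = "1" ]; then echo "$@"; else "$@"; fi; }

setup_ramroot() {
    # a small tmpfs to hold the dirs that recovery seems to need
    run mount -t tmpfs tmpfs /mnt/ramroot
    for d in lib64 bin sbin; do
        run rsync -a "/$d/" "/mnt/ramroot/$d/"
    done
    # flush pagecache, dentries and inodes so nothing Lustre-backed
    # stays pinned in memory
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    # everything else stays as symlinks back onto the Lustre root
    for d in usr etc var; do
        run ln -s "/lustre/root/$d" "/mnt/ramroot/$d"
    done
}
```

with DRYRUN=1, setup_ramroot just prints each step, which is handy for sanity-checking paths before wiring it into the boot scripts.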
I can reproduce the weird 10-12hr behaviour at will by changing the
clock on nodes in a toy Lustre test setup, i.e.:
- servers and client all have the correct time
- reboot client node
- stop ntpd everywhere
- use 'date --set ...' to set all clocks to be X hours in the future
- cause a MDS event(*)
- wait for recovery to complete
- if X <= ~10 to 12 hours then the client will be dead
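the per-node half of the steps above can be scripted roughly like this (relative strings to `date -s` are GNU coreutils behaviour, and the ntpd service name is an assumption for rhel5-era boxes):

```shell
# Jump a node's clock X hours ahead to fake X hours of uptime.
# Run on the servers and the client; set DRYRUN=1 to just print
# the commands.
run() { if [ "$DRYRUN" = "1" ]; then echo "$@"; else "$@"; fi; }

jump_clock() {
    X=${1:-12}                  # hours of simulated uptime
    run service ntpd stop       # keep ntp from undoing the jump
    run date -s "+${X} hours"   # GNU date: relative set
}
```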
it's no big deal to put those 3 dirs into ramdisk as they're really
small (and the part-on-ramdisk model is nice and flexible too), so
we'll probably move to running in this way anyway, but I'm still
curious as to why a kernel-only system like Lustre a) cares about
userspace at all during recovery, and b) has a 10-12hr timescale :-)
changing the contents of /proc/sys/lnet/upcall to some path stat'able
without Lustre being up doesn't change anything.
BTW, OSS reboot/failover is handled fine with root on Lustre, as are
regular (non-root-on-Lustre) clients - this behaviour seems to be
limited to MDS/MGS failure when all/almost all of the OS is on Lustre.
our setup is patchless 1.6.4.3 clients, 1.6.6 servers, rhel5.2/5.3
x86_64, but the behaviour seems the same with much newer Lustre too
e.g. patched b_release_1_8_0.
cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
(*) umount mdt and mgs, lustre_rmmod, wait 10 mins, mount them again
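the footnote's MDS event, as a sketch (device paths and mount points are made up; adjust for the real MDT/MGS, and note the MGS is remounted first):

```shell
# Induce an "MDS event": unmount mdt and mgs, unload Lustre modules,
# wait 10 minutes, mount them again.  Set DRYRUN=1 to just print
# the commands.
run() { if [ "$DRYRUN" = "1" ]; then echo "$@"; else "$@"; fi; }

mds_event() {
    run umount /mnt/mdt
    run umount /mnt/mgs
    run lustre_rmmod
    run sleep 600                        # the "wait 10 mins" step
    run mount -t lustre /dev/sdb /mnt/mgs
    run mount -t lustre /dev/sdc /mnt/mdt
}
```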