[Lustre-discuss] root on lustre and timeouts

Troy Benjegerdes hozer at hozed.org
Wed Apr 29 08:42:44 PDT 2009


On Wed, Apr 29, 2009 at 10:39:20AM -0400, Robin Humble wrote:
> we are (happily) using read-only root-on-Lustre in production with
> oneSIS, but have noticed something odd...
> 
> if a root-on-Lustre client node has been up for more than 10 or 12 hours
> then it survives an MDS failure/failover/reboot event(*), but if the
> client is newly rebooted and has been up for less than this time, then
> it doesn't successfully reconnect after an MDS event and the node is
> ~dead.
> 
> by trial and error I've also found that if I rsync /lib64, /bin, and
> /sbin from Lustre to a root ramdisk, 'echo 3 > /proc/sys/vm/drop_caches',
> and symlink the rest of the dirs to Lustre, then the node sails through MDS
> events. leaving out any one of the dirs/steps leads to a dead node. so
> it looks like the Lustre kernel's recovery process is somehow tied to
> userspace via apps in /bin and /sbin?

Now that's interesting... What distro are you using? I have been toying
with the idea of modifying the Debian initramfs-tools boot ramdisk to
include busybox and dropbear-ssh in order to debug these kinds of
root-network-filesystem bugs. In my case I'm running AFS as the root
filesystem, with 'afsd' in the ramdisk and started at boot. I'm
wondering whether the necessary Lustre binaries could be placed in the
initrd as well.
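
Roughly what I had in mind is a hook script along these lines (an
untested sketch; the Lustre binary and module names are guesses on my
part, so adjust to whatever your client install actually ships):

  #!/bin/sh
  # /etc/initramfs-tools/hooks/netroot-debug -- untested sketch
  PREREQ=""
  prereqs() { echo "$PREREQ"; }
  case "$1" in
      prereqs) prereqs; exit 0 ;;
  esac
  . /usr/share/initramfs-tools/hook-functions

  # a shell and an ssh daemon for poking at a node whose root went away
  copy_exec /bin/busybox /bin
  copy_exec /usr/sbin/dropbear /sbin

  # guessed Lustre client pieces: mount helper, lctl, client modules
  copy_exec /sbin/mount.lustre /sbin
  copy_exec /usr/sbin/lctl /sbin
  manual_add_modules lustre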

It would be nice if various distros could work 'out of the box' with
read-only network filesystems.
