[Lustre-discuss] CentOS 5.4 (Rocks 5.3) and Lustre 1.8.2

Ms. Megan Larko dobsonunit at gmail.com
Thu Jun 3 05:57:32 PDT 2010


Hello,

I am hoping that someone will have a more elegant answer for you, but
I will share my experience.

File systems listed in the Linux /etc/fstab file are mounted early in
the boot process (to get /, /home, /opt, swap, and so on).  This
usually happens before networking is started, so if your Lustre mount
points are listed in /etc/fstab, the system will try to mount them at
that early, pre-networking point of startup.  Without the network, the
Lustre file systems obviously cannot mount.
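
For illustration, a Lustre 1.8 client mount in /etc/fstab looks
something like the line below (the MGS NID, file system name, and
mount point here are made-up examples); this is the entry the boot
process would attempt before the network exists:

    # Lustre client entry in /etc/fstab
    mgs01@tcp0:/lustre   /mnt/lustre   lustre   defaults   0 0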

Our quick-and-dirty solution was to mount Lustre via a script in
/etc/rc.d/init.d that was called with an S## number after IB
networking had started (the script also checked for network
connectivity first).  In that script we then echoed the Lustre mount
point lines to the bottom of /etc/fstab (I'm no longer sure why).
Eventually we just left the Lustre mount points out of /etc/fstab
altogether and let the /etc/rc.d/init.d script for Lustre both start
and stop the Lustre mounts in a Sys V manner (like the other init.d
scripts); a sketch of such a script follows.
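
A minimal sketch of that kind of init.d script is below.  The MGS
NID, file system name, mount point, ping target, and the S99 start
number are all placeholder assumptions; your network check would
differ (ours was against the IB fabric):

    #!/bin/sh
    # /etc/rc.d/init.d/lustre-client -- mount/umount Lustre after networking
    # chkconfig: 345 99 01
    # description: Mounts the Lustre client once the network is up.

    MGS="mgs01@tcp0"       # placeholder MGS NID
    FSNAME="lustre"        # placeholder file system name
    MNT="/mnt/lustre"      # placeholder mount point

    start() {
        # Wait until the MGS host answers before trying to mount.
        for i in 1 2 3 4 5; do
            ping -c 1 -w 2 mgs01 >/dev/null 2>&1 && break
            sleep 5
        done
        mount -t lustre ${MGS}:/${FSNAME} ${MNT}
    }

    stop() {
        umount ${MNT}
    }

    case "$1" in
        start) start ;;
        stop)  stop ;;
        *) echo "Usage: $0 {start|stop}"; exit 1 ;;
    esac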

The key to stopping Lustre is to make certain that there are no
active jobs (RPCs in flight), or Lustre/LNET will resist the umount
command.  We used a section of the script to check the exit code of
umount and, if it failed, sleep and try the umount again.  This latter
part was really a bubble-gum-and-paper-clip approach; we did sometimes
have to outright kill the job, or even LNET, to get the Lustre file
system to unmount rather than hang indefinitely.
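
The retry logic amounted to something like the loop below (the mount
point, retry count, and sleep interval are arbitrary placeholders):

    # Try to unmount, retrying while RPCs drain; give up after 10 tries.
    MNT="/mnt/lustre"
    tries=0
    until umount ${MNT}; do
        tries=`expr ${tries} + 1`
        if [ ${tries} -ge 10 ]; then
            echo "umount ${MNT} still failing; manual action needed" >&2
            exit 1
        fi
        sleep 30
    done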

I honestly hope that others out there have developed more elegant
solutions that they may wish to share.

Megan
SGI Federal


