[Lustre-discuss] diskless booting & Debian

Daire Byrne Daire.Byrne at framestore.com
Wed Aug 13 01:51:10 PDT 2008


Jeff/Troy,

----- "Jeff Darcy" <jeffd at sicortex.com> wrote:

> We've tried loading "canned" MDT/OST images into memory on some nodes
> and serving from there, and it does seem to work.  There are two
> downsides, though.  One is that the Linux loopback driver is a real
> performance bottleneck, ever since some bright person had the idea to
> make it less multi-threaded than it had been.  Another is that booting
> tends to involve metadata-heavy access patterns which are not exactly
> Lustre's strength - a situation made worse when you have nearly a
> thousand clients doing it at the same time and your MDS is a
> relatively small node like the others.  So far we've found that NBD
> serves us better in the boot/root filesystem role, though that means a
> read-only root which involves its own complexity.  Your mileage will
> almost certainly vary.

A good trick for getting around the Lustre metadata bottleneck is to put disk image files (e.g. SquashFS) on Lustre and mount them through the loopback driver on each compute node (or "lctl attach_device" ?). Instead of bothering the MDS for every lookup, each client then only has to seek through a single file on the OSTs. By either striping the read-only image across all your OSTs, or keeping a copy of the image per OST and round-robining clients across them, you can get pretty good scalability.
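Roughly, the setup looks like this (the paths, mount points and image name here are just illustrative; "lfs setstripe -c -1" stripes new files across all OSTs):

  # on the Lustre client used for building: stripe new files in this
  # directory across all OSTs so the image's reads are spread out
  lfs setstripe -c -1 /mnt/lustre/images

  # pack the read-only tree into a compressed SquashFS image
  mksquashfs /local/root-tree /mnt/lustre/images/root.img -noappend

  # on each compute node: loop-mount the image read-only
  mount -t squashfs -o loop,ro /mnt/lustre/images/root.img /mnt/root

The point is that the MDS only sees one open of the image file per client; all the per-file metadata lookups happen inside the SquashFS image and turn into ordinary reads from the OSTs.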

I tried this with our 700-node compute cluster, but to be honest the overall booting performance was not that different from a couple of NFS servers serving a read-only root, so it was not really worth the extra complexity in the end.

We do still use SquashFS on Lustre from time to time when we have a directory tree with 30,000 small files in it that needs to be read by every farm machine. It's rare, but it does happen, and NFS has traditionally done much better than Lustre with such metadata-heavy workloads.

Daire


