[Lustre-discuss] What do you think of this idea?

Sun May 23 08:42:49 PDT 2010

[ ... ]

>> What if I take a single workstation and attach to it via
>> iSCSI two Drobo disk arrays. Would it be possible to run both
>> the metadata and the object storage off of a single machine?
>> [ ...  for backups ... ]

> [ ... ] However, the question arises whether you need Lustre
> at all, if you are just putting it on a single server?  Nobody
> can accuse me of being anti-Lustre :-), but it definitely has
> more complexity than just using a single large filesystem
> (e.g. XFS) and NFS exporting it, or using rsync + ssh for your
> backups and not having any network exporting of the
> filesystem, if that meets your needs just as well.

Ahhh that is really underselling Lustre. Sure, the main claim to
fame is the spread-across-network angle, but there are some good
reasons to use Lustre as a replacement for something like an
ext4+NFS setup (and not necessarily in the case above, these are
general points):

 #1 'fsck' times. Lustre allows splitting a large namespace onto
    multiple filesystems that can, where possible, be checked
    independently. A lot of people underestimate the issue of
    'fsck' scalability, and using a metafilesystem is a palliative.

 #2 network protocol. As a network filesystem, Lustre has a
    more POSIX-like protocol, which is arguably an advantage over
    the NFS network protocol (even if NFSv4 has the single-port
    advantage).

 #3 quality of implementation. Regrettably the Linux NFS client
    is not that good, especially for writes. The Linux NFS
    server is good, but the Lustre server is perhaps a bit
    better, for example for recovery (if one has multiple
    servers for resilience as opposed to parallelism). Sure,
    the quality of things like info/error messages is abysmal,
    but then that's pretty common (e.g. Kerberos5).

 #4 LNET. In some cases having LNET is quite useful regardless,
    as it runs transparently over several different network
    types, and allows some interesting forms of recovery.
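
As a rough illustration of point #1 above, here is a minimal
sketch of why independently checkable filesystems help. It
assumes, purely for illustration, that 'fsck' time grows linearly
with filesystem size; in practice it also depends on inode count,
fragmentation and available memory. The function name, the sizes
and the hours-per-TB figure are all made up, not measurements.

```python
def fsck_wall_clock_hours(sizes_tb, hours_per_tb, parallel=True):
    """Estimate wall-clock 'fsck' time for a set of filesystems.

    Assumption (illustrative only): each filesystem takes
    size * hours_per_tb to check. Checked in parallel, the total
    is the slowest one; checked one after another, it is the sum.
    """
    times = [size * hours_per_tb for size in sizes_tb]
    return max(times) if parallel else sum(times)


# One 16 TB filesystem versus four 4 TB ones, at a hypothetical
# 0.5 hours per TB.
print(fsck_wall_clock_hours([16], 0.5))          # 8.0
print(fsck_wall_clock_hours([4, 4, 4, 4], 0.5))  # 2.0
```

Under those assumptions the split layout also means that damage
to one backing filesystem only requires checking that one.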

Some of these advantages (in particular #1) are compelling
enough to suggest using Lustre as a *local* filesystem; too bad
that there are potential problems with that. If it weren't for
those problems I could well imagine having a fast SMP machine
using Lustre locally for computations requiring particularly
high bandwidth and low latency, and exporting it over the
network for remote access when convenient.

Nor is it really an either/or choice between RSYNC+SSH and
Lustre for backups, because:

 #1 SSH is very expensive and it is a very poor network
    protocol for data transfer, even if convenient.

 #2 One can very well use RSYNC+SSH to back up anyhow, and then
    give access to the backup over Lustre. As a rule I configure
    storage servers so that one can access the same data over
    several protocols, as in (with some poetic license):
      nfs://srv.example.com/mnt/fsys3/
      ssh://srv.example.com/mnt/fsys3/
      lustre://srv.example.com/mnt/fsys3/
      smb://srv.example.com/mnt/fsys3/
      rsync://srv.example.com/mnt/fsys3/
      https://srv.example.com/mnt/fsys3/
      ftps://srv.example.com/mnt/fsys3/
      webdavs://srv.example.com/mnt/fsys3/
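
The RSYNC+SSH half of #2 could be sketched roughly as below. This
only builds the command line and does not run it. The helper name
and the '/home' source directory are hypothetical; the host and
destination path are the illustrative ones from the list above.
'-a', '-e' and '--delete' are standard rsync options.

```python
import shlex


def rsync_backup_command(src_dir, host, dest_dir, delete=False):
    """Build (but do not run) an rsync-over-SSH command line."""
    cmd = ["rsync", "-a", "-e", "ssh"]
    if delete:
        # Remove files on the destination that no longer exist
        # in the source.
        cmd.append("--delete")
    # A trailing slash on the source copies the directory's
    # contents rather than the directory itself.
    cmd.append(src_dir.rstrip("/") + "/")
    cmd.append(f"{host}:{dest_dir}")
    return cmd


print(shlex.join(
    rsync_backup_command("/home", "srv.example.com", "/mnt/fsys3/")))
```

The resulting list can be handed to subprocess.run() as is; the
backed-up tree can then be exported over Lustre or any of the
other protocols.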

> Lustre is best suited for the case where the performance/space
> requirements are larger than what a single server can provide,
> and I don't think that matches your use case very well.

Sure, parallelism is the big deal (despite the 1.x series' lack
of MDS scalability), but even in small implementations I'd often
suggest a Lustre server or a resilient pair of Lustre servers
rather than an ext4+NFS one, especially if the storage space is
big and cannot be split logically (it has to be a single
namespace), if only for the 'fsck' benefits.

The biggest limitation with Lustre is that it has been based,
for what I think are mostly political/commercial reasons, on
ext3 and ext4 instead of on JFS (or XFS). But that's not just
Lustre: most of the Linux world made that big mistake.

But currently, as (free software) network filesystems go, Lustre
is probably without alternative (even if GlusterFS or Ceph might
be quite interesting), even outside the parallel/cluster case.