[Lustre-discuss] NFS vs Lustre

Wed Sep 16 22:48:22 PDT 2009

Greetings all,

I'd like to throw in my 2c as well. I'm not a Lustre dev, just a
sysadmin who manages a small (<10TB) Lustre data store.

For some background, we're using it in a web hosting environment for a
particularly large set of websites. We're also considering it for use
as a storage backend to a cluster of VPS servers. Our "default" choice
for new clusters is usually NFS, for many of the reasons mentioned
already: pretty good read performance, good use of client- and
server-side caching with no extra work, and, above all, it's *extremely*
simple to maintain. You can install two machines with completely stock
Linux distros, and the odds are both of them will support being an NFS
server *and* client, and will talk to each other with only minimal
effort.

Our problems with NFS: Occasionally we need better locking support
than NFS delivers. Often capacity scalability is a concern (if you
planned for it, you can grow the NFS-exported volume to some extent).
Scaling out to many clients (frontend web servers in our case,
usually) is sometimes a problem, although realistically we just don't
need that many frontends very often.

The downside to Lustre is the complexity. Initial setup is much
simpler than, say, Red Hat GFS or OCFS2, but still *vastly* more
complicated than NFS, due in large part to the ubiquity of NFS. If NFS
breaks (and it rarely does for us), the fix is usually pretty simple.
If Lustre breaks... well, let's just say I don't like being the guy
on-call. It could be worse, but it's no picnic. We've had a lot more
downtime with our *redundant* Lustre cluster than we ever did with the
standalone NFS servers it replaced.

Documentation-wise, a lot of NFS documentation is extremely dated, and
what used to be good advice often isn't anymore. My personal opinion
is that the Red Hat GFS documentation is an utter disaster. It looks
great from 50,000 feet but is nigh-impossible to implement without
much head-bashing. You may have found that really nice article in Red
Hat Magazine about NFS vs GFS scalability. Looks cool, doesn't it? We
tried that and gave up a week later when we just couldn't make it
stable. Yeah, we could make it work, but it'd be a *constant* headache.
Lustre, on the other hand, has pretty good documentation. The admin
guide is beefy and detailed, and has a lot of good info. Some of it
feels dated (1.6 vs 1.8), but all in all I'm happy with it.

Redundancy is a problem. You can sort of do HA-NFS, but it's not
particularly pretty, and it's not conveniently active-active. Lustre
has some redundancy abilities, although none of them are what I'd call
"native". To me, native failover redundancy would mean Lustre handles
both the data migration/synchronization and the actual failover. Lustre
lets you configure more than one server for the same target and will
try the others if one stops responding... but it's up to *you* to make
sure the data is actually *available* in both places. We use DRBD to
replicate the data and Heartbeat to manage the failover. It mostly
works, but I'm not really happy with it. Still, it's no worse than what
NFS offers, and sometimes better.
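A toy Python model of the failover behavior described above may help. It is not Lustre code: `Target`, `read_block`, and `replicate` are hypothetical names, `replicate` merely stands in for what DRBD does for us, and real Lustre failover involves much more (recovery, locking, fencing). It only illustrates the key point: the client will move on to the backup server, but the read succeeds only if something already copied the data there.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical server that can serve one storage target."""
    name: str
    up: bool
    blocks: dict = field(default_factory=dict)  # block id -> bytes


def replicate(primary: Target, secondary: Target) -> None:
    """Stand-in for DRBD: mirror every block from primary to secondary."""
    secondary.blocks.update(primary.blocks)


def read_block(servers: list, block_id: int) -> tuple:
    """Try each configured server in order, as a failover client would.

    Returns (server name, data) from the first server that is up and
    holds the block; raises OSError if none can serve it.
    """
    for server in servers:
        if not server.up:
            continue  # server down: fail over to the next one in the list
        if block_id in server.blocks:
            return server.name, server.blocks[block_id]
        # Server is reachable, but nothing ever copied the data here.
    raise OSError(f"block {block_id} unavailable on all servers")
```

With `oss1` holding block 1 and `oss2` empty, taking `oss1` down makes `read_block([oss1, oss2], 1)` raise `OSError`; calling `replicate(oss1, oss2)` before `oss1` goes down lets the same read return `("oss2", b"hello")`.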

You can easily put a LOT of disk space on one server if needed. I've
seen a 25TB array on one server (Dell MD1000s + Windows!), and
*heard* of as much as 67TB on one server (not NFS though). I really
don't know how well NFS handles arrays that size, but it should at
least function. Of course, with Lustre, you can still do that much on
one server, *plus* more servers with that much too.

There's also staffing to consider. Because it's so much simpler, NFS
wins here: you don't need staff who are as highly trained to deal with
it. NFS probably costs less from a personnel standpoint. Lustre admins
are rarer and therefore probably command higher salaries, and it's not
obvious that you would need fewer of them. At some point a manager
will have to decide if the technological benefits of Lustre outweigh
the extra staffing costs to maintain it (if there actually are any
such costs).

All in all, neither is really ideal, and they have different
strengths. If you need to be up 24/7 and few of your staff are going
to have time to become proficient with a complicated storage subsystem
like Lustre, you're probably better off with NFS. If you really need
better scalability or POSIX compliance, and can stand the administrative
overhead, Lustre works.

I guess the proof is in the pudding: we're not planning on migrating
en masse from NFS to Lustre. We're sticking with NFS as our default
choice, at least for the time being.

Happy sysadmin-ing,
Jake

On Wed, Aug 26, 2009 at 3:11 AM, Tharindu Rukshan Bamunuarachchi
<tharindub at millenniumit.com> wrote:
>
> hi All,
>
> I need to prepare a small report on “NFS vs. Lustre”.
>
> I could find a lot of resources about Lustre vs. (CXFS, GPFS, GFS) …
>
> Can you guys please provide a few tips … URLs … etc.
>
> cheers,
>
> tharindu