[Lustre-discuss] NFS vs Lustre

Mag Gam magawake at gmail.com
Sun Aug 30 07:12:14 PDT 2009


Well said.

This should be on the Wiki :-)


On Sat, Aug 29, 2009 at 2:15 PM, John K. Dawson <jkdawson at gmail.com> wrote:
> Lee,
>
> Thanks for posting this. I found the background and perspective very
> interesting.
>
> John
>
> John K. Dawson
> jkdawson at gmail.com
> 612-860-2388
>
> On Aug 29, 2009, at 12:56 PM, Lee Ward wrote:
>
>> You seem to be correct. Nobody ever seems to contrast NFS with these
>> super file system solutions. That is interesting.
>>
>> It's Saturday, the family is out running around. I have time to think
>> about this question. Unfortunately for you, I do this more for myself,
>> which means this is going to be a stream-of-consciousness thing far more
>> than a well-organized discussion. Sorry.
>>
>> I'd begin by motivating both NFS and Lustre. Why do they exist? What
>> problems do they solve?
>>
>> NFS first.
>>
>> Way back in the day, ethernet and the concept of a workstation got
>> popular. There were many tools to copy files between machines but few
>> ways to share a name space, that is, to have the directory hierarchy and
>> its contents directly accessible to an application on a foreign machine. This
>> made file sharing awkward. The model was to copy the file or files to
>> the workstation where the work was going to be done, do the work, and
>> copy the results back to some, hopefully, well maintained central
>> machine.
>>
>> There *were* solutions to this at the time. I recall an attractive
>> alternative called RFS (I believe) from the Bell Labs folks, via some
>> place in England if I'm remembering right, it's been a looong time after
>> all. It had issues though. The nastiest issue for me was that if a
>> client went down the service side would freeze, at least partially.
>> Since this could happen willy-nilly, depending on the user's wishes and
>> how well the power button on his workstation was protected, together
>> with the power cord and ethernet connection, this freezing of service
>> for any amount of time was difficult to accept. This was so even in a
>> rather small collection of machines.
>>
>> The problem with RFS (?) and its cousins was that they were all
>> stateful. The service side depended on state that was held at the
>> client. If the client went down, the service side couldn't continue
>> without a whole lot of recovery, timeouts, etc. It was a very *annoying*
>> problem.
>>
>> In the latter half of the 1980s (am I remembering right?) SUN proposed
>> an open protocol called NFS. An implementation using this protocol could
>> do most everything RFS(?) could but it didn't suffer the service-side
>> hangs. It couldn't. It was stateless. If the client went down, the
>> server just didn't care. If the server went down, the client had the
>> opportunity to either give up on the local operation, usually with an
>> error returned, or wait. It was always up to the user and for client
>> failures the annoyance was limited to the user(s) on that client.
>>
>> SUN, also, wisely desired the protocol to be ubiquitous. They published
>> it. They wanted *everyone* to adopt it. More, they would help
>> competitors. SUN held interoperability bake-a-thons to help with this.
>>
>> It looks like they succeeded, all around :)
>>
>> Let's sum up, then. The goals for NFS were:
>>
>> 1) Share a local file system name space across the network.
>> 2) Do it in a robust, resilient way. Pesky FS issues because some user
>> kicked the cord out of his workstation were unacceptable.
>> 3) Make it ubiquitous. SUN was a workstation vendor. They sold servers
>> but almost everyone had a VAX in their back pocket where they made the
>> infrastructure investment. SUN needed the high-value machines to support
>> this protocol.
>>
>> Now Lustre.
>>
>> Lustre has a weird story and I'm not going to go into all of it. The
>> shortest, relevant part is that while there was at least one solution
>> that DOE/NNSA felt was acceptable, GPFS, it was not available on anything
>> other than an IBM platform and because DOE/NNSA had a semi-formal policy
>> of buying from different vendors at each of the three labs we were kind
>> of stuck. Other file systems, existing and imminent, at the time were
>> examined but they were all distributed file systems and we needed IO
>> *bandwidth*. We needed lots and lots of bandwidth.
>>
>> We also needed that ubiquitous thing that SUN had as one of their goals.
>> We didn't want to pay millions of dollars for another GPFS. We felt that
>> would only be painting ourselves into a corner. Whatever we did, the
>> result *had* to be open. It also had to be attractive to smaller sites
>> as we wanted to turn loose of the thing at some point. If it was
>> attractive for smaller machines we felt we would win in the long term
>> as, eventually, the cost to further and maintain this thing was spread
>> across the community.
>>
>> As far as technical goals, I guess we just wanted GPFS, but open. More
>> though, we wanted it to survive in our platform roadmaps for at least a
>> decade. The actual technical requirements for the contract that DOE/NNSA
>> executed with HP (CFS was the subcontractor responsible for
>> development) can be found here:
>>
>> <http://www-cs-students.stanford.edu/~trj/SGS_PathForward_SOW.pdf>
>>
>> LLNL used to host this but it's no longer there? Oh well, hopefully this
>> link will be good for a while, at least.
>>
>> I'm just going to jump to the end and sum the goals up:
>>
>> 1) It must do *everything* NFS can. We relaxed the stateless thing,
>> though; see the next item for why.
>> 2) It must support full POSIX semantics: last writer wins, POSIX locks,
>> etc.
>> 3) It must support all of the transports we are interested in.
>> 4) It must be scalable, in that we can cheaply attach storage and both
>> performance (reading *and* writing) and capacity within a single mounted
>> file system increase in direct proportion.
>> 5) We wanted it to be easy, administratively. Our goal was that it be no
>> harder than NFS to set up and maintain. We were involving too many folks
>> with PhDs in the operation of our machines at the time. Before you yell
>> FAIL, I'll say we did try. I'll also say we didn't make CFS responsible
>> for this part of the task. Don't blame them overly much, OK?
>> 6) We recognized we were asking for a stateful system, so we wanted to
>> mitigate that by having some focus on resiliency. These were big
>> machines and clients died all the time.
>> 7) While not in the SOW, we structured the contract to accomplish some
>> future form of wide acceptance. We wanted it to be ubiquitous.
>>
>> That's a lot of goals! For the technical ones, the main ones are all
>> pretty much structured to ask two things of what became Lustre. First,
>> give us everything NFS functionally does but go far beyond it in
>> performance. Second, give us everything NFS functionally does but make
>> it completely equivalent to a local file system, semantically.
>>
>> There's a little more we have to consider. NFS4 is a different beast
>> than NFS2 or NFS3. NFS{2,3} had some serious issues that became more
>> prominent as time went by. First, security: it had none. Folks had
>> bandaged on some different things to try to cure this but they weren't
>> standard across platforms. Second, it couldn't do the full semantics
>> POSIX requires. That was attacked with the NFS lock protocols but it
>> was such an after-thought it will always remain problematic. Third, new
>> authorization possibilities introduced by Microsoft and then POSIX,
>> called ACLs, had no way of being expressed.
>>
>> NFS4 addresses those by:
>>
>> 1) Introducing state. Can do full POSIX now without the lock servers.
>> Lots of resiliency mechanisms introduced to offset the downside of this,
>> too.
>> 2) Formalizing and offering standardized authentication headers.
>> 3) Introducing ACLs that map to equivalents in POSIX and Microsoft.
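>>
>> To ground that last item: on the POSIX side, an ACL is just extra
>> named-user/named-group entries layered on the owner/group/other bits,
>> and NFS4's ACL model is meant to carry an equivalent over the wire.
>> A minimal sketch using the POSIX.1e draft library (libacl on Linux;
>> the path and the user "alice" are made up for the example):
>>
>>   #include <stdio.h>
>>   #include <sys/acl.h>   /* POSIX.1e draft ACLs; link with -lacl */
>>
>>   int main(void)
>>   {
>>       /* Grant the named user "alice" read/write on top of the
>>          owner/group/other bits; the mask entry caps named entries. */
>>       acl_t acl = acl_from_text("u::rw-,g::r--,o::---,u:alice:rw-,m::rw-");
>>       if (acl == NULL) { perror("acl_from_text"); return 1; }
>>
>>       if (acl_set_file("/mnt/nfs/shared/report", ACL_TYPE_ACCESS, acl) < 0)
>>           perror("acl_set_file");  /* needs server and protocol support */
>>
>>       acl_free(acl);
>>       return 0;
>>   }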
>>
>> Strengths and Weaknesses of the Two
>> -----------------------------------
>>
>> NFS4 does most everything Lustre can with one very important exception,
>> IO bandwidth.
>>
>> Both seem able to deliver metadata performance at roughly the same
>> speeds. File create, delete, and stat rates are about the same. NetApp
>> seems to have a partial enhancement. They bought the Spinnaker goodies
>> some time back and have deployed that technology, and redirection
>> too(?), within their servers. The good about that is two users in
>> different directories *could* leverage two servers, independently, and,
>> so, scale metadata performance. It's not guaranteed but at least there
>> is the possibility. If the two users are in the same directory, it's not
>> much different, though, I'm thinking. Someone correct me if I'm wrong?
>>
>> Both can offer full POSIX now. It's nasty in both cases but, yes, in
>> theory you can export mail directory hierarchies with locking.
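>>
>> What "with locking" means to the application is ordinary fcntl()
>> byte-range locks; nothing in the code is NFS- or Lustre-specific.
>> A minimal sketch, with a hypothetical mailbox path (under NFS{2,3}
>> this rides the side-band NLM protocol, under NFS4 it's native state,
>> and Lustre enforces it cluster-wide when mounted with flock support):
>>
>>   #include <fcntl.h>
>>   #include <stdio.h>
>>   #include <unistd.h>
>>
>>   int main(void)
>>   {
>>       int fd = open("/var/mail/user", O_RDWR);
>>       if (fd < 0) { perror("open"); return 1; }
>>
>>       struct flock fl = {
>>           .l_type   = F_WRLCK,   /* exclusive write lock */
>>           .l_whence = SEEK_SET,
>>           .l_start  = 0,
>>           .l_len    = 0,         /* 0 = the whole file   */
>>       };
>>
>>       /* F_SETLKW blocks until every other locker lets go. */
>>       if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); return 1; }
>>
>>       /* ... append the new message safely here ... */
>>
>>       fl.l_type = F_UNLCK;
>>       fcntl(fd, F_SETLK, &fl);
>>       close(fd);
>>       return 0;
>>   }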
>>
>> The NFS client and server are far easier to set up and maintain than
>> Lustre's, and the tools to debug issues are more advanced. While the
>> Lustre folks have done much to improve this area, NFS is just leaps and
>> bounds ahead. It's easier to deal with NFS than Lustre. Just far, far
>> easier, still.
>>
>> NFS is just built into everything. My TV has it, for heck's sake. Lustre
>> is, seemingly, always an add-on. It's also a moving target. We're
>> constantly futzing with it, upgrading, and patching. Lustre might be
>> compilable most everywhere we care about but building it isn't trivial.
>> The supplied modules are great but, still, moving targets in that we
>> wait for SUN to catch up to the vendor-supplied changes that affect
>> Lustre. Given Lustre's size and interaction with other components in the
>> OS, that happens far more frequently than desired. NFS just plain wins
>> the ubiquity argument at present.
>>
>> NFS IO performance does *not* scale. It's still an in-band protocol. The
>> data is carried in the same message as the request and is, practically,
>> limited in size. Reads are more scalable than writes; a popular
>> file segment can be satisfied from the cache on reads, but even that
>> develops issues at some point. For writes, NFS3 and NFS4 help in that they
>> directly support write-behind so that a client doesn't have to wait for
>> data to go to disk, but it's just not enough. If one streams data
>> to/from the store, it can be larger than the cache. A client that
>> reads a file already made "hot" by others, but at a very different
>> rate, just loses.
>> A client, writing, is always looking for free memory to buffer content.
>> Again, too many of these, simultaneously, and performance descends to
>> the native speed of the attached back-end store and that store can only
>> get so big.
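>>
>> To make the write-behind point concrete, here is a minimal sketch
>> (the path is a stand-in for any NFS mount): the write() calls return
>> as soon as the client has buffered the pages, and the waiting is
>> concentrated in fsync(), which can't return until the server says
>> the data is stable (the NFS3/NFS4 COMMIT).
>>
>>   #include <fcntl.h>
>>   #include <stdio.h>
>>   #include <string.h>
>>   #include <unistd.h>
>>
>>   int main(void)
>>   {
>>       int fd = open("/mnt/nfs/scratch.dat",
>>                     O_WRONLY | O_CREAT | O_TRUNC, 0644);
>>       if (fd < 0) { perror("open"); return 1; }
>>
>>       char buf[1 << 16];
>>       memset(buf, 'x', sizeof(buf));
>>
>>       /* Each write() returns once the data is in the client's
>>          cache; dirty pages trickle to the server asynchronously,
>>          until the client runs out of free memory to buffer with. */
>>       for (int i = 0; i < 1024; i++)
>>           if (write(fd, buf, sizeof(buf)) < 0) {
>>               perror("write"); return 1;
>>           }
>>
>>       /* Here the client finally waits on the back-end store. */
>>       if (fsync(fd) < 0) { perror("fsync"); return 1; }
>>
>>       close(fd);
>>       return 0;
>>   }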
>>
>> Lustre IO performance *does* scale. It uses a 3rd-party transfer.
>> Requests are made to the metadata server and IO moves directly between
>> the affected storage component(s) and the client. The more storage
>> components, the less possibility of contention between clients and the
>> more data can be accepted/supplied per unit time.
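>>
>> An application can steer that striping directly. A sketch against
>> Lustre's userspace library, liblustreapi (the header name has varied
>> across releases; the path, stripe size, and count are made up): a
>> file striped across four OSTs has four storage servers moving its
>> data concurrently.
>>
>>   #include <stdio.h>
>>   #include <string.h>
>>   #include <lustre/lustreapi.h>  /* liblustreapi, from Lustre userspace */
>>
>>   int main(void)
>>   {
>>       int rc = llapi_file_create("/mnt/lustre/bigfile",
>>                                  1 << 20, /* stripe size: 1 MiB      */
>>                                  -1,      /* first OST: let MDS pick */
>>                                  4,       /* stripe count: 4 OSTs    */
>>                                  0);      /* pattern: 0 = default    */
>>       if (rc < 0) {
>>           fprintf(stderr, "llapi_file_create: %s\n", strerror(-rc));
>>           return 1;
>>       }
>>       return 0;
>>   }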
>>
>> NFS4 has a proposed extension, called pNFS, to address this problem. It
>> just introduces the 3rd-party data transfers that Lustre enjoys. If and
>> when that is a standard, and is well supported by clients and vendors,
>> the really big technical difference will virtually disappear. It's been
>> a long time coming, though. It's still not there. Will it ever be,
>> really?
>>
>> The answer to the NFS vs. Lustre question comes down to the workload for
>> a given application then, since they do have overlap in their solution
>> space. If I were asked to look at a platform and recommend a solution I
>> would worry about IO bandwidth requirements. If the platform in question
>> were read-mostly and, practically, never needed sustained read or
>> write bandwidth, NFS would be an easy choice. I'd even think hard about
>> NFS if the platform created many files but all were very small; today's
>> filers have very respectable IOPS rates. If it came down to IO
>> bandwidth, I'm still on the parallel file system bandwagon. NFS just
>> can't deal with that at present and I do still have the folks, in house,
>> to manage the administrative burden.
>>
>> Done. That was useful for me. I think five years ago I might have opted
>> for Lustre in the "create many small files" case, where I would consider
>> NFS today, so re-examining the motivations, relative strengths, and
>> weaknesses of both was useful. As I said, I did this more as a
>> self-exercise than anything else but I hope you can find something
>> useful here, too. The family is back from their errands, too :) Best
>> wishes and good luck.
>>
>>                --Lee
>>
>>
>> On Wed, 2009-08-26 at 04:11 -0600, Tharindu Rukshan Bamunuarachchi
>> wrote:
>>>
>>> hi All,
>>>
>>> I need to prepare a small report on “NFS vs. Lustre”.
>>>
>>> I could find a lot of resources about Lustre vs. (CXFS, GPFS, GFS) …
>>>
>>> Can you guys please provide a few tips … URLs … etc.
>>>
>>> cheers,
>>>
>>> __
>>>
>>> tharindu
>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss



More information about the lustre-discuss mailing list