[Lustre-discuss] Swap over lustre

Andreas Dilger adilger at whamcloud.com
Thu Aug 18 10:20:54 PDT 2011


On 2011-08-18, at 12:36 AM, "Temple  Jason" <jtemple at cscs.ch> wrote:
> I experimented with swap on lustre in as many ways as possible (without touching the code), keeping the path as short as possible, all to no avail.  The code simply cannot handle it, and the system always hung.

Jason, did you try the lloop device?  It was written specifically for swap, to bypass the VFS, filesystem, and locking layers.  It never made it to production quality, since no customer was interested in completing it, but it is definitely the best starting point.

> Without serious code rewrites, this isn't going to work for you.

That's a difficult assessment to make.  A bunch of effort went into removing allocations in the IO path at one time, but it was never a priority to keep lloop working, so things may have regressed over time.

IMHO it probably isn't a huge effort to get this working again, but someone in the community would need to invest the time to investigate the problems and fix the code.

It would be best to start by getting the lloop block device to work reliably, using "lctl set_param debug=+malloc" to find allocations along the IO path, and then move on to debugging swap.
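
Roughly, the sequence I have in mind looks like the following untested
sketch (the blockdev_attach syntax and the /dev/lloop0 node name are from
the manual and from memory, and the paths are only examples, so check
everything against your tree):

  # trace memory allocations in the IO path
  lctl set_param debug=+malloc
  lctl clear                               # start with an empty debug log

  # back an lloop block device with a regular file on Lustre
  dd if=/dev/zero of=/mnt/lustre/swapfile bs=1M count=4096
  lctl blockdev_attach /mnt/lustre/swapfile /dev/lloop0

  # make sure plain block IO is reliable before involving swap at all
  dd if=/dev/zero of=/dev/lloop0 bs=1M count=1024 oflag=direct
  lctl dk /tmp/lloop-debug.log             # dump the log, look for allocations

  # only once that is solid
  mkswap /dev/lloop0
  swapon /dev/lloop0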

Cheers, Andreas

> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of John Hanks
> Sent: Thursday, 18 August 2011 05:55
> To: landman at scalableinformatics.com
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Swap over lustre
> 
> On Wed, Aug 17, 2011 at 8:57 PM, Joe Landman
> <landman at scalableinformatics.com> wrote:
>> On 08/17/2011 10:43 PM, John Hanks wrote:
>> As a rule of thumb, you should try to keep the path to swap as simple as
>> possible.  Avoid memory/buffer allocations on the way to a paging event
>> if you possibly can.
> 
> I do have a long path there; I will try simplifying it and see if it helps.
> 
>> The Lustre client (and most NFS clients, and even network block devices)
>> allocates memory for buffers ... which is anathema to migrating pages
>> out to disk.  You can easily wind up in a "death spiral" race condition
>> (and it sounds like you are there).  You might be able to do something
>> with iSCSI or SRP (though these also do block allocations and could
>> trigger death spirals).  If you can limit the number of buffers they
>> allocate, force them to allocate those buffers at startup (by driving
>> some activity to the block device), and then pin that memory so it
>> can't be evicted, you might have a chance of doing it as a block
>> device.  I think SRP can do this; I'm not sure whether iSCSI initiators
>> can pin buffers in RAM.
>> 
>> You might look at the swapz patches (we haven't integrated them into our
>> kernel yet, but have been looking at them) to compress swap pages and
>> store them ... in RAM.  This may not work for you, but it could be an
>> option.
> 
> I wasn't aware of swapz; that sounds really interesting.  The codes
> that run the nodes out of memory tend to be sequencing applications,
> which seem like good candidates for memory compression.
> 
>> Is there any particular reason you can't use a local drive for this
>> (such as you don't have local drives, or they aren't big/fast enough)?
> 
> We're doing this on diskless nodes.  I'm not looking to get a huge
> amount of swap, just enough to give the root filesystem a place to
> page out of the tmpfs so we can squeeze out all the RAM possible
> for applications.  Since I don't expect it to get heavily used, I'm
> considering running vblade on a server and carving out small AoE LUNs.
> It seems logical that if a host can boot off of iSCSI or AoE, it
> could also have swap there, but I've never tried it with either
> protocol.
> 
> FWIW, mounting a file on lustre via loopback to provide a local
> scratch filesystem works really well.
> 
> jbh
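
On the swapz suggestion quoted above: the same compress-swap-into-RAM idea
is also available through the zram staging driver that grew out of the
compcache work.  A rough, untested sketch (the sysfs interface differs
between kernel versions, and the size is only an example):

  modprobe zram num_devices=1
  echo $((2 * 1024 * 1024 * 1024)) > /sys/block/zram0/disksize  # 2 GB, in bytes
  mkswap /dev/zram0
  swapon -p 100 /dev/zram0     # higher priority than any network-backed swap

It only buys back memory to the extent the pages compress, but for paging a
tmpfs root it may be enough.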

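On the AoE idea: swap over an AoE LUN exported with vblade should be
workable in principle.  A minimal, untested sketch (shelf/slot numbers,
interface names, and paths are made up for illustration):

  # on the server exporting the LUN
  dd if=/dev/zero of=/srv/aoe/node001-swap.img bs=1M count=2048
  vblade 0 1 eth0 /srv/aoe/node001-swap.img &

  # on the diskless client
  modprobe aoe
  aoe-discover
  mkswap /dev/etherd/e0.1
  swapon /dev/etherd/e0.1

The memory-allocation caveats raised above for network block devices apply
here as well, so it is worth testing under real memory pressure before
relying on it.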

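For the loopback scratch filesystem mentioned at the end of the thread, the
setup is essentially just the following (untested sketch; paths and sizes
are placeholders):

  dd if=/dev/zero of=/mnt/lustre/scratch/$(hostname).img bs=1M count=10240
  mkfs.ext4 -F /mnt/lustre/scratch/$(hostname).img   # -F: mkfs on a regular file
  mkdir -p /scratch
  mount -o loop /mnt/lustre/scratch/$(hostname).img /scratch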
