[Lustre-discuss] Swap over lustre

Joe Landman landman at scalableinformatics.com
Wed Aug 17 19:57:40 PDT 2011

On 08/17/2011 10:43 PM, John Hanks wrote:
> Hi,
> I've been trying to get swap on lustre to work with not much success
> using blockdev_attach and the resulting lloop0 device and using
> losetup and the resulting loop device. This thread
> (http://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg00856.html)
> claims that it works, but in all my attempts almost as soon as swap is
> used (testing with memhog), the host hangs. In some cases it hangs
> hard, but on occasion if I'm patient enough the OOM will eventually
> kill something and the node will become responsive again. If I
> carefully increase memory with each successive memhog run I can get
> some pages to swap, but any real pressure always results in a hang.
> I'm attempting this on Redhat EL 5.6 with lustre 1.8.4 patchless
> client over IB.

As a rule of thumb, keep the path to swap as simple as possible. 
Ideally there should be no memory/buffer allocations on the way to a 
paging event.

The Lustre client, like most NFS clients and even most network block 
devices, allocates memory for buffers ... which is anathema to migrating 
pages out to disk: paging out is supposed to free memory, but the write 
itself needs memory that isn't there.  You can easily wind up in a 
"death spiral" race condition (and it sounds like you are there).  You 
might be able to do something with iSCSI or SRP (though these also 
allocate buffers and could trigger death spirals).  If you can limit the 
number of buffers they allocate, force them to allocate those buffers at 
startup (by forcing some activity to the block device), and then pin 
that memory so that it can't be paged out, you might have a chance of 
doing it as a block device.  I think SRP can do this; I'm not sure 
whether iSCSI initiators can pin buffers in RAM.
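
The allocation problem above can be illustrated with a toy model.  This 
is only a sketch for the reasoning, not a description of how the kernel 
or any driver actually works: the function name, its parameters, and the 
one-page-at-a-time accounting are all assumptions made for illustration.

```python
def pages_evicted(free_pages, dirty_pages, buffers_per_write, pinned_pool=0):
    """Toy model of paging out over a network device (illustrative only).

    Each page-out needs `buffers_per_write` pages of buffer space.
    Buffers are taken from a preallocated, pinned pool first; any
    shortfall has to be allocated from free memory at paging time.
    Returns how many pages were evicted before the system stalled.
    """
    evicted = 0
    while evicted < dirty_pages:
        shortfall = max(0, buffers_per_write - pinned_pool)
        if shortfall > free_pages:
            # No memory for the buffers needed to free memory: stall.
            break
        # Buffers are handed back once the write completes, so the net
        # effect of a successful page-out is one more free page.
        free_pages += 1
        evicted += 1
    return evicted


if __name__ == "__main__":
    # Memory exhausted, buffers allocated on demand: nothing gets out.
    print(pages_evicted(free_pages=0, dirty_pages=100, buffers_per_write=4))
    # Same situation with a pinned pool sized at startup: paging proceeds.
    print(pages_evicted(free_pages=0, dirty_pages=100, buffers_per_write=4,
                        pinned_pool=4))
```

In this model the first call evicts 0 pages and the second evicts all 
100, which is the whole argument for allocating and pinning the buffers 
before memory pressure starts.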

You might look at the swapz patches (we haven't integrated them into our 
kernel yet, but have been looking at them) to compress swap pages and 
store them ... in RAM.  This may not work for you, but it could be an 

Is there any particular reason you can't use a local drive for this 
(such as you don't have local drives, or they aren't big/fast enough)?

> Digging around search results for "swap over NFS" I've found a lot of
> discussion about race conditions and different patches to address
> this, but CONFIG_NFS_SWAP seems to be missing from the redhat kernel.
> And upon trying swap to an NFS server, I see the same behavior. Is
> swap to a network device doomed to always fail on Redhat EL 5 and if
> not, does anyone have a recipe for getting swap on lustre to work?
> I've also fiddled with min_free_kbytes and swappiness in an attempt to
> induce swapping before the node's memory is actually all gone but all
> this results in is an earlier hang with less memory having been used.
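
When repeating tests like the ones quoted above, it can help to log the 
tunables and the swap counters before and after each memhog run.  A 
minimal sketch, assuming a Linux client with the usual /proc/meminfo and 
/proc/sys/vm files; the helper names are hypothetical and nothing here 
is specific to Lustre:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info["SwapTotal"] - info["SwapFree"]


def read_int(path):
    """Read a single integer from a /proc file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}
    if "SwapTotal" in info and "SwapFree" in info:
        print("swap used (kB):", swap_used_kb(info))
    print("MemFree (kB):", info.get("MemFree"))
    print("swappiness:", read_int("/proc/sys/vm/swappiness"))
    print("min_free_kbytes:", read_int("/proc/sys/vm/min_free_kbytes"))
```

Running it from a second shell while memhog is active shows how much 
swap was actually in use at the point where the node stops responding.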

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
