[lustre-devel] LIBCFS_ALLOC

Dilger, Andreas andreas.dilger at intel.com
Fri Jul 3 04:52:06 PDT 2015

On 2015/07/02, 4:25 PM, "Simmons, James A." <simmonsja at ornl.gov> wrote:

>>> >Yeah.  You're right.  Doing a vmalloc() when kmalloc() doesn't have
>>> >a tiny sliver of RAM isn't going to work.  It's easier to use
>>> >libcfs_kvzalloc() everywhere, but it's probably the wrong thing.
>>> The original  reason we have the vmalloc water mark wasn't so much the
>>> issue of memory exhaustion but to handle the case of memory
>>> Some sites had after a extended period of time started to see failures
>>> allocating even 32K using kmalloc.  In our latest development branch
>>>we moved
>>> away from using a water mark to always try kmalloc first and if it
>>>fails then we
>>> try vmalloc. At ORNL we ran into severe performance issues when we
>>> vmalloc territory. It has been discussed before on what might replace
>>> handling in the case of kmalloc fails but no solution has been worked
>>OK, but if a structure contains only 4 words, would it be better to just
>>use kzalloc?  Or does it not matter?  It would only save trying vmalloc
>>a case that it is guaranteed to fail, but if a structure with 4 words
>>can't be allocated, the system has other problems.  Another argument is
>>that kzalloc is a well known function that people and bug-finding tools
>>understand, so it is better to use it whenever possible.
>>Some of the other structures contain a lot more fields, as well as small
>>arrays.  They are probably acceptable for kzalloc too, but I wouldn't
>>the exact dividing line.
>The reason I bring this up is to discuss sorting this out. Once long ago
>we had just LIBCFS_ALLOC. For some reason before my time OBD_ALLOC got
> spawned off of that.  Currently LIBCFS_ALLOC is used just by the
> layer.

That is because there was (is?) interest from Cray and others to use LNet
independently from Lustre (Zest and DVS, for example) so LNet should be
self-contained and not depend on anything from Lustre.

> Now OBD_ALLOC in our development branch has moved to a try kmalloc first
>if it fails try vmalloc for any size memory allocation.  LIBCFS_ALLOC
> does the original approach.   So we have two possible solutions
>depending on if libcfs/LNet needs to ever do a vmalloc.
>One solution if libcfs/LNet never needs a vmalloc is remove LIBCFS_ALLOC
>and replace it with kzalloc everywhere. We can then move libcfs_kzvalloc
> the lustre layer and port the change we did in the development branch to
> here of the try kmalloc then vmalloc approach.
>The other approach is if libcfs/LNet does in some case need to use vmalloc
> we could then update LIBCFS_ALLOC to first try kmalloc then vmalloc. Once
> this is implemented we can nuke the OBD_ALLOC system.

I don't agree.  I think there are a few places where vmalloc() makes sense
to try (if the allocation may be large), but in most places LIBCFS_ALLOC()
should only use kmalloc().

Unfortunately, there wasn't a separate LIBCFS_ALLOC_LARGE() like there was
for OBD_ALLOC_LARGE() that made it clear which callsites are (potentially)
large and which are small.  The macro approach allowed the compile-time
optimization of the small callsites, but that needs to be done by hand now.

>Either way I like to see it consolidated down to one system.

Given the proliferation of foo_kvmalloc() and foo_kvzalloc() helpers
(ext4_, kvm_, dm_, apparmor, ceph_, __aa_), maybe it it is time to move
these to common kernel code instead of introducing yet another new one?

Cheers, Andreas
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division

More information about the lustre-devel mailing list