[lustre-discuss] Lustre 2.10.1 + RHEL7 Page Allocation Failures

Charles A Taylor chasman at ufl.edu
Wed Nov 29 05:47:00 PST 2017


Thank you, Peter.  I figured that would be the response, but wanted to ask.  We were hoping to get away from maintaining a MOFED build, but it looks like that may not be the way to go.

And you are correct about the JIRA ticket.  I misspoke.  It was the associated RH kernel bug that was “private”, IIRC.  

Thank you again,

Charlie

> On Nov 29, 2017, at 8:09 AM, Jones, Peter A <peter.a.jones at intel.com> wrote:
> 
> Charles
> 
> That ticket is completely open, so you do have access to everything. As I understand it, the options are either to use the latest MOFED update rather than relying on the in-kernel OFED (which I believe is the advice usually provided by Mellanox anyway) or to apply the kernel patch Andreas has created that is referenced in the ticket.
> 
> Peter
> 
> On 2017-11-29, 2:50 AM, "lustre-discuss on behalf of Charles A Taylor" <lustre-discuss-bounces at lists.lustre.org on behalf of chasman at ufl.edu> wrote:
> 
>> 
>> Hi All,
>> 
>> We recently upgraded from Lustre 2.5.3.90 on EL6 to 2.10.1 on EL7 (details below), but have hit what looks like LU-10133 (order-8 page allocation failures).
>> 
>> We don’t have access to look at the JIRA ticket in more detail, but from what we can tell the fix is to change from vmalloc() to vmalloc_array() in the mlx4 drivers.  However, the vmalloc_array() infrastructure is only in a (far) upstream kernel, so I’m not sure when we’ll see that fix.
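>> 
>> To illustrate the kind of change being described (a sketch only: the structure and function names below are placeholders, not the actual mlx4 code or the patch from the ticket), the idea is to replace an open-coded size multiplication passed to vmalloc() with the overflow-checked vmalloc_array() helper that newer upstream kernels provide:
>> 
>>     /* Sketch only: placeholder names, not the real mlx4 driver code. */
>>     #include <linux/types.h>
>>     #include <linux/vmalloc.h>
>> 
>>     struct example_entry {          /* stand-in for the real table entry type */
>>             u64 addr;
>>             u32 len;
>>     };
>> 
>>     static void *alloc_table_old(size_t nent)
>>     {
>>             /* before: size computed by open-coded multiplication */
>>             return vmalloc(nent * sizeof(struct example_entry));
>>     }
>> 
>>     static void *alloc_table_new(size_t nent)
>>     {
>>             /* after: overflow-checked array allocation; vmalloc_array()
>>              * exists only in newer upstream kernels, hence the wait */
>>             return vmalloc_array(nent, sizeof(struct example_entry));
>>     }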
>> 
>> While this may not be a Lustre issue directly, I know we can’t be the only Lustre site running 2.10.1 over IB on Mellanox ConnectX-3 HCAs.  So far we have tried increasing vm.min_free_kbytes to 8 GB, but that does not help.  zone_reclaim_mode is disabled (for other reasons that may not be valid under EL7), but order-8 chunks get depleted on both NUMA nodes, so I’m not sure enabling it is the answer either (though we have not tried it yet).
>> 
>> [root at ufrcmds1 ~]# cat /proc/buddyinfo 
>> Node 0, zone      DMA      1      0      0      0      2      1      1      0      1      1      3 
>> Node 0, zone    DMA32   1554  13496  11481   5108    150      0      0      0      0      0      0 
>> Node 0, zone   Normal 114119 208080  78468  35679   6215    690      0      0      0      0      0 
>> Node 1, zone   Normal  81295 184795 106942  38818   4485    293   1653      0      0      0      0 
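>> 
>> (Below is a minimal user-space sketch, offered only as an illustration, that parses /proc/buddyinfo and prints the order-8 free-block count per zone; it assumes the usual layout of "Node N, zone NAME" followed by one free-block count for each order 0 through 10.)
>> 
>>     /* Sketch: report free order-8 blocks per zone from /proc/buddyinfo. */
>>     #include <stdio.h>
>> 
>>     int main(void)
>>     {
>>             FILE *f = fopen("/proc/buddyinfo", "r");
>>             char line[512];
>> 
>>             if (!f) {
>>                     perror("/proc/buddyinfo");
>>                     return 1;
>>             }
>>             while (fgets(line, sizeof(line), f)) {
>>                     int node;
>>                     char zone[32];
>>                     unsigned long c[11];
>> 
>>                     if (sscanf(line,
>>                                "Node %d, zone %31s %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
>>                                &node, zone, &c[0], &c[1], &c[2], &c[3], &c[4],
>>                                &c[5], &c[6], &c[7], &c[8], &c[9], &c[10]) < 13)
>>                             continue;
>>                     printf("node %d zone %-8s order-8 free blocks: %lu\n",
>>                            node, zone, c[8]);
>>             }
>>             fclose(f);
>>             return 0;
>>     }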
>> 
>> I’m wondering whether other sites are hitting this and, if so, what you are doing to work around the issue on your OSSs.
>> 
>> Regards,
>> 
>> Charles Taylor
>> UF Research Computing
>> 
>> 
>> Some Details:
>> -------------------
>> OS: RHEL 7.4 (Linux ufrcoss28.ufhpc 3.10.0-693.2.2.el7_lustre.x86_64)
>> Lustre: 2.10.1 (lustre-2.10.1-1.el7.x86_64)
>> Clients: ~1400 (still running 2.5.3.90 but we are in the process of upgrading)
>> Servers: 10 HA OSS pairs (20 OSSs)
>>    128 GB RAM
>>    6 OSTs (8+2 RAID-6) per OSS 
>>    Mellanox ConnectX-3 IB/VPI HCAs 
>>    RedHat Native IB Stack (i.e. not MOFED)
>>    mlx4_core driver:
>>       filename:       /lib/modules/3.10.0-693.2.2.el7_lustre.x86_64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko.xz
>>       version:        2.2-1
>>       license:        Dual BSD/GPL
>>       description:    Mellanox ConnectX HCA low-level driver
>>       author:         Roland Dreier
>>       rhelversion:    7.4


