[lustre-discuss] Lustre 2.10.1 + RHEL7 Page Allocation Failures
Charles A Taylor
chasman at ufl.edu
Wed Nov 29 05:47:00 PST 2017
Thank you, Peter. I figured that would be the response but wanted to ask. We were hoping to get away from maintaining a MOFED build but it looks like that may not be the way to go.
And you are correct about the JIRA ticket. I misspoke. It was the associated RH kernel bug that was “private”, IIRC.
Thank you again,
Charlie
> On Nov 29, 2017, at 8:09 AM, Jones, Peter A <peter.a.jones at intel.com> wrote:
>
> Charles
>
> That ticket is completely open so you do have access to everything. As I understand it, the options are either to use the latest MOFED update rather than relying on the in-kernel OFED (which I believe is the advice usually provided by Mellanox anyway) or to apply the kernel patch Andreas has created that is referenced in the ticket.
>
> Peter
>
> On 2017-11-29, 2:50 AM, "lustre-discuss on behalf of Charles A Taylor" <lustre-discuss-bounces at lists.lustre.org on behalf of chasman at ufl.edu> wrote:
>
>>
>> Hi All,
>>
>> We recently upgraded from Lustre 2.5.3.90 on EL6 to 2.10.1 on EL7 (details below) but have hit what looks like LU-10133 (order 8 page allocation failures).
>>
>> We don’t have access to look at the JIRA ticket in more detail but from what we can tell the fix is to change from vmalloc() to vmalloc_array() in the mlx4 drivers. However, the vmalloc_array() infrastructure is in an upstream (far upstream) kernel so I’m not sure when we’ll see that fix.
>>
>> While this may not be a Lustre issue directly, I know we can’t be the only Lustre site running 2.10.1 over IB on Mellanox ConnectX-3 HCAs. So far we have tried increasing vm.min_free_kbytes to 8 GB but that does not help. zone_reclaim_mode is disabled (for other reasons that may no longer be valid under EL7), but order-8 chunks get depleted on both NUMA nodes, so I’m not sure enabling it is the answer either (though we have not tried it yet).
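For reference, a hedged sketch of the knob mentioned above: vm.min_free_kbytes takes a value in kB, so the 8 GB attempt corresponds to 8388608. The helper name and the commented sysctl lines are illustrative only, not a recommendation.

```shell
# Hypothetical helper: convert a reserve in GiB to the kB value that
# vm.min_free_kbytes expects (8 GiB -> 8388608 kB, as tried above).
gib_to_min_free_kbytes() {
    echo $(( $1 * 1024 * 1024 ))
}

gib_to_min_free_kbytes 8    # prints 8388608

# Example usage on an OSS, as root (not run here):
#   sysctl -w vm.min_free_kbytes=$(gib_to_min_free_kbytes 8)
#   sysctl vm.zone_reclaim_mode    # 0 = disabled
```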
>>
>> [root@ufrcmds1 ~]# cat /proc/buddyinfo
>> Node 0, zone    DMA       1      0      0      0      2      1      1      0      1      1      3
>> Node 0, zone  DMA32    1554  13496  11481   5108    150      0      0      0      0      0      0
>> Node 0, zone Normal  114119 208080  78468  35679   6215    690      0      0      0      0      0
>> Node 1, zone Normal   81295 184795 106942  38818   4485    293   1653      0      0      0      0
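The /proc/buddyinfo columns are free-block counts for orders 0 through 10, so order-8 depletion can be watched with a one-liner like this (a sketch, assuming the standard buddyinfo layout where the last three fields are orders 8, 9, and 10):

```shell
# Sum the order-8..10 columns (the last three fields) per zone.
awk '{ n = $(NF-2) + $(NF-1) + $NF;
       printf "%s %s zone %s: order>=8 blocks free: %d\n", $1, $2, $4, n }' /proc/buddyinfo
```

On the output above this would report 0 free order>=8 blocks in both Normal zones, matching the allocation failures.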
>>
>> I’m wondering if other sites are hitting this and, if so, what you are doing to work around the issue on your OSSs.
>>
>> Regards,
>>
>> Charles Taylor
>> UF Research Computing
>>
>>
>> Some Details:
>> -------------------
>> OS: RHEL 7.4 (Linux ufrcoss28.ufhpc 3.10.0-693.2.2.el7_lustre.x86_64)
>> Lustre: 2.10.1 (lustre-2.10.1-1.el7.x86_64)
>> Clients: ~1400 (still running 2.5.3.90 but we are in the process of upgrading)
>> Servers: 10 HA OSS pairs (20 OSSs)
>> 128 GB RAM
>> 6 OSTs (8+2 RAID-6) per OSS
>> Mellanox ConnectX-3 IB/VPI HCAs
>> RedHat Native IB Stack (i.e. not MOFED)
>> mlx4_core driver:
>> filename: /lib/modules/3.10.0-693.2.2.el7_lustre.x86_64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko.xz
>> version: 2.2-1
>> license: Dual BSD/GPL
>> description: Mellanox ConnectX HCA low-level driver
>> author: Roland Dreier
>> rhelversion: 7.4