[lustre-discuss] Lustre 2.10.1 + RHEL7 Page Allocation Failures
Jones, Peter A
peter.a.jones at intel.com
Wed Nov 29 05:51:50 PST 2017
Ah yes. One more thing – I believe that this has been addressed in the upcoming RHEL 7.5, so that might be another option for you to consider.
On 2017-11-29, 5:47 AM, "lustre-discuss on behalf of Charles A Taylor" <lustre-discuss-bounces at lists.lustre.org<mailto:lustre-discuss-bounces at lists.lustre.org> on behalf of chasman at ufl.edu<mailto:chasman at ufl.edu>> wrote:
Thank you, Peter. I figured that would be the response but wanted to ask. We were hoping to get away from maintaining a MOFED build but it looks like that may not be the way to go.
And you are correct about the JIRA ticket. I misspoke. It was the associated RH kernel bug that was “private”, IIRC.
Thank you again,
On Nov 29, 2017, at 8:09 AM, Jones, Peter A <peter.a.jones at intel.com<mailto:peter.a.jones at intel.com>> wrote:
That ticket is completely open, so you do have access to everything. As I understand it, the options are either to use the latest MOFED update rather than relying on the in-kernel OFED (which I believe is the advice usually provided by Mellanox anyway), or to apply the kernel patch Andreas has created that is referenced in the ticket.
On 2017-11-29, 2:50 AM, "lustre-discuss on behalf of Charles A Taylor" <lustre-discuss-bounces at lists.lustre.org<mailto:lustre-discuss-bounces at lists.lustre.org> on behalf of chasman at ufl.edu<mailto:chasman at ufl.edu>> wrote:
We recently upgraded from Lustre 22.214.171.124 on EL6 to 2.10.1 on EL7 (details below) but have hit what looks like LU-10133 (order 8 page allocation failures).
We don’t have access to look at the JIRA ticket in more detail, but from what we can tell the fix is to change from vmalloc() to vmalloc_array() in the mlx4 drivers. However, the vmalloc_array() infrastructure is in an upstream (far upstream) kernel, so I’m not sure when we’ll see that fix.
While this may not be a Lustre issue directly, I know we can’t be the only Lustre site running 2.10.1 over IB on Mellanox ConnectX-3 HCAs. So far we have tried increasing vm.min_free_kbytes to 8GB, but that does not help. zone_reclaim_mode is disabled (for other reasons that may no longer be valid under EL7), but order-8 chunks get depleted on both NUMA nodes, so I’m not sure enabling it is the answer either (though we have not tried it yet).
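For anyone following along, here is a quick back-of-the-envelope sketch of why order-8 failures can happen even with plenty of free RAM: an order-n allocation needs 2^n physically contiguous pages, so order 8 is a full 1 MiB contiguous chunk (assuming 4 KiB pages; order_size_kib is just my own helper name, not anything from the kernel):

```python
def order_size_kib(order: int, page_size: int = 4096) -> int:
    """KiB in a single block of 2**order physically contiguous pages."""
    return (2 ** order * page_size) // 1024

# Print the size of one free block at each buddy-allocator order 0..10,
# matching the 11 columns of /proc/buddyinfo on x86_64.
for order in range(11):
    print(f"order {order:2d}: {2 ** order:4d} pages = {order_size_kib(order):5d} KiB")
```

So fragmented memory can have gigabytes free overall and still have no single 1 MiB contiguous run left, which matches what we see in buddyinfo below.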
[root at ufrcmds1 ~]# cat /proc/buddyinfo
Node 0, zone      DMA      1      0      0      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   1554  13496  11481   5108    150      0      0      0      0      0      0
Node 0, zone   Normal 114119 208080  78468  35679   6215    690      0      0      0      0      0
Node 1, zone   Normal  81295 184795 106942  38818   4485    293   1653      0      0      0      0
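In case it is useful to others watching for the same depletion, here is a rough sketch of how we read that output: the i-th count after the zone name is the number of free blocks of order i, so summing from column 8 onward shows how many order-8-or-larger blocks remain (high_order_free is my own helper name; the sample string is the buddyinfo output above):

```python
# Sample /proc/buddyinfo output (copied from the OSS above).
sample = """\
Node 0, zone      DMA      1      0      0      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   1554  13496  11481   5108    150      0      0      0      0      0      0
Node 0, zone   Normal 114119 208080  78468  35679   6215    690      0      0      0      0      0
Node 1, zone   Normal  81295 184795 106942  38818   4485    293   1653      0      0      0      0
"""

def high_order_free(buddyinfo: str, min_order: int = 8) -> dict:
    """Per node/zone, count free blocks of order >= min_order."""
    result = {}
    for line in buddyinfo.splitlines():
        node, _, rest = line.partition("zone")
        fields = rest.split()
        zone, counts = fields[0], [int(n) for n in fields[1:]]
        result[f"{node.strip().rstrip(',')} {zone}"] = sum(counts[min_order:])
    return result

for zone, blocks in high_order_free(sample).items():
    print(f"{zone}: {blocks} free blocks of order >= 8")
```

On this snapshot both Normal zones are at zero for order >= 8, which is exactly when the mlx4 allocations start failing.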
I’m wondering if other sites are hitting this and, if so, what you are doing to work around the issue on your OSSs.
UF Research Computing
OS: RHEL 7.4 (Linux ufrcoss28.ufhpc 3.10.0-693.2.2.el7_lustre.x86_64)
Lustre: 2.10.1 (lustre-2.10.1-1.el7.x86_64)
Clients: ~1400 (still running 126.96.36.199 but we are in the process of upgrading)
Servers: 10 HA OSS pairs (20 OSSs)
128 GB RAM
6 OSTs (8+2 RAID-6) per OSS
Mellanox ConnectX-3 IB/VPI HCAs
RedHat Native IB Stack (i.e. not MOFED)
license: Dual BSD/GPL
description: Mellanox ConnectX HCA low-level driver
author: Roland Dreier