[lustre-discuss] Status of LU-8703 for Knights Landing

Patrick Farrell paf at cray.com
Wed Feb 1 13:27:08 PST 2017


Are they really just not working?  I didn't see that with KNL. The default CPT generated without the fixes from LU-8703 is very weird, but it didn't affect performance much (the real NUMA-ness of KNL processors seems to be minimal, despite the various NUMA-related configuration options). That said, Cray systems are unusual, and I don't think I ever saw an empty NUMA node (possibly something we fix in the BIOS).  Anyway, you should be able to work around this without patching your client: just set some module parameters before loading the modules and starting Lustre.

I can think of two things that should work; both are module parameters for the libcfs module, I believe.  I haven't tried this, so it's possible your error is coming earlier in the loading process, but I think not, based on the message.

1. Limit yourself to a single partition by setting cpu_npartitions to 1:
static int cpu_npartitions;
module_param(cpu_npartitions, int, 0444);
MODULE_PARM_DESC(cpu_npartitions, "# of CPU partitions");

2. Or, you could draw up a CPU partition table yourself.  The parameter name is cpu_pattern.

Here's the code describing that:

/*
 * modparam for setting CPU partitions patterns:
 * i.e: "0[0,1,2,3] 1[4,5,6,7]", number before bracket is CPU partition ID,
 *      number in bracket is processor ID (core or HT)
 * i.e: "N 0[0,1] 1[2,3]" the first character 'N' means numbers in bracket
 *       are NUMA node ID, number before bracket is CPU partition ID.
 * i.e: "N", shortcut expression to create CPT from NUMA & CPU topology
 * NB: If user specified cpu_pattern, cpu_npartitions will be ignored
 */
static char *cpu_pattern = "N";
module_param(cpu_pattern, charp, 0444);
MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");

Notice the default pattern is N, but you can override it.

(Code references from libcfs/libcfs/linux/linux-cpu.c in Lustre.)
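
To make the pattern syntax from the comment above concrete, here is a minimal sketch in C that builds a cpu_pattern string of the first form ("0[0,1,2,3] 1[4,5,6,7]"). The helper name build_cpu_pattern and the choice to split CPUs into contiguous, equally sized partitions are assumptions made for illustration; nothing here is part of Lustre, and a real KNL layout may call for a different grouping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Lustre): write a cpu_pattern string that
 * splits CPUs 0..ncpus-1 into nparts contiguous, equally sized partitions.
 * Returns 0 on success, -1 on bad arguments or if buf is too small.
 */
static int build_cpu_pattern(char *buf, size_t len, int ncpus, int nparts)
{
    size_t used = 0;
    int per_part, part, cpu, n;

    assert(buf != NULL);
    if (len == 0 || ncpus <= 0 || nparts <= 0 || ncpus % nparts != 0)
        return -1;
    per_part = ncpus / nparts;

    for (part = 0; part < nparts; part++) {
        int first = part * per_part;
        int last = first + per_part - 1;

        /* Partition ID and opening bracket; a space separates partitions. */
        n = snprintf(buf + used, len - used, "%s%d[", part ? " " : "", part);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;

        /* Comma-separated processor IDs, closed by a bracket. */
        for (cpu = first; cpu <= last; cpu++) {
            n = snprintf(buf + used, len - used, "%d%s",
                         cpu, cpu == last ? "]" : ",");
            if (n < 0 || (size_t)n >= len - used)
                return -1;
            used += (size_t)n;
        }
    }
    return 0;
}
```

Calling build_cpu_pattern(buf, sizeof(buf), 8, 2) fills buf with the same string as the first example in the source comment. Assuming the parameters do live in the libcfs module as suggested above, the result would typically be applied with a modprobe options line such as `options libcfs cpu_pattern="0[0,1,2,3] 1[4,5,6,7]"` (or `options libcfs cpu_npartitions=1` for the first workaround) before the modules are loaded. I have not tested either setting on KNL hardware.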

Either of those should let you get past the error, with no need to carry patches.  I can't speak to the production-readiness of the patches, but I'd definitely go the module parameter route if it were my system.

- Patrick

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Prout, Andrew - LLSC - MITLL <aprout at ll.mit.edu>
Sent: Wednesday, February 1, 2017 3:11:07 PM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Status of LU-8703 for Knights Landing

Anyone know the production-readiness of the patches attached to LU-8703 to fix issues with Lustre on Xeon Phi Knights Landing hardware? We're considering merging them against our 2.9.0 client to get it working on our KL nodes.

Andrew Prout
Lincoln Laboratory Supercomputing Center
MIT Lincoln Laboratory
244 Wood Street, Lexington, MA 02420