[lustre-discuss] FSx for Lustre Client 2.12 very slow compared to 2.10 / 2.15

Anton Wijbenga anton.wijbenga at maptm.nl
Fri Aug 30 06:14:19 PDT 2024


Dear Lustre community,

We have an FSx for Lustre configuration at AWS (FSx Persistent SSD, 1.2 TB, 250 MB/s/TiB; FSx for Lustre server version 2.12). The file system is mounted from an EC2 instance running Ubuntu, using the Lustre client modules. Which Lustre client version is available depends on the Linux kernel.

On the file system we have compressed data tables (80000 rows x 1440 columns, and the transposed versions as well). These are stored using the fst library/package for R (https://www.fstpackage.org/). The data are stored column-wise in a serialised manner. The benefit is that one can read a set of columns (or rows) without having to read the whole file (similar to Parquet files). It is one of the fastest ways to read/write data from R.

Recently we discovered that on a newly configured EC2 reading data from these files on the Lustre file system is a lot slower than on an older EC2.
After some debugging it was found that an EC2 with Ubuntu 18.04.6 LTS and kernel 5.4.0-1083-aws using the Lustre Client 2.10.8 has the same fast performance as expected (same as the older EC2). However, upgrading the Lustre Client to 2.12.8 (nothing else is different... same machine) results in poor performance. The job to test the speed is reading 4 columns from 180 files containing a data table as described above. This takes about 5 seconds when it is fast, but slows down to 20 seconds in the slow case.

In addition, just reading one whole table (one file) using read_fst takes about:

- 1 - 2 seconds with Lustre Client 2.10.8
- 20 - 22 seconds with Lustre Client 2.12.8
- 1 - 2 seconds with Lustre Client 2.15.4 (on Ubuntu 22... another EC2)

Reading the files immediately again using the Lustre Client 2.12.8 improves the performance back to 1 - 2 seconds. So, when they are cached (somewhere), the performance is OK, but a cold read is very slow. In contrast, the other two (2.10.8 and 2.15.4) are already very fast reading the files the first time.
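
The cold-versus-warm difference can also be checked outside R. Below is a minimal Python sketch, not the actual test described in this mail: it times two back-to-back sequential reads of one file. The function names `time_read` and `cold_vs_warm` and the 1 MiB chunk size are illustrative assumptions, and the first read is only truly cold if the file has not been read on that client before.

```python
import time


def time_read(path, chunk_size=1024 * 1024):
    """Read the whole file sequentially; return (seconds, bytes_read)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS and the Lustre
    # client may still cache pages, which is exactly what is being compared.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            total += len(block)
    return time.perf_counter() - start, total


def cold_vs_warm(path):
    """Time two consecutive reads: the first may be cold, the second cached."""
    first_s, size = time_read(path)
    second_s, _ = time_read(path)
    return {"bytes": size, "first_s": first_s, "second_s": second_s}
```

Calling `cold_vs_warm()` on one of the table files on the Lustre mount and comparing `first_s` with `second_s` on each client version should show whether the slowdown is in the raw file read itself or only appears through fst's access pattern.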

I would use 2.15.4 which is the latest supported version using the highest supported Ubuntu version (22), but unfortunately the performance of 2.15.4 is similar to that of 2.12.8 in the first test (reading 4 columns from 180 files takes about 20 seconds instead of around 5). As a result, we're stuck with Ubuntu 18 and kernel 5.4.0 which is the latest combination that still supports Lustre Client 2.10.8 (which is fast in all cases).

The test has been repeated a lot of times at different times to rule out caching behaviour.

What could be the reason for these large performance differences (4x to 10x slower)? Are there perhaps some parameter settings different between the Lustre Client versions? Can those be adjusted?
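
One way to look for differing settings would be to dump the client tunables on each machine (for example with `lctl get_param`, redirected to a file) and compare the dumps. The following is a minimal Python sketch under that assumption; the helper names `parse_params` and `diff_params`, the instance name `fs` and all values are made up for illustration and are not measurements from our clients. It only handles simple single-line `name=value` entries.

```python
def parse_params(text):
    """Parse single-line 'name=value' entries into a dict."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:  # skip lines that do not look like name=value
            params[name] = value
    return params


def diff_params(old_text, new_text):
    """Return {name: (old_value, new_value)} for parameters that differ."""
    old, new = parse_params(old_text), parse_params(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Hypothetical dumps; the values are invented for the example.
old_dump = """llite.fs.max_read_ahead_mb=40
llite.fs.max_read_ahead_per_file_mb=40
"""
new_dump = """llite.fs.max_read_ahead_mb=64
llite.fs.max_read_ahead_per_file_mb=40
llite.fs.max_read_ahead_whole_mb=2
"""

for name, (old_value, new_value) in diff_params(old_dump, new_dump).items():
    print(f"{name}: {old_value} -> {new_value}")
```

A parameter that exists on only one client shows up with `None` on the other side, which would also reveal tunables that were added or removed between versions.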

I also posted this question on the AWS re:Post “forum”: https://repost.aws/questions/QUCiF-XpFaS0al162IYKXg1w/fsx-for-lustre-client-2-12-very-slow-compared-to-2-10

Kind regards,

Anton Wijbenga


