[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming
Alexey Lyahkov
alexey.lyashkov at gmail.com
Wed Jan 29 11:32:47 PST 2025
> On 29 Jan 2025, at 22:00, Day, Timothy <timday at amazon.com> wrote:
>
>>>>> That's why we'll
>>>>> still support running latest Lustre on older distros. Specifically, it'll be the Lustre
>>>>> code from a mainline kernel combined with our lustre_compat/ compatibility
>>>>> code. So normal Lustre releases will be derived directly from the in-tree kernel
>>>>> code. This provides a path for vendors to deploy bug fixes, custom features, and
>>>>> allows users to optionally run the latest and greatest Lustre code.
>>>>
>>>> And oops. Both code bases (in-kernel and out-of-tree) have the same sort of defines in config.h, and they conflict when building the out-of-tree Lustre.
>>>> You can see examples of the MOFED hacks used to solve the same problem in o2iblnd:
>>>>>>>
>>>> #if defined(EXTERNAL_OFED_BUILD) && !defined(HAVE_OFED_IB_DMA_MAP_SG_SANE)
>>>> #undef CONFIG_INFINIBAND_VIRT_DMA
>>>> #endif
>>>>>>>
>>>> As I remember, this problem broke the ability to build Lustre as an out-of-tree module on Ubuntu 18.04 while Lustre was in staging/.
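(To illustrate the general pattern only - this is not actual Lustre code, and the macro names below are hypothetical placeholders - an out-of-tree build has to detect that it is external and then neutralize whatever the in-kernel config already defines, so that its own config.h takes precedence:)

/* Hypothetical sketch of the same trick as the MOFED hack above:
 * when the external (out-of-tree) build is detected, drop the
 * kernel's own setting so the out-of-tree config.h wins.
 */
#if defined(EXTERNAL_LUSTRE_BUILD) && defined(CONFIG_SOME_KERNEL_OPTION)
#undef CONFIG_SOME_KERNEL_OPTION
#endif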
>>>
>>> I think we should be able to validate the Lustre still builds as an
>>> out-of-tree module by re-using a lot of the testing we already
>>> do today in Jenkins/Maloo.
>>
>> Yes, I do. But it needs many extra resources. Is Amazon ready to provide the HW resources for it?
>> Or who will pay for it? That is the cost of moving to the kernel.
>
> I suppose I disagree that this testing requires many extra
> resources. This just validates the same things we validate
> today (i.e. that Lustre is functional on RHEL kernels). But the
> build process looks different.
>
Ah. So you don’t expect to do any performance testing?
Performance testing needs, at a minimum, a 20-node cluster with an IB HDR network (400G) and an E1000 with NVMe drives.
Otherwise the servers / network will be the bottleneck.
And a week or so of load to be sure no regressions exist. Some problems can only be found after 48h of continuous load.
And that is the minimal performance testing.
I'm not even talking about scale testing with 100+ client nodes.
Do you think we need to drop it? If not - who will provide the HW for such testing?
>>> All we'd need to do is kick off test/build
>>> sessions once the merge window closes. Based on the MOFED
>>> example you gave, it seems like this is solvable.
>>
>> Sure, all of this can be solved. But what is the cost of doing it, and the cost of supporting these changes?
>> And the next question - who will pay for this? Who will provide the HW for the extra testing?
>> So the second face of “no cost for kernel API changes” is the problem of backporting these changes and the extra testing.
>
> I don't think the backporting will be more burdensome
> than porting Lustre to new kernels. And we don't have to
> urgently backport each upstream release to older kernels.
Neil B. says we need to move all development to the mainline. That means the upstream kernel will be the same as the ‘master’ branch is now.
So each change needs to be backported to older kernels to stay in sync with the server work and be ready for a Lustre release.
Otherwise we will have a ton of changes that need to be backported for each Lustre release.
I see no difference from porting to upstream, except that porting from mainline to old kernels has to be handled ASAP to avoid delaying a Lustre release, while porting to mainline may be delayed since it is not critical for customers.
>
>>>>>
>>>>> [1] Lustre changelog: https://git.whamcloud.com/?p=fs/lustre-release.git;a=blob_plain;f=lustre/ChangeLog;hb=HEAD
>>>>>
>>>>>> If this is not enough - let's take one more. The kernel API isn't stable enough, so a large amount of resources will need to be spent handling each kernel change in Lustre. Currently this happens in the background and doesn't interrupt the primary work of supporting and developing new Lustre features.
>>>>>>
>>>>>> So those are the problems for the Lustre world - what are the benefits?
>>>>>
>>>>> By upstreaming Lustre, we'll benefit from developers updating the kernel
>>>>> API "for free".
>>>> It’s not “for free” - do you really think any kernel developers have a cluster to run the Lustre client and test their changes?
>>>> I think not, so the testing will be “just compile with the proposed/default config”.
>>>> Since there will be a lack of proper testing (don’t forget, a full run of the Lustre test suite takes ~12-24h), Lustre developers will need to review each change to the Lustre code.
>>>
>>> That's why I put "for free" in quotes. We need to make it easier for
>>> upstream developers to test their changes so they don't completely
>>> break Lustre.
>>
>> Ah... so Lustre will have a vote to stop any landing in the kernel until Lustre testing is done?
>> Do you understand how many tests will need to be run?
>> Full testing needs ~24h of run time on a single node.
>> How much HW can Amazon share to run these tests?
>
> We can't stop vendors from breaking Lustre with kernel updates
> either. This seems to happen with some regularity in my
> experience [1].
>
> [1] Recent example with sockets: https://review.whamcloud.com/c/fs/lustre-release/+/56737
>
Vendor kernel updates happen much more rarely than changes in the kernel mainline. And they are much more controlled.
>> Do you understand - if the Lustre code is changed by someone upstream, that change may not be backportable to the main tree because the compatibility code can't handle it.
>> Sometimes we need to keep the old behavior, re-implemented on top of the new kernel code.
>
> I'm not sure what you mean. We can't backport a change
> because compatibility code can’t handle it? So we have to
> re-implement old behavior with compatibility code? Do you
> have a specific example?
>
>>> If we upstream the client and server concurrently, we
>>> can implement xfstests support [1]. This would provide at least basic
>>> validation. NFS does something similar. We could even copy over a
>>> subset of Lustre specific tests from sanity.sh into xfstests.
>>
>> The NFS server doesn't have many of Lustre's features, and it isn't expected to be built as an out-of-tree module for different kernels.
>>
>>> It's not perfect - but it'd be a much better situation compared to the
>>> previous attempt in staging.
>>>
>>> [1] https://github.com/kdave/xfstests
>>
>> I’m sorry, but these are very simple test cases. Lustre is a much more complex FS.
>
> Yeah, I know. But we can easily enough replicate "Test-Parameters: trivial"
> with xfstests. It's something I plan to do. Ideally I'll be able to
> draft up something before LSF.
>
And that would completely kill Lustre code quality by removing a large amount of testing.
Did you know that “Test-Parameters: trivial” should not be used except for cosmetic changes?
It looks like you really intend to kill Lustre and add more and more problems to building a good product.
>>>> And all of these changes need to be backported to the out-of-tree version. Sometimes the Lustre part needs changes as well.
>>>> The best example is ‘folio’ - it needs changes on both sides.
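(To make that cost concrete - purely an illustrative sketch, not actual lustre_compat/ code, and HAVE_FOLIO_API is a hypothetical configure-time check - an out-of-tree build running on a pre-folio kernel has to map the folio helpers that upstream code now uses back onto struct page:)

/* Hypothetical compat shim for kernels that predate struct folio.
 * A crude aliasing of the folio helpers onto the old page API, so
 * the same source builds on both old and new kernels.
 */
#ifndef HAVE_FOLIO_API
#define folio           page
#define folio_get(f)    get_page(f)
#define folio_put(f)    put_page(f)
#define folio_lock(f)   lock_page(f)
#define folio_unlock(f) unlock_page(f)
#endif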
>>>
>>> If the out-of-tree version is derived from the in-tree version of
>>> Lustre - I don't think the backporting will be that burdensome.
>>> We're essentially doing the same work now, but in reverse. Instead
>>> of porting an upstream driver to old kernels, we are porting an
>>> older driver to new kernels.
>>
>> Except for some notes.
>> 1) Lustre release cycles. Right now they are not aligned with the kernel's. There is no situation where a senior developer has to stop their own work to review kernel changes because they might affect Lustre stability. But with Lustre in the kernel, any change in the kernel affects Lustre and needs to be reviewed / tested urgently.
>> So extra developer positions / HW are needed.
>
> Changes to Lustre itself can be delayed (to some extent) until
> reviewers have time to review. And if we provide some easy way
> for developers to test their own changes, the demand on our
> side to test everything will lessen, IMO.
So Lustre customers will have to wait, and Lustre needs to acquire more HW for testing. OK. What is the benefit?
>
>> 2) There is no problem with getting custom patches into upstream.
>> Someone may think something needs to be cleaned up in the Lustre code, and that patch will be accepted.
>> So it generates a conflict with code changed in the same place in the Lustre main repository.
>> Moving the whole of Lustre development into the kernel is not possible because there is no server part there, but the servers sometimes have “client” code on their side.
>>
>> Not such a small cost for “updates for free”, is it?
>
> Ideally, both client and server will go upstream together. Then
> we don't have to deal with client/server separation issues.
>
> In another thread, you mention that Lustre is primarily used with
> older kernels. While that's definitely true for many sectors, in my
> experience - the demand for the latest kernel is robust and the
> production usage of 6.x series kernels (with Lustre) is real.
I would put it differently - 6.x kernels are used on very small installations, where a failure is not critical.
But systems from the TOP500 and from some oil companies use RHEL kernels.
Alex