[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming
Day, Timothy
timday at amazon.com
Wed Jan 29 11:00:07 PST 2025
>>>> That's why we'll
>>>> still support running latest Lustre on older distros. Specifically, it'll be the Lustre
>>>> code from a mainline kernel combined with our lustre_compat/ compatibility
>>>> code. So normal Lustre releases will be derived directly from the in-tree kernel
>>>> code. This provides a path for vendors to deploy bug fixes, custom features, and
>>>> allows users to optionally run the latest and greatest Lustre code.
>>>
>>> And oops. Both code bases (in-kernel and out-of-tree) have the same sort of defines in config.h, which conflict when building the out-of-tree Lustre.
>>> You can see examples of the MOFED hacks that solve the same problem in o2iblnd:
>>>
>>> #if defined(EXTERNAL_OFED_BUILD) && !defined(HAVE_OFED_IB_DMA_MAP_SG_SANE)
>>> #undef CONFIG_INFINIBAND_VIRT_DMA
>>> #endif
>>>
>>> As I remember, this problem broke the ability to build Lustre as an out-of-tree module on Ubuntu 18.06 while Lustre was in staging/.
>>
>> I think we should be able to validate that Lustre still builds as an
>> out-of-tree module by re-using a lot of the testing we already
>> do today in Jenkins/Maloo.
>
> Yes, we do. But it needs many extra resources. Is Amazon ready to provide such HW resources for it?
> Or who will pay for it? That is the cost of moving into the kernel.
I suppose I disagree that this testing requires many extra
resources. It just validates the same things we validate
today (i.e. that Lustre is functional on RHEL kernels); only
the build process looks different.
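To make this concrete, the kind of build-only validation I have in mind is cheap. A rough CI sketch (the paths and checkout location are illustrative, and this assumes the usual lustre-release autotools flow):

```sh
# Rough CI sketch: verify the tree still builds as an out-of-tree module
# against a given kernel.  Paths are illustrative; this assumes the usual
# lustre-release autotools flow (autogen.sh + configure --with-linux=).
KDIR=/lib/modules/$(uname -r)/build    # kernel build tree to test against

cd lustre-release
sh autogen.sh
./configure --with-linux="$KDIR" --disable-server
make -j"$(nproc)"
```

A job like this per supported distro kernel is a compile check, not a full test run.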
>> All we'd need to do is kick off test/build
>> sessions once the merge window closes. Based on the MOFED
>> example you gave, it seems like this is solvable.
>
> Sure, all of it can be solved. But what is the cost of this, and the cost of supporting these changes?
> And the next question: who will pay for this cost? Who will provide the HW for the extra testing?
> So the second face of “no cost for kernel API changes” is the problem of backporting these changes, plus the extra testing.
I don't think the backporting will be more burdensome
than porting Lustre to new kernels. And we don't have to
urgently backport each upstream release to older kernels.
>>>>
>>>> [1] Lustre changelog: https://git.whamcloud.com/?p=fs/lustre-release.git;a=blob_plain;f=lustre/ChangeLog;hb=HEAD
>>>>
>>>>> If this is not enough, let's take one more. The kernel API isn't stable enough, so a large amount of resources will need to be spent handling each kernel change in Lustre. Currently, this happens in the background and doesn't interrupt the primary work of supporting and developing new Lustre features.
>>>>>
>>>>> So those are the problems for the Lustre world. What are the benefits?
>>>>
>>>> By upstreaming Lustre, we'll benefit from developers updating the kernel
>>>> API "for free".
>>> It's not "for free". Do you really think any kernel developer has a cluster on which to run a Lustre client to test their changes?
>>> I think not, so testing will be "just compile with the proposed/default config".
>>> Since there will be a lack of proper testing (don't forget, a full run of the Lustre test suite takes ~12-24h), Lustre developers will need to review each change to the Lustre code.
>>
>> That's why I put "for free" in quotes. We need to make it easier for
>> upstream developers to test their changes so they don't completely
>> break Lustre.
>
> Ah, so Lustre will have a vote to stop any landing in the kernel until Lustre testing is done?
> Do you understand how many tests will need to run?
> Full testing needs ~24h of run time on a single node.
> How many HW resources can Amazon share to run these tests?
We can't stop vendors from breaking Lustre with kernel updates
either. This seems to happen with some regularity in my
experience [1].
[1] Recent example with sockets: https://review.whamcloud.com/c/fs/lustre-release/+/56737
> Do you understand? If Lustre code is changed by someone upstream, that change can't be backported to the main tree, because the compatibility code can't handle it.
> Sometimes we need to stay with old behavior that has been re-implemented with new kernel code.
I'm not sure what you mean. We can't backport a change
because compatibility code can’t handle it? So we have to
re-implement old behavior with compatibility code? Do you
have a specific example?
>> If we upstream the client and server concurrently, we
>> can implement xfstests support [1]. This would provide at least basic
>> validation. NFS does something similar. We could even copy over a
>> subset of Lustre specific tests from sanity.sh into xfstests.
>
> The NFS server doesn't have many of Lustre's features, and it isn't expected to be built as an out-of-tree module for different kernels.
>
>> It's not perfect - but it'd be a much better situation compared to the
>> previous attempt in staging.
>>
>> [1] https://github.com/kdave/xfstests
>
> I'm sorry, but these are very simple test cases. Lustre is a much more complex FS.
Yeah, I know. But we can replicate "Test-Parameters: trivial" easily
enough with xfstests. It's something I plan to do. Ideally I'll be
able to draft something up before LSF.
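For concreteness, here's a rough sketch of what an xfstests local.config for a Lustre client could look like, following the pattern NFS uses. This is purely hypothetical: FSTYP=lustre assumes xfstests grows Lustre support, and the MGS NID, filesystem names, and mount points are made up.

```sh
# Hypothetical xfstests local.config for a Lustre client.
# FSTYP=lustre assumes xfstests gains Lustre support; the MGS NID,
# filesystem names, and mount points below are illustrative only.
export FSTYP=lustre
export TEST_DEV=mgs@tcp:/testfs
export TEST_DIR=/mnt/test
export SCRATCH_DEV=mgs@tcp:/scratchfs
export SCRATCH_MNT=/mnt/scratch
```

With something like that in place, `./check -g quick` would give kernel developers a fast smoke test, roughly in the spirit of "Test-Parameters: trivial".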
>>> And all these changes need to be backported to the out-of-tree version. Sometimes the Lustre part needs changes as well.
>>> The best example is 'folio'; this needs changes on both sides.
>>
>> If the out-of-tree version is derived from the in-tree version of
>> Lustre - I don't think the backporting will be that burdensome.
>> We'd essentially be doing the same work as now, but in reverse. Instead
>> of porting an upstream driver to old kernels, we are porting an
>> older driver to new kernels.
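To illustrate the folio case: the usual way to absorb an API transition like this is a small compat shim, so only lustre_compat/ changes per kernel while callers stay untouched. The sketch below is not real Lustre code; the stub types stand in for the kernel's struct page/struct folio (which can't be compiled standalone here), and HAVE_FOLIO and ll_page_size() are made-up names for what a configure check and a compat wrapper might provide.

```c
/*
 * Illustrative sketch (NOT real Lustre code) of a lustre_compat-style
 * shim for the folio API transition.  The stub types below stand in
 * for the kernel's struct page/struct folio; HAVE_FOLIO mimics a
 * configure-time feature test in an out-of-tree build.
 */
#include <stddef.h>

struct page  { size_t size; };      /* stand-in for the kernel type */
struct folio { struct page page; }; /* stand-in for the kernel type */

#ifdef HAVE_FOLIO
/* Newer kernels: go through the folio API. */
static inline struct folio *page_folio(struct page *p)
{
	return (struct folio *)p;
}
static inline size_t folio_size(struct folio *f)
{
	return f->page.size;
}
#define ll_page_size(p) folio_size(page_folio(p))
#else
/* Older kernels without folios: fall back to the page API. */
static inline size_t page_size(struct page *p)
{
	return p->size;
}
#define ll_page_size(p) page_size(p)
#endif
```

Callers use ll_page_size() unconditionally; when a kernel release moves another helper to folios, only the shim grows a new branch, which is what keeps the backporting cost bounded.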
>
> Except for some notes.
> 1) Lustre release cycles. Right now they aren't aligned with the kernel's. There is no situation where a senior developer must stop their own work to review kernel changes because they might affect Lustre stability. But with Lustre in the kernel, any change in the kernel affects Lustre and needs to be reviewed/tested urgently.
> So extra developer positions/HW are needed.
Changes to Lustre itself can be delayed (to some extent) until
reviewers have time to review. And if we provide some easy way
for developers to test their own changes, the demand on our
side to test everything will lessen, IMO.
> 2) The problem of custom patches landing upstream.
> Someone may think something needs to be cleaned up in the Lustre code, and that patch will be accepted.
> So it generates a conflict with code changed in the same place in the main Lustre repository.
> Moving the whole of Lustre development into the kernel is not possible because there is no server part, but the servers sometimes have "client" code on their side.
>
> Not such a small cost for "updates for free", is it?
Ideally, both client and server will go upstream together. Then
we don't have to deal with client/server separation issues.
In another thread, you mention that Lustre is primarily used with
older kernels. While that's definitely true for many sectors, in my
experience the demand for the latest kernel is robust, and the
production usage of 6.x series kernels (with Lustre) is real. If no
one was using Lustre with up-to-date kernels, I'd be less enthusiastic
about upstreaming Lustre. But that's not the case.
Tim Day