[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming

Fri Jan 24 09:06:19 PST 2025

> On 1/22/25, 12:48 PM, "Alexey Lyahkov" <alexey.lyashkov at gmail.com> wrote:
>> On 22 Jan 2025, at 20:17, Day, Timothy <timday at amazon.com> wrote:
>>> On 1/22/25, 6:14 AM, "Alexey Lyahkov" <alexey.lyashkov at gmail.com> wrote:
>>>
>>> Timothy,
>>>
>>>> On 22 Jan 2025, at 09:35, Day, Timothy <timday at amazon.com> wrote:
>>>>
>>>> I've created a second draft of the topic for LSF/MM. I tried
>>>> to include everyone's feedback. It's at the end of the email.
>>>>
>>>> Before that, I wanted to elaborate on Neil's idea about updating
>>>> our development model to an upstream-focused model. For upstreaming
>>>> to work, the normal development flow has to generate patches to mainline
>>>> Linux - while still supporting the distro kernels that most people use
>>>> to run Lustre. I think we can get to this point in stages. I've provided
>>>> a high-level overview in the next section. This won't be without
>>>> challenges - but the majority of the transition could happen without
>>>> interrupting feature work or normal development.
>>>>
>>>
>>> Can you explain how Lustre platform fragmentation will be avoided?
>>>
>>>
>>> I posted an example earlier.
>>> Distros lock in a Lustre version at release time. But Lustre servers have limited compatibility: in most cases only +/- 1-2 releases are guaranteed to connect. So a stale, aged client will live in the distribution kernel, and it won't work with modern servers.
>>> That happens easily, since a distribution's lifetime is ~8 years. So customers will need to drop the in-kernel Lustre client and install a Lustre client from an external source, which is no different from the current state.
>>> The next problem is a range of distributions shipping different Lustre versions that are not compatible with each other.
>>> Both of these increase support costs: once a large number of versions need to be supported, development will slow down and all the time will be spent on support.
>>
>> I think that's a reasonable concern. I spend a lot of time doing customer
>> support for Lustre; I definitely don't want to make that part of my job any
>> harder than it has to be.
>>
>> In my personal experience, I've seen 2.10 and 2.15 interoperate well together.
>> That covers a gap of around 6 years at least. If someone stuck with RHEL7, the
>> first client they could use is 2.7.0 and the last client they could use is 2.16.0 [1].
>> So if a customer didn't update either their distro or filesystem, they could use an
>> up-to-date Lustre version for around 10 years covering 9 versions. So I think these
>> large version gaps are possible today.
>
> Customers expect to update the server side, but that is not always true for the client side.
> They expect to stick with RHEL7 until EOL, because old HW may not be supported by the new version.
> (Look at the reduction in RHEL HW support between releases. From RHEL7 to RHEL8, many RAID cards were dropped from support.)
>
>
>> There is an issue if distros don't want to update their clients.
>
> It is not "if they don't want to update": Ubuntu did not update its own Lustre code in the past.
> I don't expect that to change, because the distro owner would need to hire more developers for the extra support,
> but would get no money from it.
>
>
>
>
>> That's why we'll
>> still support running latest Lustre on older distros. Specifically, it'll be the Lustre
>> code from a mainline kernel combined with our lustre_compat/ compatibility
>> code. So normal Lustre releases will be derived directly from the in-tree kernel
>> code. This provides a path for vendors to deploy bug fixes, custom features, and
>> allows users to optionally run the latest and greatest Lustre code.
>
>And OOPS. Both codebases (in-kernel and out-of-tree) have the same sort of defines in config.h, which conflict when building out-of-tree Lustre.
>Some examples of MOFED hacks to solve the same problem can be seen in o2iblnd:
>>>>
>#if defined(EXTERNAL_OFED_BUILD) && !defined(HAVE_OFED_IB_DMA_MAP_SG_SANE)
>#undef CONFIG_INFINIBAND_VIRT_DMA
>#endif
>>>>
>As I remember, this problem broke the ability to build Lustre as an out-of-tree module on Ubuntu 18.06 with Lustre in staging/.

I think we should be able to validate that Lustre still builds as an
out-of-tree module by re-using a lot of the testing we already
do today in Jenkins/Maloo. All we'd need to do is kick off test/build
sessions once the merge window closes. Based on the MOFED
example you gave, it seems like this is solvable.

>>
>> [1] Lustre changelog: https://git.whamcloud.com/?p=fs/lustre-release.git;a=blob_plain;f=lustre/ChangeLog;hb=HEAD
>>
>>> If this is not enough, here is one more. The kernel API isn't stable enough, so a large amount of resources will need to be spent adapting Lustre to each kernel change. Currently, this happens in the background and doesn't interrupt the primary work of supporting and developing new Lustre features.
>>>
>>> So those are the problems for the Lustre world. What are the benefits?
>>
>> By upstreaming Lustre, we'll benefit from developers updating the kernel
>> API "for free".
> It's not "for free". Do you really think any kernel developers have a cluster to run a Lustre client and test their changes?
> I think not, so testing will be "just compile with the proposed/default config".
> Since proper testing will be lacking (keep in mind that a full run of the Lustre test suite takes ~12-24h), Lustre developers will need to review each change to the Lustre code.

That's why I put "for free" in quotes. We need to make it easier for
upstream developers to test their changes so they don't completely
break Lustre. If we upstream the client and server concurrently, we
can implement xfstests support [1]. This would provide at least basic
validation. NFS does something similar. We could even copy over a
subset of Lustre-specific tests from sanity.sh into xfstests.

It's not perfect - but it'd be a much better situation compared to the
previous attempt in staging.

[1] https://github.com/kdave/xfstests

> And all these changes will need to be backported to the out-of-tree version, since the Lustre side needs changes too.
> The best example is 'folio': it needs changes on both sides.

If the out-of-tree version is derived from the in-tree version of
Lustre - I don't think the backporting will be that burdensome.
We're essentially doing the same work now, but in reverse. Instead
of porting an upstream driver to old kernels, we are porting an
older driver to new kernels.
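
As a rough illustration of that direction of porting, here is a
user-space sketch. Every name in it is hypothetical (it is not actual
Lustre or lustre_compat/ code, and HAVE_ALLOC_UNITS_BY_ORDER stands in
for whatever a configure-time check would define on a new kernel). The
caller is written only against the newer-style interface, and the
compat shim supplies that interface on top of the older one when it is
missing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an old kernel's API: takes a byte count. */
static size_t old_alloc_units(size_t bytes)
{
	return (bytes + 4095) / 4096;
}

/*
 * Hypothetical compat shim. On a "new kernel" a configure check would
 * define HAVE_ALLOC_UNITS_BY_ORDER and the kernel would provide
 * alloc_units_by_order() itself, so this block would compile away.
 */
#ifndef HAVE_ALLOC_UNITS_BY_ORDER
static inline size_t alloc_units_by_order(unsigned int order)
{
	assert(order < 16);	/* keep the shift well within range */
	return old_alloc_units((size_t)4096 << order);
}
#endif

/* In-tree-style caller: uses only the newer interface, no #ifdefs. */
static size_t client_buffer_units(void)
{
	return alloc_units_by_order(2);
}
```

The point of the sketch is only that the #ifdef lives in the compat
layer rather than in the driver code, so the driver source can stay
identical to what is in mainline.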

>> When Lustre was in staging/, there wasn't as much obligation
>> to keep Lustre in a working state. But if we get Lustre merged properly,
>> developers will not be able to merge changes that break Lustre. So we'll
>> get support for the latest and greatest kernels with less effort. That's one
>> of the main benefits of this effort.
>>
>>
>> We also benefit from having more say over the future of the kernel. A lot
>> of difficulty with updating Lustre for new kernels comes when upstream
>> kernel developers lock down symbols or features to in-tree modules. This
>> could get even worse in the future, as things like symbol namespaces get
>> more use [2].
>>
>> Even if most users use the out-of-tree backported-from-mainline-Linux
>> Lustre release, I think we'll still be in a stronger position after
>> upstreaming.
>>
>> [2] https://lwn.net/Articles/760045/
>>

>
> PS. Lustre is able to run a server with very lightly modified ext4 code, mostly some exports/callbacks from the core.
>

That's good to hear - I think that'll make it easier to convince
upstream to accept the ext4 patches needed to run the server.

>
> Alex
>

Tim Day