[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming
Day, Timothy
timday at amazon.com
Sun Jan 19 20:38:11 PST 2025
> On 1/19/25, 3:46 PM, "Oleg Drokin" <green at whamcloud.com> wrote:
> > On Sat, 2025-01-18 at 21:46 +0000, Day, Timothy wrote:
> >
> >
> > > On 1/17/25, 10:17 PM, "Oleg Drokin"
> > > <green at whamcloud.com> wrote:
> > > > On Sat, 2025-01-18 at 11:45 +1100, NeilBrown wrote:
> > > > We need to demonstrate a process for, and commitment to, moving
> > > > away
> > > > from the dual-tree model. We need patches to those parts of
> > > > Lustre
> > > > that are upstream to land in upstream first (mostly).
> > >
> > >
> > > I think this is not very realistic.
> > > A large chunk (100%?) of users run neither the latest kernel
> > > release nor the latest LTS.
> > >
> > >
> > > When we were in staging, this last manifested in random patches
> > > landing and breaking the client completely, with nobody noticing
> > > for months.
> > >
> > >
> > > Of course, some automated infrastructure could be built up to
> > > make it somewhat better, but that does not remove the problem of
> > > "nobody would run this mainline tree", I am afraid.
> >
> > I think there's a decent chunk of users on newer kernels. Ubuntu
> > 22/24 is on (a bit past the latest) LTS 6.8 kernel [1], AL2023 is on
> > the previous LTS 6.1 [2] and is working on the upcoming LTS 6.12 [3].
>
>
> Well, I mostly mean in the context of Lustre client use. Sure, there's
> some 6.8 LTS in use on those Ubuntu clients, though I cannot assess the
> real numbers; the majority of reports I see are still on 5.x, even on
> Ubuntu.
Yeah, I'm not sure of the real numbers. It's just my personal experience
that newer kernels are getting a lot of traction.
> > When a patch lands in lustre-release/master, it could be around 1 to
> > 1.5 years before it lands in a proper Lustre release. Only at that
> > point might it see real production usage.
>
>
> Well, not really.
> I guess it might not be as easily seen from the outside, but "lustre-
> release/master" patches are backports from "true production" branches.
> The number approaches 100% for features, but even a sizable number of
> fixes are backports.
> In particular, anything that comes from HPE is a backport: they run
> their production stuff, sometimes hit problems, create fixes, and then
> eventually determine that the problem is present in master as well (or
> sometimes the b2_x branches) and submit their ports there.
>
>
> The actual lag between features being developed and then getting into
> the master branch could be rather long too.
I think every organization that uses Lustre has a model similar to
this. But I don't think this is uncommon for other subsystems. The
various OFED flavors come to mind (I think MOFED was mentioned
in another thread). Everything is ultimately rebased on the
community version, AFAIK.
> > So I think it's mostly a matter of convincing people to use an
> > upstream
> > client. I don't think people wanted to use the staging client because
> > it
> > didn't work well and wasn't stable. And vendors don't want to work on
> > something that no one uses. If the client is "good enough" and people
> > are confident it'll continue to be updated, I think they will use it.
> > The
> > staging client was neither of those things.
>
>
> I agree once you convince people (both users and developers) to use the
> upstream client things will move in this desirable direction, but right
> now I don't know how to convince them.
> On the RHEL (and derivatives) front, the time lag is particularly huge.
Strictly speaking, the proposal from Neil was to derive the client from
the upstream release. For example, say Lustre got merged in Linux 7.4.
To support RHEL 8, we'd copy the Linux 7.4 client and combine it with
some Lustre compatibility code to generate a working client on the
older RHEL kernel. This is exactly what AMDGPU is doing [1], based on
my research.
So in this case, everyone would eventually be running the upstream
client - since the clients for vendor kernels would be derived from it.
[1] https://github.com/geohot/amdgpu-dkms/tree/master
> > > It does not help that there are what, 3? 4? trees, not "dual-tree" by
> > > any
> > > stretch of imagination.
> > >
> > >
> > > There's DDN/whamcloud (that's really two trees), there's HPE, and
> > > LLNL still keeps their fork, I think (though it's mostly
> > > backports?). There are likely others I am less exposed to.
> >
> > I think most non-community Lustre releases are derived from the
> > community release and periodically rebased. I think AWS,
> > Whamcloud, LLNL, Microsoft would fall into that bucket. And I
> > doubt DDN and HPE significantly diverge from community Lustre. But
> > if someone is diverging significantly from community Lustre, I think
> > they are opting into a significant maintenance burden regardless of
> > what we do with lustre-release/master.
>
>
> Both DDN and HPE significantly diverge with new features and such.
> There's also a (now mostly dormant) Fujitsu "FEFS" fork that they got
> tired of maintaining and tried to fold back in, but could not. (also
> Cray's secure data appliance that seems to have met a similar fate:
> https://github.com/Cray/lustre-sda )
>
>
> Yes, the maintenance burden consideration is always there of course, so
> there's some coordination nowadays (like reserving feature flags ahead
> of time and such), but it's not outside the realm of possibility that
> if what's perceived as the "tip of the community tree" becomes
> inconvenient, it'll be dropped.
> In fact a similar thing happened to the staging lustre in the past I
> guess, only before it even became the perceived tip (for a variety of
> reasons).
Both DDN and HPE regularly contribute fixes/features back to the community
branch from their respective production branches. HPE seems to rebase
their branches fairly often on community Lustre [1]. You would have more
context on whether that's true for DDN - I couldn't find much online.
But Fujitsu and the SDA team in HPE were not contributing back as
much and eventually abandoned their forks. So based on those examples,
it seems most sustainable for organizations to contribute to the community
release. So the risk of contributions diminishing because Lustre
moves towards upstream is low, IMHO.
But I agree with your fundamental point - we can't make submitting patches
to community Lustre arduous.
[1] https://github.com/Cray/lustre
> > > Sure, only one of those trees is considered "community Lustre",
> > > but if it detaches too much from what the majority of developers
> > > really run and get paid to work on - the "community Lustre"
> > > contributions would probably diminish greatly, I am afraid.
> >
> > As long as the community Lustre development process is sane, I think
> > most organizations will opt to continue deriving their releases from
> > it and opt to continue contributing releases upstream. We just need
> > to make sure we get buy-in from the people contributing to Lustre.
>
>
> Well, there's another half of it: the kernel side. Previous run-ins
> with other kernel maintainers left a bit of a sour taste in people's
> mouths.
> Of course they have their own reasons to dictate whatever they want to
> newcomers (and all incoming patches), but on the other hand Lustre is a
> mature product that could not just drop everything and rewrite
> significant chunks of the code (several times at that) to better align
> with the ever-changing demands (bcachefs, I think, was a highly
> paraded-around example of that, and it could accommodate those often
> conflicting demands because there were not many deployments in the wild).
> I don't know how possible it is to overcome. Kernel maintainers don't
> really care about Lustre (and rightfully so, we are but a blip to
> them), and then we also have our own priorities.
LSF/MM could be a good opportunity to improve our
relationship with the upstream maintainers. :)
> And while for Lustre developers there's the benefit that "adjusting to
> new interfaces comes for free", there's no benefit to the kernel
> maintainers, so they don't have much incentive.
> (and again we saw this in the previous attempt)
>
>
> And even imagine that, by some magic, the actual inclusion and all the
> relevant rework happened. Now HPE or DDN wants to add a new feature;
> they implement it, then submit it, and are met with the usual "now
> rework it in these other ways" demands.
> Of course, again, from the kernel maintainers' perspective this is
> entirely reasonable, and it's not their problem that the development
> process is wrong and backwards - that instead of developing everything
> in the open on the public branch with input from all interested
> parties, there's this closed development going on. But good luck
> convincing the respective management of those companies to agree.
Backporting from production branches to the community release
already takes some work. Especially if the feature is based on an
older LTS. So I don't think porting to upstream Linux would be a huge
amount of extra work.
On the other hand, if Lustre was included in mainline properly rather
than in staging - I think we’d have more leverage to implement things
the way we want to. After all, the kernel maintainers don't really care
about Lustre. :)
Tim Day