[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming
Oleg Drokin
green at whamcloud.com
Fri Jan 17 19:16:53 PST 2025
On Sat, 2025-01-18 at 11:45 +1100, NeilBrown wrote:
> We need to demonstrate a process for, and commitment to, moving away
> from the dual-tree model. We need patches to those parts of Lustre
> that are upstream to land in upstream first (mostly).
I think this is not very realistic.
A large chunk (100%?) of users not only do not run the latest kernel
release, they do not run the latest LTS either.
When we were last in staging, this manifested as random patches being
landed and breaking the client completely, with nobody noticing for
months.
Of course some automatic infrastructure could be built up to make it
somewhat better, but it does not remove the problem of "nobody would
run this mainline tree", I am afraid.
It does not help that there are, what, 3? 4? trees, not "dual-tree" by
any stretch of imagination.
There's DDN/whamcloud (that's really two trees), there's HPE, LLNL
keeps their fork still I think (though it's mostly backports?). There
are likely others I am less exposed to.
Sure, only one of those trees is considered "community Lustre", but if
it detaches too much from what the majority of developers really run
and get paid to work on, the "community Lustre" contributions would
probably diminish greatly, I am afraid.
The past situation of "oh, this new enterprise linux comes with a
community lustre version, so the first step to get something usable is
to rip it out entirely and then apply the new good version" is not
exactly desirable either, I am afraid.
And solving this problem is mostly out of the hands of individual
developers, no matter how cool I think it would be to actually have an
up-to-date Lustre in the mainline Linux kernel.
> That means we need the model for supporting older kernels to be
> completely based on libcfs holding compatibility code with no
> kernel-version #ifdefs in the code.
>
> We need a strong separation between server and client so that we can
> justify everything that goes upstream as being to support the client,
> and when we add server support to that, it just adds files. Possibly
> we could patch a few files to add server support, but we need to
> maintain those as patches, not as alternate versions of upstream
> files.
>
> We need to quickly reach a point where a lustre release is:
>
> - a verbatim copy of relevant files from a chosen upstream release,
>   or just a dependency on that kernel source.
> - a bunch of extra files that might one day go upstream: server code
>   and LNet protocol code
> - a *few* patches to integrate that code
> - some number of patches which have since gone upstream - bugfixes
>   etc.
> - libcfs which contains a compat layer for older kernels.
> - user-space code, documentation, test scripts, etc for which there
>   is no expectation of upstreaming to linux kernel.
All of this sounds like an awful lot of dedicated developer-hours.
> Maybe the question for LSF is : what is a sufficient demonstration of
> commitment?
>
> The big question for us is : how are we going to transition our
> infrastructure to this model?
And who would pay for it?
This in the end was the downfall of the previous attempt. There never
was any serious funding behind the effort so it became an afterthought
for most.
> It would be nice to have a timeline for getting the second and third
> bullet points down to zero. Obviously it would be aspirational at
> best, but a list of steps could be useful.
>
> Thanks,
> NeilBrown
>
Bye,
Oleg