[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming

Mon Feb 3 09:33:34 PST 2025

> Porting to upstream doesn't work. The motivation isn't strong enough
> and people leave it then forget it and you get too much divergence and
> it becomes harder so people do it even less. People have tried. People
> have failed.
>
> Backporting from upstream to an older kernel isn't that hard. I do a
> lot of it and with the right tools it is mostly easy. One of the
> biggest difficulties is when we try to backport only a selection of
> patches because we might miss an important dependency. Sometimes it is
> worth it to avoid churn, sometimes it is best to apply everything
> relevant. I assume that for the selection of kernels that Whamcloud (or
> whoever) wants to support, they would backport everything that could
> apply. I think that would be largely mechanical.
>
> Maybe it would be good for me to paint a more detailed picture of what I
> imagine would happen - assuming we do take the path of landing all of
> lustre, both client and server, upstream.
>
> - we would change the kernel code in lustre-release so that it was
> exactly what we plan to submit upstream.
> - we submit it and once accepted we have identical code in upstream
> linux and lustre-release
> - we fork lustre-release to a new package called (e.g.) lustre-tools and
> remove all kernel code leaving just utils and documentation and
> test code. The kinode.c kernel module that is in lustre/tests/kernel/
> would need to go upstream with the rest of the kernel code I think.
> lustre-tools would be easily accessible and buildable by anyone who
> wants to test lustre
> - we fork lustre-release to another new package lustre-backports
> and remove all non-kernel code from there. We configure it to build
> out-of-tree modules with names like "backport-lustre" "backport-lnet"
> and provide modprobe.conf files that alias the standard names to
> these. That should allow overriding the distributed modules (if
> any) when people choose to use backports.
> - upstream commits which touch lustre or lnet are automatically added to
> lustre-backports and someone is notified to help when they don't apply
>
> With this:
> Anyone who wants to test or use the lustre included with a particular
> kernel can do so with only the lustre-tools package. Anyone who
> wants to use the latest lustre code with an older kernel can build and
> use lustre-backports.
>
> There are probably rough-edges with this but I suspect they can be filed
> down.

I found an interesting data point. VAST seems to use an upstream NFS
client from an LTS kernel [1]. They have a compat layer to run that client
on older kernels. That's essentially what Lustre would be doing. They
also support Mellanox/GDS with this client. You can see exactly how
they did it by downloading the tarball. Even a large change like folio
didn’t seem to have a huge impact on the code. Just a little bit of
#ifdef'ing.

So an approach like this is feasible.
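
To make the "#ifdef'ing" concrete, here is a rough sketch of the general
pattern. This is not taken from the VAST tarball. HAVE_FOLIO is a
hypothetical configure-style macro, demo_write_end() is a made-up caller,
and struct page, struct folio, set_page_dirty() and folio_mark_dirty() are
simplified userspace stand-ins for the kernel ones so that the example
compiles on its own. The point is only that the shared code is written
against the newer interface and the compat layer supplies it on older
kernels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a configure-time probe sets HAVE_FOLIO to 1 on kernels that
 * already provide the folio interface. The macro name is hypothetical. */
#ifndef HAVE_FOLIO
#define HAVE_FOLIO 0
#endif

/* Userspace stand-in for the kernel's struct page. */
struct page {
	bool dirty;
};

#if HAVE_FOLIO
/* Stand-ins for what a newer kernel would provide natively. */
struct folio {
	struct page page;
};

static bool folio_mark_dirty(struct folio *folio)
{
	bool newly_dirty = !folio->page.dirty;

	folio->page.dirty = true;
	return newly_dirty;
}
#else
/* Stand-in for the older, page-based interface. */
static int set_page_dirty(struct page *page)
{
	int newly_dirty = !page->dirty;

	page->dirty = true;
	return newly_dirty;
}

/* Compat shim: supply the newer name on top of the older interface. */
struct folio {
	struct page page;
};

static inline bool folio_mark_dirty(struct folio *folio)
{
	return set_page_dirty(&folio->page) != 0;
}
#endif

/* Shared code: identical for every kernel, written against the newer API. */
static bool demo_write_end(struct folio *folio)
{
	assert(folio != NULL);
	return folio_mark_dirty(folio);
}
```

With a shim like this, only the compat header needs to know which kernel it
is building against. The filesystem code itself stays the same as upstream.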

[1] https://vastnfs.vastdata.com/docs/4.0/download.html