[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming
Day, Timothy
timday at amazon.com
Fri Jan 24 07:53:30 PST 2025
I've created a wiki version of the outline below: https://wiki.lustre.org/Upstream_contributing. I'm
trying to consolidate all material related to the upstreaming effort on that page. If you know of anything,
feel free to add it to that page - or respond to this thread and I can add it.
Tim Day
On 1/22/25, 1:35 AM, "Day, Timothy" <timday at amazon.com> wrote:
I've created a second draft of the topic for LSF/MM. I tried
to include everyone's feedback. It's at the end of the email.
Before that, I wanted to elaborate on Neil's idea about updating
our development model to an upstream-focused model. For upstreaming
to work, the normal development flow has to generate patches to mainline
Linux - while still supporting the distro kernels that most people use
to run Lustre. I think we can get to this point in stages. I've provided
a high-level overview in the next section. This won't be without
challenges - but the majority of the transition could happen without
interrupting feature work or normal development.
[I] Separate the kernel code, compatibility code, and userspace code
We should reorganize the Lustre tree to have a clear separation
of concerns:
fs/lustre/
net/lnet/
net/libcfs/
lustre_compat/
tests/
utils/
The functional components of libcfs/ would stay in that directory
and the compatibility components would live in lustre_compat/.
Centralizing the compatibility code makes it easier to maintain and
update and allows us to start removing the compatibility code from
the modules themselves. lustre_compat/ could still be compiled into
libcfs.ko, if we want to avoid creating even more modules.
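As a sketch, the reorganization could be staged with git renames so
history follows the files. The file names below are hypothetical
stand-ins, not the real source layout:

```shell
set -eu
# Stand-in for the current flat layout (the real tree has lustre/,
# lnet/, and libcfs/ at the top level); file names are made up.
mkdir -p demo && cd demo
git init -q
git config user.name demo
git config user.email demo@example.com
mkdir -p lustre lnet libcfs/compat
touch lustre/llite.c lnet/socklnd.c libcfs/debug.c libcfs/compat/shim.c
git add -A
git commit -qm 'old layout'
# Proposed layout: kernel code under fs/ and net/, compat centralized.
mkdir -p fs net lustre_compat
git mv libcfs/compat/shim.c lustre_compat/
git mv lustre fs/lustre
git mv lnet net/lnet
git mv libcfs net/libcfs
git commit -qm 'reorganize into fs/, net/, lustre_compat/'
```

Using git mv (rather than delete-and-add) keeps rename detection
working, so blame and history survive the reshuffle.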
[II] Get fs/ and net/ to compile on a mainline kernel
Once the compatibility code is isolated, we must get fs/ and net/
to compile on a mainline kernel - without any configuration or
lustre_compat/ layer.
We would enforce this by adding build validation to each patch
submitted to Gerrit. The kernel version would be pinned (similar
to how we pin the ZFS version) and we'd periodically update it and fix
any new build failures.
Once this is achieved, we'll have a native Linux client/server
that can be run on older distros via a compatibility layer.
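A minimal sketch of that build gate, assuming an out-of-tree modules
build against a checked-out mainline tree. The pinned tag and the
KERNEL_SRC variable are assumptions for illustration; the real gate
would live in the lbuild/Jenkins scripting mentioned in the FAQ below:

```shell
set -eu
# Hypothetical per-patch build gate. The pinned tag is an assumption;
# it would be bumped periodically, like the pinned ZFS version.
PINNED_KERNEL="v6.12"

if [ -n "${KERNEL_SRC:-}" ]; then
    # Build fs/ and net/ as out-of-tree modules against the pinned
    # mainline tree - no configure step, no lustre_compat/ layer.
    git -C "$KERNEL_SRC" checkout "$PINNED_KERNEL"
    make -C "$KERNEL_SRC" olddefconfig
    make -C "$KERNEL_SRC" M="$PWD/fs/lustre" modules
    make -C "$KERNEL_SRC" M="$PWD/net/lnet" modules
else
    echo "KERNEL_SRC not set; sketch only, skipping build"
fi
```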
[III] Move fs/ and net/ to a separate kernel tree
Transition to maintaining fs/ and net/ as a series of patches
on top of a mainline kernel release. At this point, we'll be generating
patches to mainline Linux while retaining the ability to support
older distro kernels via lustre_compat/. Similar to the previous
step, we'd periodically rebase our Lustre patch series - fixing
lustre_compat/ as needed.
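The periodic rebase amounts to carrying the series from one release
tag to the next. A toy demonstration, where the tags, branch names,
and files are all illustrative:

```shell
set -eu
# Toy repo standing in for mainline plus a Lustre patch series.
git init -q rebase-demo && cd rebase-demo
git checkout -qb main
git config user.name demo
git config user.email demo@example.com
echo base > kernel.txt
git add -A && git commit -qm 'mainline baseline'
git tag v6.11
# The Lustre series lives on a branch atop the mainline release tag.
git checkout -qb lustre-series v6.11
echo llite > llite.c && git add llite.c && git commit -qm 'lustre: add llite'
echo lnet > lnet.c && git add lnet.c && git commit -qm 'lnet: add lnet'
# Mainline moves forward...
git checkout -q main
echo update >> kernel.txt && git commit -qam 'mainline update'
git tag v6.12
# ...and the whole series is rebased onto the new release; this is
# the point where lustre_compat/ fallout would get fixed.
git rebase -q --onto v6.12 v6.11 lustre-series
```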
This is the only step that requires a change to the Lustre development
workflow - patches would have to be split and sent to two
different repos. We can delay this step until we have some
confidence that Lustre has a path to be accepted to mainline.
[IV] Submit the patch series for inclusion
Once we are comfortable with the above process, we can submit the
initial patches to add Lustre support to the kernel. Our normal
development flow will generate a batch of patches to be submitted
during each merge window. After the merge window, we can focus
on testing and making sure that our backport to older distro
kernels is still working.
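Generating the batch for a merge window is mechanical once the series
exists: everything queued since the last baseline becomes a numbered
patch series with a cover letter, ready for git send-email. A toy
repo (contents and commit subjects are illustrative):

```shell
set -eu
git init -q submit-demo && cd submit-demo
git checkout -qb main
git config user.name demo
git config user.email demo@example.com
echo base > kernel.txt
git add -A && git commit -qm 'mainline baseline'
git tag baseline
echo llite > llite.c && git add llite.c && git commit -qm 'lustre: add llite support'
echo lnet > lnet.c && git add lnet.c && git commit -qm 'lnet: add socklnd support'
# One numbered patch per commit since the baseline, plus cover letter.
git format-patch --cover-letter -o outgoing/ baseline..HEAD
ls outgoing/
```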
FAQ:
Q: Who will actually run the Lustre code in mainline Linux?
A: Everyone. Releases for older distros will combine the
upstream Lustre code with lustre_compat/ and whatever
the kernel won't accept (like GPUDirect).
Q: What does a Lustre release look like?
A: We can generate a tarball by combining an upstream Lustre
release from mainline along with lustre_compat/ and the
userspace stuff. Vendors and third-parties can base
their versions of Lustre on those tarballs. Every time a
new kernel is released, a new Lustre release tarball will
be created. LTS releases can center around the LTS kernel
releases.
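A toy version of that release flow: combine the in-kernel fs/ and
net/ code with lustre_compat/ and the userspace pieces into one
tarball. The version number and file names are made up:

```shell
set -eu
VER=2.17.0
# Stand-ins for the in-kernel code, compat layer, and userspace bits.
mkdir -p src/fs/lustre src/net/lnet src/lustre_compat src/utils src/tests
touch src/fs/lustre/llite.c src/lustre_compat/shim.c src/utils/mount.lustre.c
# Stage under a versioned top-level directory and roll the tarball.
mkdir -p stage
cp -r src "stage/lustre-$VER"
tar -czf "lustre-$VER.tar.gz" -C stage "lustre-$VER"
tar -tzf "lustre-$VER.tar.gz"
```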
Q: How will we validate that fs/ and net/ build on mainline?
A: It would probably be easiest to create a minimalist mainline
kernel build in Jenkins. This would allow us to reuse most
of the existing lbuild scripting. The build would be
non-enforced at first. Testing would remain on distro
kernels, since most people use those.
Q: Will you create a wiki project tracking page for upstreaming
Lustre?
A: Yes
Q: Does anyone else have a similar model? Does this even work?
A: The AMD GPU driver seems to take a similar approach [1]. I'm
looking to get more feedback at LSF. We should talk to other
developers working in a similar model.
This is still a high level sketch, but I think this is a feasible
path to upstreaming Lustre. We need to define a clear roadmap
with tangible milestones if upstreaming is to succeed.
But it's important that we don't disrupt developers' established
workflows. We don't want to complicate contributing to Lustre
and we don't want to discourage people from contributing their
changes upstream.
Please give me any feedback or criticisms on this proposal. If we
think this is workable, I'm going to create a wiki project page for
this and attach it to the LSF/MM email.
[1] AMD GPU DKMS: https://github.com/geohot/amdgpu-dkms
--------------------------------------------------------------------------------
Lustre is a high-performance parallel filesystem for HPC
and AI/ML compute clusters, available under GPLv2. Lustre is
currently used by 65% of the Top-500 systems (9 of the Top-10)
in HPC [7]. Outside of HPC, Lustre is used by many of the largest
AI/ML clusters in the world, and is commercially supported by
numerous vendors and cloud service providers [1].
After 21 years and an ill-fated stint in staging, Lustre is still
maintained as an out-of-tree module [6]. The previous upstreaming
effort suffered from a lack of developer focus and user adoption,
which eventually led to Lustre being removed from staging
altogether [2].
However, the work to improve Lustre has continued regardless. In
the intervening years, the code improvements that previously
prevented a return to mainline have been steadily progressing. At
least 25% of patches accepted for Lustre 2.16 were related to the
upstreaming effort [3]. And all of the remaining work is
in-flight [4][5]. Our eventual goal is to get both the Lustre
client and server (on ext4), along with at least TCP/IP networking, to
an acceptable quality before submitting to mainline. The remaining
network support would follow soon afterwards.
I propose to discuss:
- As we alter our development model to support upstream development,
what is a sufficient demonstration of commitment that our model works? [8]
- Should the client and server be submitted together? Or split?
- Expectations for a new filesystem to be accepted to mainline
- How to manage inclusion of a large code base (the client alone is
200kLoC) without increasing the burden on fs/net maintainers
Lustre has already received a plethora of feedback in the past.
While much of that has been addressed since - the kernel is a
moving target. Several filesystems have been merged (or removed)
since Lustre left staging. We're aiming to avoid the mistakes of
the past and hope to address as many concerns as possible before
submitting for inclusion.
Thanks!
Timothy Day (Amazon Web Services - AWS)
James Simmons (Oak Ridge National Labs - ORNL)
[1] Wikipedia: https://en.wikipedia.org/wiki/Lustre_(file_system)#Commercial_technical_support
[2] Kicked out of staging: https://lwn.net/Articles/756565/
[3] This is a heuristic, based on the combined commit counts of
ORNL, Aeon, SUSE, and AWS - which have been primarily working
on upstreaming issues: https://youtu.be/BE--ySVQb2M?si=YMHitJfcE4ASWQcE&t=960
[4] LUG24 Upstreaming Update: https://www.depts.ttu.edu/hpcc/events/LUG24/slides/Day1/LUG_2024_Talk_02-Native_Linux_client_status.pdf
[5] Lustre Jira Upstream Progress: TODO
[6] Out-of-tree codebase: https://git.whamcloud.com/?p=fs/lustre-release.git;a=tree
[7] I couldn't find a link to this? TODO
[8] Include a link to a project wiki: TODO