[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming
Day, Timothy
timday at amazon.com
Sat Jan 18 09:51:54 PST 2025
> On 1/17/25, 7:46 PM, "NeilBrown" <neilb at suse.de> wrote:
> > On Fri, 17 Jan 2025, Day, Timothy wrote:
> > The following is a draft topic for the upcoming LSF/MM conference.
> > I wanted to solicit feedback from the wider Lustre development
> > community before submitting this to fsdevel. If I’ve omitted anything,
> > something doesn’t seem right, or you know of something that strengthens
> > the argument, please let me know!
> >
> > ----------------------------------------------------
> >
> > Lustre is a high-performance parallel filesystem used for HPC and AI/ML
> > compute clusters, available under GPLv2. Lustre has achieved widespread
> > adoption in HPC and AI/ML and is commercially supported by numerous
> > vendors and cloud service providers [1].
> >
> > After 21 years and an ill-fated stint in staging, Lustre is still
> > maintained as an out-of-tree module [6]. The previous upstreaming effort
> > suffered from a lack of developer focus and user adoption, which
> > eventually led to Lustre being removed from staging altogether [2].
> >
> > However, the work to improve Lustre has not stopped. In the intervening
> > years, the code improvements that would precede a return to mainline
> > have been steadily progressing. At least 25% of patches accepted for
> > Lustre 2.16 were related to the upstreaming effort [3]. And all of the
> > remaining work is in-flight [4][5]. Our eventual goal is to get a minimal
> > TCP/IP-only Lustre client to an acceptable quality before submitting to
> > mainline.
>
>
> "Go big, or go home"!!
>
>
> If our eventual goal is not "Get lustre, both client and server, into
> mainline linux with support for TCP/IP and infiniband transports (at
> least)"
> then we really shouldn't bother.
>
>
> There is no formal, or even semi-formal, specification of the Lustre
> protocol. The lustre protocol is "what the code does" so it cannot work
> to develop client and server separately like it can for, e.g., NFS.
>
>
> The goal you describe is an interim goal. A first step (from the
> upstream community perspective).
Getting everything upstream is definitely the goal. The near-term goal
is much smaller, of course: getting any part of Lustre upstream at all. I've
even wondered at times if we could start with only LNET. LNET on its own
is pretty manageable and can be used as a standalone LNET router, so it can
be used for something besides out-of-tree Lustre. But I'm skeptical upstream
would be in favor of that approach, since the primary users would be
out-of-tree Lustre regardless.

Like Andreas said in another thread, I think the Lustre protocol is fairly
stable, so we wouldn't have too much trouble maintaining an independent
client in mainline. Ideally, though, the server would follow afterwards.

On the other hand, I wonder if we should upstream the whole thing all at
once. Besides the code being a bit nicer, the client isn't really that much
closer to being upstream than the server is. And no one else can test the
client without having a Lustre server on hand, so no one can easily run
xfstests or similar. Doing everything all at once would also preempt
questions about the client/server split or the server upstreaming timeline.
But upstreaming so much all at once is probably more unrealistic.
> > I propose to discuss:
> >
> > - Expectations for a new filesystem to be accepted to mainline
> > - Weaknesses in the previous upstreaming effort in staging
> >
>
>
> I think we know at least one perspective on the weaknesses in the
> previous upstreaming effort and we need to demonstrate that we will do
> better.
>
>
> https://lore.kernel.org/all/20180601091133.GA27521@kroah.com/
>
>
> There is a whole separate out-of-tree copy of this codebase where the
> developers work on it, and then random changes are thrown over the
> wall at staging at some later point in time. This dual-tree
> development model has never worked, and the state of this codebase is
> proof of that.
>
>
> We need to demonstrate a process for, and commitment to, moving away
> from the dual-tree model. We need patches to those parts of Lustre
> that are upstream to land in upstream first (mostly).
>
>
> That means we need the model for supporting older kernels to be completely
> based on libcfs holding compatibility code with no kernel-version
> #ifdefs in the code.
>
>
> We need a strong separation between server and client so that we can
> justify everything that goes upstream as being to support the client,
> and when we add server support to that, it just adds files. Possibly we
> could patch a few files to add server support, but we need to maintain
> those as patches, not as alternate versions of upstream files.
>
>
> We need to quickly reach a point where a lustre release is:
>
>
> - a verbatim copy of relevant files from a chosen upstream release,
> or just a dependency on that kernel source.
> - a bunch of extra files that might one day go upstream: server code
> and LNet protocol code
> - a *few* patches to integrate that code
> - some number of patches which have since gone upstream - bugfixes etc.
> - libcfs which contains a compat layer for older kernels.
> - user-space code, documentation, test scripts, etc for which there
> is no expectation of upstreaming to linux kernel.
>
>
> Maybe the question for LSF is: what is a sufficient demonstration of commitment?
>
>
> The big question for us is: how are we going to transition our
> infrastructure to this model?
>
>
> It would be nice to have a timeline for getting the second and third
> bullet points down to zero. Obviously it would be aspirational at best,
> but a list of steps could be useful.
I agree that the development model needs to adapt - otherwise, we'd have to
soft-fork whatever code goes upstream. Keeping the two trees in sync while
also doing feature development is unworkable.

The tricky part is: how do we support most Lustre developers' current
workflows? Most developers and vendors only care about having a
functional client for older distro kernels. And all developers submit
patches via Whamcloud/DDN Gerrit and CI/CD. So everyone aligns their
workflows to whatever that system enforces (assuming it isn't too arduous).

Your proposed model (as I understand it) is to use the upstream client as
a build dependency of the complete Lustre package? I think that could be
workable. But whatever we do, we need to find a way to move to that
development model before anything lands upstream. I think that would
be enough to demonstrate commitment, IMHO.

I wonder how AMDGPU does this? AMDGPU is significantly more complex
than Lustre and it's supported on older kernels via DKMS. I'll have to look
into this.
> Thanks,
> NeilBrown
>
>
> > Lustre has already received a plethora of feedback in the past. While
> > much of that has been addressed since, the kernel is a moving target.
> > Several filesystems have been merged (and removed) since Lustre left
> > staging. We're aiming to avoid the mistakes of the past and hope to
> > address as many concerns as possible before submitting for inclusion.
> >
> >
> > Thanks!
> >
> >
> > Timothy Day (Amazon Web Services - AWS)
> > James Simmons (Oak Ridge National Labs - ORNL)
> >
> >
> > [1] Lustre Community Update: https://youtu.be/BE--ySVQb2M?si=YMHitJfcE4ASWQcE&t=960
> > [2] Kicked out of staging: https://lwn.net/Articles/756565/
> > [3] ORNL, Aeon, SuSe, AWS, and more: https://youtu.be/BE--ySVQb2M?si=YMHitJfcE4ASWQcE&t=960
> > [4] LUG24 Upstreaming Update: https://www.depts.ttu.edu/hpcc/events/LUG24/slides/Day1/LUG_2024_Talk_02-Native_Linux_client_status.pdf
> > [5] Lustre Jira Upstream Progress: https://jira.whamcloud.com/browse/LU-12511
> > [6] Out-of-tree codebase: https://git.whamcloud.com/?p=fs/lustre-release.git;a=tree
Tim Day