[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming

Sat Jan 25 15:25:20 PST 2025

On Sat, 25 Jan 2025, Alexey Lyahkov wrote:
> 
> 
>     On 25 Jan 2025, at 02:24, NeilBrown <neilb at suse.de> wrote:
>    
>     On Thu, 23 Jan 2025, Alexey Lyahkov wrote:
>    
>        
>        
>        
>            Keeping different kernels up to date with new updates is
>            something that the linux-stable team does all the time.  We
>            do it at SUSE too.  It isn't that hard.
>            You identify which patches *need* to be backported (ideally
>            when the patch is created but that isn't always easy) and
>            you use tools to help you backport them.
>        
>         So Lustre developers need to track all stable kernels, decide
>         which patches need to be backported, and send them to the
>         distro owners.  And the same for each LTS kernel on
>         kernel.org.  I think that increases the work dramatically.
>    
>    
>     No.  Lustre developers don't need to care about the stable kernels
>     at all.  The stable team does that and explicitly says they don't
>     want it to be a burden on maintainers.
> 
> Lustre maintainers don't need to review code which affects Lustre?
> That's something new for me.

It may be new, but it is still true - partly.
The stable team only applies patches which have already landed in
upstream, so they have already been reviewed by upstream maintainers.
They do sometimes apply other patches when a fix is needed but it cannot
be achieved with a simple backport - but those will only be accepted
from maintainers.

So the patches *have* been reviewed.  They've been reviewed in a different
context so the review might not still apply and certainly patches do
land in -stable which break things in all sorts of different ways.  It
is generally thought that this cost is small compared to the benefit
of getting lots of fixes.

> I understand that driver changes don't need to be reviewed by the
> lustre team.
> But arch/… fs/ .. mm/... kernel/ need attention.
> A lack of review will cause a very large quality degradation within a
> short time.
> As I pointed out earlier - I think none of the linux maintainers has a
> lustre cluster to test patches before they land.

They don't today, partly because lustre is not mainline.  There are a
number of testing efforts around the kernel which run all sorts of
different tests.  If we made it easy to spin up a virtual lustre cluster
for testing and publicised that, I think there is a reasonable chance
that some people will run it.  It doesn't need to be the upstream
maintainers.  It can be anyone with the relevant resources.

> They can test their own part and check that it builds at all. But how
> does it affect Lustre? Especially in the performance area.
> Small examples from the past:
> A small optimisation for page_accessed() and the LRU lists fixed a
> problem with ext4 bitmaps in memory and improved lustre performance by
> 10%, due to the lack of reads during writes.
> (https://lwn.net/Articles/548830/)
> A small change in the jbd2 code - like replacing list_add with
> list_add_tail - improved performance by 5-15% because journal handle
> starvation was solved.
> (https://www.spinics.net/lists/linux-ext4/msg84888.html)
> 
> So yes, Lustre developers can treat LTS kernels as an unsupported area
> and, if something is broken, just suggest installing an out-of-tree
> module with a supported kernel. But does the linux kernel really need
> broken code in the tree?

If someone reports that an LTS kernel is broken they should be directed
to whoever supports it, not just told to rip out the code.
If they cannot find anyone to support it they could be guided to use a
different kernel that does have support.

> 
> 
> 
>     The lustre team *can* decide to have some involvement - adding
>     Fixes tags, adding Cc: stable, even submitting backports which
>     don't apply trivially.  But there is no requirement from anywhere.
>    
> 
>     The lustre community only needs to focus on one upstream.
> 
> And have a broken lustre client once it isn't tested. Or the lustre
> client will hit a performance degradation.

Yes, code that is not maintained will suffer regressions.  This is how
various vendors make money - by selling support and having expertise to
fix regressions.

> 
> 
> Neil, Tim, 
> 
> 
>     Lustre developers who work for employers who sell support for older
>     kernels might need to handle backports to those kernels and it is
>     in everybody's interest not to make that unduly difficult e.g. by
>     separating bug fixes from features etc.
>    
> 
> Lustre's primary area is 'older' kernels. As I pointed out earlier,
> half of customers use RHEL7, another 30% use RHEL8,
> and just 2% use modern kernels.

RHEL's primary area is older kernels.  SLES's primary area is older kernels.
We still contribute primarily upstream.  
We know upstream is not suitable for our customers.  Before we choose a
kernel for a new release we run a lot of testing on a range of
candidates and pick the one that seems to have the fewest problems.  Then
we work to identify and fix the problems that are most likely to affect
our customers.

There is no reason that lustre vendors couldn't use and benefit from the
same model.  New work goes upstream.  Backport the bits needed by your
customers to the kernels that your customers are using.

It isn't clear to me that the "Community edition" of lustre needs to
support older kernels at all (though I don't object to that).  Each
vendor can choose the kernel or kernels that they want to support and
select the relevant patches from upstream to make it fit their needs.
Bugfixes should be easy as they should be tagged as bug fixes.  Features
might be a little harder but not enormously so.

Thanks,
NeilBrown

> 
> 
> 
>     The lustre community may well choose to host and share those
>     backports, and maybe even include them in testing.  But I suspect
>     that would be driven by vendors who sell support.  It certainly
>     wouldn't be imposed by the upstream community.
>    
>     Exactly how we work with distros like Redhat, SUSE, Ubuntu would
>     depend on what can be negotiated with them.
>     Some might be willing to accept backports and release them in
>     maintenance updates.  Some might not.
>     In that case the way to support their kernel for your customers
>     would be to start with the source for a particular maint update,
>     add the missing patches, build, and distribute the result.
>     You probably would only need to do this for each service-pack, not
>     for each update.
>    
>     It isn't really different from what is done today, but it would be
>     done in a different way.
>    
>     Thanks,
>     NeilBrown
> 
> 
> 
> 
>