[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming

NeilBrown neilb at suse.de
Sat Feb 1 14:19:03 PST 2025


On Sat, 25 Jan 2025, Oleg Drokin wrote:
> On Sat, 2025-01-25 at 10:12 +1100, NeilBrown wrote:
> > On Mon, 20 Jan 2025, Oleg Drokin wrote:
> > > On Sun, 2025-01-19 at 09:48 +1100, NeilBrown wrote:
> > > > 
> > > > Once the transition completes there will still be process
> > > > difficulties, but there are plenty of process difficulties now
> > > > (gerrit: how do I hate thee, let me count the ways...) and people
> > > > seem to simply include that in the cost of doing business.
> > > 
> > > it's been a while since I did patch reviews by email, but I think
> > > gerrit is much more user-friendly (if you have internet, anyway)
> > 
> > I guess it isn't exactly the gerrit interface but more the workflow
> > that it encourages, or at least enables.
> > The current workflow seems to be "patch at a time" rather than
> > "patchset at a time".
> > The fact that you cherry-pick patches into master is completely
> > different to how most (all?) of the upstream community works.  It
> > means that the whole series isn't visible in the final git tree so we
> > lose context.  And it seems to mean that a long series takes a
> > loooooong time to land as it dribbles into master.
> 
> In fact the whole series is visible in gerrit if you submit it as such.

True, but not as helpful as I might like.  I cannot see a way to add
someone as a reviewer for a whole series, and there is no way for them
to then give a +1 for the whole series.  These are trivial actions when
using unstructured email.

> But as you noted later, the testing is unfortunately much less
> reliable than we want, and because we only land things in the order
> they are submitted in the series - if you bunch unrelated stuff all
> together, suddenly it might take much longer to land.

Certainly patches should only land in the order they appear in the
series, but my recollection is that even when they are related they
don't all land in master at once.  This might be due to the constant
fight against false positives in testing, and the need to get people to
re-review every patch when a small update is needed in an earlier patch.
But it seems to be more than that.

> 
> > I would MUCH rather that a whole series was accepted or rejected as a
> > whole - and was merged rather than cherry-picked to keep commit ids
> > stable.
> 
> Gerrit has such a mode, but we decided it does not work well for us for
> a variety of reasons.

I wonder if it might be time to review those reasons, particularly as we
need to review a lot of process if we are to move toward working
upstream.
We don't *have* to follow what other subsystems do (and they aren't all
the same), but we would want to have clear reasons, understood by all,
for deviating significantly.

> 
> > There are times when I would like the first few patches of a series
> > to land earlier, but that should be up to the submitter to split the
> > series.
> 
> But you cannot if these later patches do depend on the earlier ones?

Only because gerrit cannot cope.
Using email, I might submit a set of 10 patches, get some discussion,
decide that the first 5 really are good-to-go but the later ones need
some work. So I resubmit the first 5.  They can easily be applied by the
maintainer.
With gerrit I could "abandon" the later 5 but I might not want to do
that - I might want to revise them.  But probably abandoning would be ok.
Still, it is not quite as easy as simply not resubmitting them.


> 
> > And the automatic testing is a real pain.  Certainly it is valuable
> > but it has a real cost too.  The false positives are a major pain.
> > I would rather any test that wasn't reliable were disabled (or fixed)
> > as a priority.  Or at least made non-fatal.
> > Also, it would be much nicer if the last in a series were tested
> > first, and if that failed then don't waste resources testing all the
> > others.
> > Bonus points for a "bisect" to find where the failure starts, but
> > that can be up to the developer to explicitly request testing at some
> > points in the series.
> 
> Yes, I agree there's much to be improved testing-wise.  After all, I
> did not come up with my own parallel testing system because I was
> happy with the default one.
> And then my own system deteriorated (at least it does not set -1s left
> and right, though that means people totally ignore those results too).
> 

I think your parallel system that adds comments to gerrit is sometimes
very helpful.  What bothers me is that you have another test setup that
you run on master-next beforehand, and you won't land things until that
passes.  It means that I can pass all the auto testing and get all the
reviews and there is still a question mark over if/when it will land.
For me that created a sense of helplessness that was quite demotivating.
Obviously it is good to find bugs, and those bugs need to be fixed, but
they don't need to prevent landing.
I would feel more motivated if there were a clear process that was
followed most of the time whereby once a series passed auto-testing and
had sufficient reviews it would be expected to land, e.g. the next
Monday.  Maybe it doesn't land in master, maybe in something else that
gets merged after a stabilisation window, but it should land.

If asynchronous testing then reports a problem, that needs to be
addressed.  Worst case, the patch might be reverted, but that would be
rare.  Often a small fix will make the problem go away.

Thanks,
NeilBrown

