[lustre-devel] [PATCH RFC 00/28] lustre: PFL port to linux client

James Simmons jsimmons at infradead.org
Wed Dec 26 17:53:26 PST 2018


> On Tue, Dec 18 2018, NeilBrown wrote:
> 
> > On Mon, Dec 17 2018, James Simmons wrote:
> >
> >> This is the initial PFL port to the linux lustre client. This opens
> >> up feedback on the port so far. Currently sanity passes but the
> >> tests for sanity-pfl fail as below. I have been tracking down
> >> various bugs but this one remains and I haven't found out why it's
> >> failing. So far, from what I can tell, lov_io_setattr_iter_init()
> >> is returning -ENODATA because lsm_entry_inited() reports the entry
> >> is not initialized.
> >
> > Having that invariant in cl_io_iter_fini() seems strange.
> > It is guaranteed to fire if cl_io_iter_init() fails - if that is not
> > permitted, I would expect an invariant a lot closer to the failure.
> >
> > What happens if you just remove the LINVRNT() ??
> 
> I dug through the code some more, and I'm sure that LINVRNT() is wrong.
> 
> The cl_io_iter() call is meant to fail early, before ci_state gets to
> CIS_LOCKED, let alone CIS_UNLOCKED.  It sets ->ci_need_write_intent when
> it records the failure.  The code is then meant to fall through to
> the cl_io_fini() call in cl_setattr_ost(), which calls into vvp_io_fini(),
> which notices ->ci_need_write_intent, and calls ll_layout_write_intent(),
> which presumably initializes the things that weren't initialized before.
> This also sets ->ci_need_restart = 1 so that cl_setattr_ost() loops
> around to "again:" and calls cl_io_init() again.
> 
> So the invariant in cl_io_iter_fini() should probably be
> 
> 	LINVRNT(io->ci_state == CIS_INIT || io->ci_state == CIS_UNLOCKED);
> 
> or something like that.  Maybe needs CIS_IT_ENDED as well.
> 
> 	LINVRNT(io->ci_state <= CIS_INIT || io->ci_state >= CIS_UNLOCKED);
> 
> ??

You are right. I spent two weeks thinking I did the port wrong :-( I used
the second version, which worked, and saw only sanity-pfl test 11 failing.
I opened a ticket on this issue:

https://jira.whamcloud.com/browse/LU-11828

and have pushed a patch for Bobi Jam to look at. We should have something
worked out soon. Aside from that, PFL mostly works. I will combine this
fix with a bunch of others. I have tracked down the causes of most of the
failures seen in the sanity testing.
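
For reference, here is a minimal standalone sketch of how the two invariants
suggested above differ. It is not the Lustre code: the enum below is only
assumed to follow the ordering of enum cl_io_state in the tree (check
cl_object.h before relying on it), and the two helper names are hypothetical,
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * ASSUMPTION: this ordering is meant to mirror enum cl_io_state in the
 * Lustre client headers. Verify against the real tree; the range check
 * below only makes sense if the states are ordered like this.
 */
enum cl_io_state {
	CIS_ZERO,
	CIS_INIT,
	CIS_IT_STARTED,
	CIS_LOCKED,
	CIS_IO_GOING,
	CIS_IO_FINISHED,
	CIS_UNLOCKED,
	CIS_IT_ENDED,
	CIS_FINI
};

/* Hypothetical helper: the first suggested form, an exact match on two states. */
static bool iter_fini_ok_exact(enum cl_io_state s)
{
	return s == CIS_INIT || s == CIS_UNLOCKED;
}

/*
 * Hypothetical helper: the second suggested form. Anything at or before
 * CIS_INIT, or at or after CIS_UNLOCKED, passes; only the states in the
 * middle of an iteration are rejected.
 */
static bool iter_fini_ok_range(enum cl_io_state s)
{
	return s <= CIS_INIT || s >= CIS_UNLOCKED;
}
```

Under that assumed ordering, the range form also admits CIS_ZERO,
CIS_IT_ENDED and CIS_FINI, while the exact form rejects them; both reject
the in-flight states from CIS_IT_STARTED through CIS_IO_FINISHED.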

