[Lustre-devel] Query to understand the Lustre request/reply message

Nicolas Williams Nicolas.Williams at oracle.com
Wed Oct 13 00:43:46 PDT 2010


On Wed, Oct 13, 2010 at 10:27:55AM +0300, Alexey Lyashkov wrote:
> eh.. Nicolas,
> 
> The format differs between messages that need to be reconstructed
> after a resend and messages that do not.
>
> As a quick example, take the OPEN request (via the MDS_REINT
> command): that type of message needs an extra buffer to store the
> LOV EA, which is sent to the MDS in the replay case (with an
> additional flag in the header).  (The client keeps a copy of that
> data from the MDS reply after ptlrpc finishes processing the
> request.)  That is why I mentioned the "reconstruct/replay case".

Sure, but this buffer needs to be declared a priori.  If you won't know
whether you'll need a buffer until later, that's OK: you declare it
anyways and you set its size to zero if you don't need it.
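
A minimal sketch of the "declare it up front, zero it if unused" idea.
Every name here (sketch_capsule, sketch_set_size, the field indices) is
hypothetical and chosen for illustration; this is not the Lustre
req_capsule API, only a model of a format whose buffer list is fixed in
advance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field indices for an open-style request format.
 * The set of buffers is fixed when the format is declared. */
enum sketch_field {
    SKETCH_BODY = 0,
    SKETCH_NAME,
    SKETCH_EADATA,      /* optional: only carries data on replay */
    SKETCH_NFIELDS
};

/* Hypothetical capsule: one length slot per declared buffer. */
struct sketch_capsule {
    uint32_t buflens[SKETCH_NFIELDS];
};

static void sketch_set_size(struct sketch_capsule *c,
                            enum sketch_field f, uint32_t size)
{
    assert(c != NULL && f < SKETCH_NFIELDS);
    c->buflens[f] = size;
}

/* The optional buffer always exists in the format; when there is no
 * EA to send, its size is simply set to zero. */
static void sketch_prepare_open(struct sketch_capsule *c,
                                uint32_t body_len, uint32_t name_len,
                                const void *ea, uint32_t ea_len)
{
    sketch_set_size(c, SKETCH_BODY, body_len);
    sketch_set_size(c, SKETCH_NAME, name_len);
    sketch_set_size(c, SKETCH_EADATA, ea != NULL ? ea_len : 0);
}
```

Calling sketch_prepare_open() with a NULL ea leaves the SKETCH_EADATA
slot declared but empty, so the buffer count stays the same in both the
normal and the replay case.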

You can't change a capsule's format to add buffers; you can only set the
size of unnecessary buffers to zero.  This is because the header of a
ptlrpc message (not the ptlrpc_body, mind you) has a count of buffers
followed by a variable-length (64-bit aligned) array of that many 32-bit
buffer lengths (I'm going from memory here).  Adding buffers can put a
reply over the expected max size on the client side, leading to it being
dropped.
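
A rough sketch of why an extra buffer matters for the size check.  The
struct below is a simplified, hypothetical header (a buffer count, one
assumed padding word, then the 32-bit lengths); it is not the real
Lustre wire layout.  Rounding each buffer up to 8 bytes is also an
assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header: NOT the real Lustre layout. */
struct sketch_hdr {
    uint32_t bufcount;   /* number of buffers in the message */
    uint32_t pad;        /* assumed padding word */
    uint32_t buflens[];  /* bufcount 32-bit buffer lengths */
};

/* Round n up to the next multiple of 8 bytes (64-bit alignment). */
static size_t sketch_round_up8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* Header size: fixed part plus the length array, 64-bit aligned. */
static size_t sketch_hdr_size(uint32_t count)
{
    return sketch_round_up8(offsetof(struct sketch_hdr, buflens) +
                            (size_t)count * sizeof(uint32_t));
}

/* Total message size: header plus each buffer, padded to 8 bytes. */
static size_t sketch_msg_size(uint32_t count, const uint32_t *lens)
{
    size_t total = sketch_hdr_size(count);

    assert(lens != NULL || count == 0);
    for (uint32_t i = 0; i < count; i++)
        total += sketch_round_up8(lens[i]);
    return total;
}

/* Model of the client-side check: a reply larger than the size the
 * client expected does not fit and would be dropped. */
static int sketch_reply_fits(uint32_t count, const uint32_t *lens,
                             size_t max_reply)
{
    return sketch_msg_size(count, lens) <= max_reply;
}
```

Under these assumptions, a two-buffer reply with lengths {100, 0}
occupies 120 bytes, while a three-buffer reply with lengths {100, 0, 40}
occupies 168 bytes, so a client that sized its reply buffer for the
two-buffer format would not have room for the third buffer.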

You can change a capsule's format to change the definition of a field
from one without a swabber to one with a swabber.

You'll see in many cases that the presence of a field (meaning, whether
it's checked for or whether it has a non-zero size) is dependent on a
flag in the mdt or ost body, as you mention.  Replays are not the only
interesting case here.  Capabilities are another.

Some of these flags could be removed and replaced instead with checks of
buffer size (0 -> flag not set, >0 -> flag set).
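
A small sketch of the two ways to test for an optional field.  The flag
bit, struct, and function names are hypothetical; the point is only
that the length slot already carries the same information as the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_EADATA 0x1u   /* hypothetical "EA present" bit */

/* Hypothetical view of a received message. */
struct sketch_msg {
    uint32_t        flags;     /* flags taken from the body */
    uint32_t        bufcount;  /* number of declared buffers */
    const uint32_t *buflens;   /* length of each buffer */
};

/* Flag-based check: trust a bit set by the sender. */
static bool sketch_has_ea_by_flag(const struct sketch_msg *m)
{
    assert(m != NULL);
    return (m->flags & SKETCH_FLAG_EADATA) != 0;
}

/* Size-based check: the field is present iff its buffer is declared
 * and has a non-zero length (0 -> not set, >0 -> set). */
static bool sketch_has_ea_by_size(const struct sketch_msg *m,
                                  uint32_t idx)
{
    assert(m != NULL);
    return idx < m->bufcount && m->buflens[idx] > 0;
}
```

With the size-based check, the sender has one less thing to keep in
sync: it cannot set the flag and forget the data, or the reverse.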

> Also, the format is different depending on whether you want to use
> MDS_REINT plus sub-commands or something similar to MDS_SET_INFO.
> For MDS_SET_INFO you use a single format for all messages (just a
> simple key/value buffer), but for MDS_REINT you need two formats:
> one for the generic MDS_REINT code (get the opcode from the command,
> get locks, and possibly more) and a separate format for each opcode,
> such as open, unlink, setxattr, and setattr.  All of them have a
> different number of buffers (fields).

The SET_INFO RPCs are kinda gross.  I should know, since I finished the
conversion of ost_handler.c to the new API.  You can see that I used
req_capsule_extend() to handle some SET_INFO cases.  No, I didn't cover
this detail, nor others, because I figured Vilobh needed a starting
point, and that's all I was going to provide tonight.

Nico