[Lustre-devel] Query to understand the Lustre request/reply message

Vilobh Meshram vilobh.meshram at gmail.com
Tue Oct 12 23:07:06 PDT 2010


Amazing... Thanks, Nicolas and Alexey, for your time and detailed reply.

I will try out the new API to create a new RPC, following the steps you
described for Lustre 2.0 (since I am using 1.8.1.1 right now).

Thanks again.

Thanks,
Vilobh
Graduate Research Associate
Department of Computer Science
The Ohio State University, Columbus, Ohio

On Wed, Oct 13, 2010 at 1:42 AM, Nicolas Williams <Nicolas.Williams at oracle.com> wrote:

> On Wed, Oct 13, 2010 at 12:35:13AM -0400, Vilobh Meshram wrote:
> > Thanks a lot, Alexey, for the reply.  The information will be really
> > useful.
> >
> > Since I am using 1.8.1.1 for my research project, I will have to rely
> > on the old API.  Since the source tree prior to 2.0 does not have the
> > mdt/mdt_handler.c and layout.c files, I will have to work with the
> > low-level buffer-management structures (ptlrpc_request, lustre_msg_v2,
> > etc.).  Do you know of a place or a function that makes use of the old
> > API, which I could use as a reference when writing the RPC for my task?
>
> The new API is _much_ easier to use than the old API.
>
> To add an RPC you must:
>
>  - decide what it looks like
>
>   Every PTLRPC has an opcode and one or more "buffers", with each
>   buffer containing a C struct, a string, whatever.  If a buffer
>   contains a C struct, then it has to be fixed-size.  The first buffer
>   is struct ptlrpc_body.
>
>   A single RPC opcode can denote multiple different layouts, depending
>   on the contents of various buffers.  Each such layout is called a
>   "format" (see the RQF definitions below).
>
>  - add any struct, enum, and other C types you need to lustre_idl.h
>
>   You must make sure to use the base types we use in lustre_idl.h, such
>   as __u64 (a sketch of such a type and its swabber follows this list).
>
>  - create swabber functions for your data, if necessary
>
>  - add handlers for the new RPC to mdt_handler.c (for the MDS) or
>   ost_handler.c (for the OST), and so on
>
>   The handlers are responsible for knowing which buffers contain what,
>   and for swabbing them.  You have to make sure that you don't swab a
>   buffer more than once.
>
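> For illustration, here is a rough sketch of a hypothetical on-the-wire
> type and its swabber (foo_body, fb_handle, fb_flags, and
> lustre_swab_foo_body are all made-up names, not anything in the tree):
>
> /* lustre_idl.h -- wire structs use only the base types (__u32, __u64) */
> struct foo_body {
>         __u64 fb_handle;    /* fixed-size members only; no pointers */
>         __u32 fb_flags;
>         __u32 fb_padding;   /* keep the struct 64-bit aligned */
> };
>
> void lustre_swab_foo_body(struct foo_body *b);  /* declare in lustre_idl.h */
>
> /* lustre/ptlrpc/pack_generic.c -- byte-swap each member in place */
> void lustre_swab_foo_body(struct foo_body *b)
> {
>         __swab64s(&b->fb_handle);
>         __swab32s(&b->fb_flags);
>         /* fb_padding carries no data, so it is not swabbed */
> }
>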
> The new API allows you to define formats quite nicely, and it takes care of
> calling swabbers and ensuring that no buffer is swabbed more than once.
> The formats are defined in lustre/ptlrpc/layout.c and look like this:
>
> struct req_format RQF_MDS_SYNC =
>        DEFINE_REQ_FMT0("MDS_SYNC", mdt_body_capa, mdt_body_only);
> ...
> static const struct req_msg_field *mdt_body_capa[] = {
>        &RMF_PTLRPC_BODY,
>        &RMF_MDT_BODY,
>        &RMF_CAPA1
> };
> static const struct req_msg_field *mdt_body_only[] = {
>        &RMF_PTLRPC_BODY,
>        &RMF_MDT_BODY
> };
> ...
>
> An RPC consists of a request and reply, with their formats given in the
> DEFINE_REQ_FMT0() macro (there are other macros).  Each message format
> defines a layout of buffers or, as we call them now, "fields", and each
> field has a format definition as well, such as:
>
> struct req_msg_field RMF_PTLRPC_BODY =
>        DEFINE_MSGF("ptlrpc_body", 0,
>                    sizeof(struct ptlrpc_body), lustre_swab_ptlrpc_body,
> NULL);
>
> for a struct buffer.  Other types of RMFs are possible (e.g., strings);
> see layout.c.
>
> So an MDS_SYNC RPC consists of a three-field (buffer) request and
> two-field reply.  The request's fields are: PTLRPC_BODY, MDT_BODY, and
> CAPA1.  The reply's fields are: PTLRPC_BODY and MDT_BODY.  PTLRPC_BODY
> is a fixed-size field containing a C structure, and the swabber
> for this field is lustre_swab_ptlrpc_body().  And so on.
>
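> Continuing the hypothetical foo example (FOO_QUERY, RMF_FOO_BODY, and
> friends are invented names), a format for a new RPC could be added to
> layout.c along these lines:
>
> static const struct req_msg_field *foo_request[] = {
>         &RMF_PTLRPC_BODY,
>         &RMF_FOO_BODY
> };
>
> struct req_msg_field RMF_FOO_BODY =
>         DEFINE_MSGF("foo_body", 0,
>                     sizeof(struct foo_body), lustre_swab_foo_body, NULL);
>
> struct req_format RQF_FOO_QUERY =
>         DEFINE_REQ_FMT0("FOO_QUERY", foo_request, foo_request);
>
> That is, a two-field request (PTLRPC_BODY plus the new FOO_BODY) with an
> identically shaped reply.  Both RQF_FOO_QUERY and RMF_FOO_BODY would
> also need declarations in lustre/include/lustre_req_layout.h.
>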
> If you look at Lustre 2.0's mdt_handler.c and ost_handler.c you'll find
> that one of the first things done is to initialize a "capsule", and that
> the expected message format of a request is decided based on its opcode.
> That is, the mapping of opcode to RQF is not given by some array, but
> decided as we go.  Indeed, the RQF of a capsule can be changed
> mid-stream, with some constraints.
>
> So, with the new API you:
>
>  - add C types to lustre_idl.h for on-the-wire data
>  - add any swabbers to lustre/ptlrpc/pack_generic.c (declare them in
>   lustre_idl.h)
>  - add RQFs and, possibly, RMFs to layout.c
>  - declare the RQFs/RMFs in lustre/include/lustre_req_layout.h
>
>  - on the server-side:
>    - Modify the relevant handler to add an arm to the existing switch
>      on the request's opcode, call req_capsule_set() to set the
>      capsule's format, then call a function that will use
>      req_capsule_*get*() to get at the fields (both request and reply
>      buffers) to read from (request) or write to (reply).
>
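>      For example, a rough server-side sketch for the hypothetical
>      FOO_QUERY (mdt_foo_query is a made-up name, and the real handler
>      plumbing in mdt_handler.c is more involved than this):
>
>      static int mdt_foo_query(struct ptlrpc_request *req)
>      {
>              struct foo_body *reqb, *repb;
>              int rc;
>
>              /* Tell the capsule which format this request uses */
>              req_capsule_set(&req->rq_pill, &RQF_FOO_QUERY);
>
>              /* Allocate and pack the reply buffers per the format */
>              rc = req_capsule_server_pack(&req->rq_pill);
>              if (rc != 0)
>                      return rc;
>
>              /* The layout code swabs each field exactly once */
>              reqb = req_capsule_client_get(&req->rq_pill, &RMF_FOO_BODY);
>              repb = req_capsule_server_get(&req->rq_pill, &RMF_FOO_BODY);
>              if (reqb == NULL || repb == NULL)
>                      return -EPROTO;
>
>              repb->fb_handle = reqb->fb_handle;  /* real work goes here */
>              repb->fb_flags = 0;
>              return 0;
>      }
>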
>  - on the client-side:
>    - You'll do something very similar, except that there's no handler
>      function -- the pattern is less consistent, so you'll have to read
>      mdc*.c and so on to get a flavor for this...  Typically you'll
>      allocate a request using ptlrpc_request_alloc_pack(), fill in its
>      fields (again, using req_capsule_client_get() and friends), then
>      you'll send it using, for example, ptlrpc_queue_wait().
>
>      Take a good look at mdc_request.c in 2.0 to get a better idea of
>      how to build client stubs for your new RPCs.
>
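>      A matching client stub might look like this (again just a sketch:
>      FOO_QUERY stands in for a real opcode, and error handling is
>      trimmed):
>
>      static int foo_query(struct obd_import *imp, __u64 handle,
>                           __u32 flags)
>      {
>              struct ptlrpc_request *req;
>              struct foo_body *reqb;
>              int rc;
>
>              /* Allocate the request, packed per RQF_FOO_QUERY */
>              req = ptlrpc_request_alloc_pack(imp, &RQF_FOO_QUERY,
>                                              LUSTRE_MDS_VERSION,
>                                              FOO_QUERY);
>              if (req == NULL)
>                      return -ENOMEM;
>
>              /* Fill in the request-side field */
>              reqb = req_capsule_client_get(&req->rq_pill, &RMF_FOO_BODY);
>              reqb->fb_handle = handle;
>              reqb->fb_flags = flags;
>
>              ptlrpc_request_set_replen(req); /* size the expected reply */
>              rc = ptlrpc_queue_wait(req);    /* send and await the reply */
>              if (rc == 0) {
>                      struct foo_body *repb =
>                              req_capsule_server_get(&req->rq_pill,
>                                                     &RMF_FOO_BODY);
>                      if (repb == NULL)
>                              rc = -EPROTO;
>                      /* consume repb here */
>              }
>
>              ptlrpc_req_finished(req);
>              return rc;
>      }
>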
> I haven't described the wirecheck part -- I can do that later, once
> you've made enough progress.  (We have a wirecheck/wiretest program pair
> to check that only backwards interoperable changes are made to
> lustre_idl.h.)
>
> I hope that helps.  Yes, it'd be nice to have something closer to an
> actual IDL.  The RQF/RMF/wirecheck/wiretest stuff could be extended to:
>
>  - auto-generate swabbers from lustre_idl.h structs
>  - provide a default opcode->RQF mapping
>  - provide more static type safety (by having req_capsule_*get() be
>   macros that cast the buffer address to the right type)
>  - auto-generate simple request constructors (that take pointers to
>   values of an RQF's correct request field C types)
>
> Compared to the old thing, the new API is much closer to an IDL.  It's a
> good thing.  I strongly recommend that you use it.
>
> Nico
> --
>