[Lustre-devel] Query to understand the Lustre request/reply message
Vilobh Meshram
vilobh.meshram at gmail.com
Fri Oct 15 09:25:02 PDT 2010
Hi Alexey,
I have attached the diff file. Please have a look at it and let me
know your comments/suggestions.
Thanks again.
Thanks,
Vilobh
*Graduate Research Associate
Department of Computer Science
The Ohio State University Columbus Ohio*
On Fri, Oct 15, 2010 at 3:39 AM, Alexey Lyashkov <
alexey.lyashkov at clusterstor.com> wrote:
> Can you please attach the diff file?
>
> On Oct 15, 2010, at 03:58, Vilobh Meshram wrote:
>
> Hi Alexey/Nicolas,
>
> I modified the code the way Nicolas suggested yesterday, in order to get
> some information filled into a fixed-size buffer sent from the client side.
> Here I am sending a buffer called "str" (whose size is 16), which will be
> updated at the MDS side with the string "hello" (copied as 7 bytes, much
> less than the size of the buffer "str", i.e. 16). But I am not able
> to perform the operation successfully, and I am getting the error
> "LustreError: 4209:0:(file.c:3143:ll_inode_revalidate_fini()) failure -14
> inode 31257"
>
> which seems to be related to DLM_REPLY_REC_OFF, since I have modified this
> offset in my code. Can you please review my code and tell me if I am
> making any mistake? I will be done with my task if I can resolve this
> problem.
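
A note on the sizes and the error code in the message above, as a minimal userspace sketch rather than Lustre code. On Linux, errno 14 is EFAULT. The literal "hello" occupies 6 bytes including its terminating NUL, so a 7-byte copy reads one byte past the literal. The helper name demo_fill_fixed is hypothetical; it only illustrates checking the destination pointer and bounding the copy before writing into a fixed-size buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Lustre: copy a NUL-terminated string
 * into a fixed-size buffer, refusing NULL pointers and oversized input. */
int demo_fill_fixed(char *dst, size_t dst_len, const char *src)
{
        size_t need;

        if (dst == NULL || src == NULL)
                return -EFAULT;         /* errno 14 on Linux */

        need = strlen(src) + 1;         /* include the terminating NUL */
        if (need > dst_len)
                return -E2BIG;

        memset(dst, 0, dst_len);        /* zero-pad the unused tail */
        memcpy(dst, src, need);
        return (int)need;               /* number of bytes copied */
}
```

With a 16-byte destination, demo_fill_fixed(buf, 16, "hello") copies 6 bytes and leaves the remaining 10 zeroed.
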
>
> The modifications follow. Lines marked "changed" in a comment are my
> changes on the client and MDS side for Lustre 1.8.1.1:
>
> At client side: lustre/ldlm/ldlm_request.c
>
> int ldlm_cli_enqueue(.........)
>         /* changed: size[3] became size[] with a fourth entry */
>         __u32 size[] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
>                          [DLM_LOCKREQ_OFF] = sizeof(*body),
>                          [DLM_REPLY_REC_OFF] = lvb_len ? lvb_len :
>                                                sizeof(struct ost_lvb),
>                          16 };
>
>         if (reqp == NULL || *reqp == NULL) {
>                 /* changed: bufcount 2 -> 4 */
>                 req = ldlm_prep_enqueue_req(exp, 4, size, NULL, 0);
>                           |
>                           v
>
> struct ptlrpc_request *ldlm_prep_elc_req(.......)
>         void *str = NULL;                           /* changed: added */
>         char *bufs[4] = { NULL, NULL, NULL, str };  /* changed: added */
>         /* changed: last argument was NULL */
>         req = ptlrpc_prep_req(class_exp2cliimp(exp), version,
>                               opc, bufcount, size, bufs);
>
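
The client-side size[] array above mixes designated initializers with one positional value. In C99, a positional initializer that follows a designated one fills the next index, so the trailing 16 lands at index 3 and the array has four elements. A minimal sketch of that language rule follows; the enum names and byte counts are illustrative stand-ins, not values taken from the Lustre headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the buffer offsets; not the Lustre macros. */
enum { DEMO_BODY_OFF = 0, DEMO_LOCKREQ_OFF = 1, DEMO_REPLY_REC_OFF = 2 };

/* Three designated entries followed by one positional entry. */
static const uint32_t demo_size[] = {
        [DEMO_BODY_OFF]      = 88,      /* made-up sizes for the demo */
        [DEMO_LOCKREQ_OFF]   = 104,
        [DEMO_REPLY_REC_OFF] = 40,
        16                              /* continues at index 3 */
};

/* Number of elements the compiler deduced for demo_size[]. */
size_t demo_size_count(void)
{
        return sizeof(demo_size) / sizeof(demo_size[0]);
}

/* Read one entry; the caller must pass i < demo_size_count(). */
uint32_t demo_size_at(size_t i)
{
        return demo_size[i];
}
```

The buffer count passed alongside such an array has to match the deduced length, which is why the count changes together with the initializer.
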
>
> At MDS side: lustre/ldlm/ldlm_lockd.c
>
> int ldlm_handle_enqueue(.........)
> {
>         void *str;                                  /* changed: added */
>         /* changed: size[3] -> size[4] */
>         __u32 size[4] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
>                           [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep) };
>         char *org = "hello";                        /* changed: added */
>
> existing_lock:
>
>         if (flags & LDLM_FL_HAS_INTENT) {
>                 /* In this case, the reply buffer is allocated deep in
>                  * local_lock_enqueue by the policy function. */
>                 cookie = req;
>         } else {
>                 int buffers = 4;                    /* changed: was 2 */
>
>                 lock_res_and_lock(lock);
>                 if (lock->l_resource->lr_lvb_len) {
>                         size[DLM_REPLY_REC_OFF] =
>                                 lock->l_resource->lr_lvb_len;
>                         buffers = 4;                /* changed: was 3 */
>                 }
>                 unlock_res_and_lock(lock);
>
>                 if (OBD_FAIL_CHECK_ONCE(OBD_FAIL_LDLM_ENQUEUE_EXTENT_ERR))
>                         GOTO(out, rc = -ENOMEM);
>                 /* changed: the next three lines are added */
>                 str = lustre_msg_buf(req->rq_reqmsg, DLM_REPLY_REC_OFF+1, 1);
>                 memcpy(str, org, 7);
>                 size[DLM_REPLY_REC_OFF + 1] = 16;
>
> Thanks,
> Vilobh
> *Graduate Research Associate
> Department of Computer Science
> The Ohio State University Columbus Ohio*
>
>
> On Thu, Oct 14, 2010 at 12:25 PM, Vilobh Meshram <vilobh.meshram at gmail.com
> > wrote:
>
>> Hi Alexey,
>>
>> That surely helps. Thanks for all the help so far.
>>
>> Thanks,
>> Vilobh
>> *Graduate Research Associate
>> Department of Computer Science
>> The Ohio State University Columbus Ohio*
>>
>>
>> On Thu, Oct 14, 2010 at 11:45 AM, Alexey Lyashkov <
>> alexey.lyashkov at clusterstor.com> wrote:
>>
>>> Hi Vilobh,
>>>
>>> interop == interoperability between nodes running different versions of
>>> the software.
>>>
>>> In general we have two ways to solve that. For requests with an mdc_body,
>>> you can set a flag in the body and check that flag on the server/client side.
>>> If you want to add a new operation, the better way is to add a new flag to
>>> connect_data (look at how the OBD_CONNECT_* macros are handled);
>>> that flag can be checked via export->connect_flags on the client or server
>>> side to detect remote-side features.
>>> For example, 1.x and 2.0 have different formats for setattr requests:
>>> int mdc_setattr
>>> ...
>>>         if (mdc_exp_is_2_0_server(exp)) {
>>>                 size[REQ_REC_OFF] = sizeof(struct mdt_rec_setattr);
>>>                 size[REQ_REC_OFF + 1] = 0; /* capa */
>>>                 size[REQ_REC_OFF + 2] = 0; //sizeof (struct mdt_epoch);
>>>                 size[REQ_REC_OFF + 3] = ealen;
>>>                 size[REQ_REC_OFF + 4] = ea2len;
>>>                 size[REQ_REC_OFF + 5] = sizeof(struct ldlm_request);
>>>                 offset = REQ_REC_OFF + 5;
>>>                 bufcount = 6;
>>>                 replybufcount = 6;
>>>         } else {
>>>                 bufcount = 4;
>>>         }
>>>
>>> An example of checking a client feature is the version-based recovery
>>> support check:
>>> mds_version_get_check
>>> ...
>>>         if (inode == NULL || !exp_connect_vbr(req->rq_export))
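
The connect-flag approach described above can be sketched in isolation. This is a userspace illustration only: the flag bit DEMO_CONNECT_EXTRA_BUF, the struct demo_export type and both functions are hypothetical names, not Lustre definitions. The idea is that the sender packs the extra buffer only when the peer advertised support for it at connect time.

```c
#include <stdint.h>

/* Hypothetical feature bit and export type, for illustration only. */
#define DEMO_CONNECT_EXTRA_BUF (1ULL << 40)

struct demo_export {
        uint64_t connect_flags; /* features negotiated at connect time */
};

/* True if the remote side advertised the (hypothetical) feature. */
static int demo_peer_has_extra_buf(const struct demo_export *exp)
{
        return (exp->connect_flags & DEMO_CONNECT_EXTRA_BUF) != 0;
}

/* Number of message buffers to pack for this peer: the extra buffer is
 * added only when the peer is known to expect it. */
int demo_bufcount(const struct demo_export *exp)
{
        return demo_peer_has_extra_buf(exp) ? 4 : 3;
}
```

An older peer that never sets the bit keeps receiving the three-buffer layout it already understands.
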
>>>
>>>
>>>
>>> I hope that helps you.
>>>
>>>
>>> On Oct 14, 2010, at 18:29, Vilobh Meshram wrote:
>>>
>>> Hi Alexey,
>>>
>>> Thanks again for the reply.
>>>
>>> Can you briefly give me some pointers about this interop issue, and in
>>> which kind of RPC it arises? How should we resolve it, and what
>>> kind of flag needs to be set?
>>>
>>> I went through the bugzilla entry you mentioned; it seems that RPCs
>>> dealing with LDLM may cause this issue. Please correct me if I am wrong.
>>>
>>> Thanks,
>>> Vilobh
>>> *Graduate Research Associate
>>> Department of Computer Science
>>> The Ohio State University Columbus Ohio*
>>>
>>>
>>> On Thu, Oct 14, 2010 at 11:10 AM, Alexey Lyashkov <
>>> alexey.lyashkov at clusterstor.com> wrote:
>>>
>>>> Hi Vilobh,
>>>>
>>>> As I see it, you touched code related to locking. struct ldlm_request is
>>>> used in the lock enqueue process; that is why I mentioned the interop
>>>> issue in the ELC code, which was solved with an export flag.
>>>> For common mdc requests you can resolve the interop issue with flags in
>>>> mdc_body (mdt_body), but that is not possible for ldlm requests.
>>>>
>>>>
>>>> On Oct 14, 2010, at 18:04, Vilobh Meshram wrote:
>>>>
>>>> Hi Alexey,
>>>>
>>>> Thanks again for your reply.
>>>>
>>>> I am trying to embed a buffer in the RPC which will get filled in with
>>>> some values that the MDS knows but the client calling the RPC does
>>>> not. It has nothing to do with locking. I just want to fill in the
>>>> buffer which I embed in the RPC with some suitable data at the MDS end
>>>> and then operate on that data at the client side. So I think the
>>>> approach suggested by you and Nicolas, of just including sizeof(str)
>>>> [the size of the expected information from the MDS] in the size[] array,
>>>> should be fine, as done below:
>>>>
>>>> __u32 size[2] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
>>>>                   [DLM_LOCKREQ_OFF] = sizeof(struct ldlm_request) };
>>>>
>>>> ---->>
>>>> __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
>>>>                   [DLM_LOCKREQ_OFF] = sizeof(struct ldlm_request),
>>>>                   // How do I add char *str = "Hello" here? Of course we
>>>>                   // will have sizeof(str), but how do I choose the macro
>>>>                   // (like DLM_LOCKREQ_OFF), because for a specific kind
>>>>                   // of RPC there is a limited number of such macros.
>>>>
>>>> Please correct me if I am wrong, or please guide me if I need to
>>>> consider a few corner cases to handle this use case.
>>>>
>>>> Thanks again.
>>>>
>>>> Thanks,
>>>> Vilobh
>>>> *Graduate Research Associate
>>>> Department of Computer Science
>>>> The Ohio State University Columbus Ohio*
>>>>
>>>>
>>>> On Thu, Oct 14, 2010 at 10:40 AM, Alexey Lyashkov <
>>>> alexey.lyashkov at clusterstor.com> wrote:
>>>>
>>>>> Andreas,
>>>>>
>>>>> On Oct 14, 2010, at 17:31, Andreas Dilger wrote:
>>>>>
>>>>> > On 2010-10-13, at 23:18, Nicolas Williams wrote:
>>>>> >> On Thu, Oct 14, 2010 at 06:38:16AM +0300, Alexey Lyashkov wrote:
>>>>> >>> On Oct 14, 2010, at 03:28, Nicolas Williams wrote:
>>>>> >>>> Yes, it's possible to add buffers to requests. It's not possible
>>>>> >>>> to add buffers to _replies_ to existing RPCs unless you know the
>>>>> >>>> client expects those additional buffers -- existing clients expect
>>>>> >>>> a given maxsize for each reply, and if your reply is bigger then
>>>>> >>>> it will get dropped.
>>>>> >>> That has been wrong for the last ~1 year.
>>>>> >>> About a year ago I added code to the ptlrpc layer which adjusts the
>>>>> >>> reply buffer and resends the request.
>>>>> >>
>>>>> >> Ah, I didn't know that was in 1.8. Are there interop issues (with
>>>>> >> older clients) though with sending larger replies than expected?
>>>>> >
>>>>> > Nico, it has always been possible in the past to increase the size of
>>>>> > any buffer in a request, or in a reply (if the total reply size will
>>>>> > fit into the pre-allocated reply buffer). An older peer would just
>>>>> > ignore the bytes beyond the known part of the buffer.
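
The point about an older peer ignoring bytes beyond the part of the buffer it knows can be illustrated with a small userspace sketch. The record layout struct demo_rec_v1 and the function demo_parse_v1 are hypothetical; they only show a reader that copies the prefix it understands and rejects buffers that are too short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout known to an older peer. */
struct demo_rec_v1 {
        uint32_t id;
        uint32_t mode;
};

/* Parse a buffer that may be longer than the layout this peer knows.
 * Returns 0 on success, -1 if the buffer is missing or too short. */
int demo_parse_v1(const void *buf, size_t len, struct demo_rec_v1 *out)
{
        if (buf == NULL || out == NULL || len < sizeof(*out))
                return -1;

        /* Only the known prefix is read; trailing bytes are ignored. */
        memcpy(out, buf, sizeof(*out));
        return 0;
}
```

A newer sender can therefore append fields to the end of such a record without breaking this reader, provided the whole message still fits in the buffer the reader posted.
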
>>>>> >
>>>>> I think the question is not about rebalancing buffer sizes within a
>>>>> message; I think it is about sending a large reply into a smaller reply
>>>>> buffer. LNet is not able to put a large reply into a small buffer
>>>>> (without the truncate flag, which does not exist in older ptlrpc
>>>>> versions). Without that flag you will see messages like
>>>>>
>>>>> CERROR("Matching packet from %s, match "LPU64
>>>>>        " length %d too big: %d left, %d allowed\n",
>>>>>        libcfs_id2str(src), match_bits, rlength,
>>>>>        md->md_length - offset, mlength);
>>>>>
>>>>> and LNet will drop the message without notifying PtlRPC.
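
The behaviour described above (an oversized reply is dropped unless truncation is allowed) can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the LNet matching code: the function name and parameters are hypothetical, with rlength standing for the incoming length and left for the space remaining in the posted buffer.

```c
#include <stdbool.h>

/* Simplified model of the length check; names are hypothetical.
 * rlength:     length of the incoming message
 * left:        space remaining in the posted buffer
 * truncate_ok: whether the receiver allows truncated delivery
 * Returns the number of bytes accepted, or -1 if the message is dropped. */
int demo_match_length(int rlength, int left, bool truncate_ok)
{
        if (rlength <= left)
                return rlength;         /* fits: deliver the whole message */
        if (truncate_ok)
                return left;            /* deliver a truncated message */
        return -1;                      /* too big, no truncation: drop */
}
```

In this model a 200-byte reply sent to a 128-byte buffer is dropped without the flag and delivered as 128 bytes with it, which is what gives the receiver a chance to enlarge its buffer and resend.
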
>>>>>
>>>>>
>>>>> > Is that not true with the 2.x RPC handling?
>>>>> >
>>>>> 2.x is able to rebalance space between buffers (though apparently by
>>>>> hand), and is able to adjust the reply buffer after a truncated reply.
>>>>>
>>>>>
>>>>>
>>>>> --------------------------------------
>>>>> Alexey Lyashkov
>>>>> alexey.lyashkov at clusterstor.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
-------------- next part --------------
Index: lustre.spec
===================================================================
--- lustre.spec (revision 8279)
+++ lustre.spec (working copy)
@@ -1,7 +1,7 @@
# lustre.spec
%{!?version: %define version 1.8.1.1}
-%{!?kversion: %define kversion }
-%{!?release: %define release }
+%{!?kversion: %define kversion 2.6.18-128.7.1.el5-lustre.1.8.1.1smp-cust}
+%{!?release: %define release 2.6.18_128.7.1.el5_lustre.1.8.1.1smp_cust_201010141227}
%{!?lustre_name: %define lustre_name lustre}
%define is_client %(bash -c "if [[ %{lustre_name} = *-client ]]; then echo -n '1'; else echo -n '0'; fi")
@@ -104,7 +104,7 @@
# Set an explicit path to our Linux tree, if we can.
cd $RPM_BUILD_DIR/lustre-%{version}
-./configure '--disable-modules' '--disable-utils' '--disable-liblustre' '--disable-tests' '--disable-doc' --with-lustre-hack --with-sockets %{?configure_flags:configure_flags} \
+./configure '--with-o2ib=/usr/local/ofed/src/ofa_kernel' '--with-linux=/lib/modules/2.6.18-128.7.1.el5-lustre.1.8.1.1smp-cust/build' --with-lustre-hack --with-sockets %{?configure_flags:configure_flags} \
--sysconfdir=%{_sysconfdir} \
--mandir=%{_mandir} \
--libdir=%{_libdir}
Index: lustre/mds/handler.c
===================================================================
--- lustre/mds/handler.c (revision 8279)
+++ lustre/mds/handler.c (working copy)
@@ -1687,7 +1687,7 @@
mds->mds_max_mdsize,
mds->mds_max_cookiesize };
int bufcount;
-
+ printk("Inside function %s a hit for case MDS_REINT",__func__);
/* NB only peek inside req now; mds_reint() will swab it */
if (opcp == NULL) {
CERROR ("Can't inspect opcode\n");
@@ -1704,15 +1704,18 @@
switch (opc) {
case REINT_CREATE:
+ printk("Inside function %s a hit for case REINT_CREATE",__func__);
op = PTLRPC_LAST_CNTR + MDS_REINT_CREATE;
break;
case REINT_LINK:
op = PTLRPC_LAST_CNTR + MDS_REINT_LINK;
break;
case REINT_OPEN:
+ printk("Inside function %s a hit for case REINT_OPEN",__func__);
op = PTLRPC_LAST_CNTR + MDS_REINT_OPEN;
break;
case REINT_SETATTR:
+ printk("Inside function %s a hit for case REINT_SETATTR",__func__);
op = PTLRPC_LAST_CNTR + MDS_REINT_SETATTR;
break;
case REINT_RENAME:
@@ -1745,8 +1748,9 @@
if (opc == REINT_UNLINK || opc == REINT_RENAME)
size[DLM_REPLY_REC_OFF + 1] = 0;
}
-
+ printk("Inside function %s in case MDS_REINT before calling lustre_pack_reply",__func__);
rc = lustre_pack_reply(req, bufcount, size, NULL);
+ printk("Inside function %s in case MDS_REINT after calling lustre_pack_reply",__func__);
if (rc)
break;
@@ -1756,6 +1760,7 @@
}
case MDS_CLOSE:
+ printk("Inside function %s in case MDS_CLOSE",__func__);
DEBUG_REQ(D_INODE, req, "close");
OBD_FAIL_RETURN(OBD_FAIL_MDS_CLOSE_NET, 0);
rc = mds_close(req, REQ_REC_OFF);
@@ -1798,6 +1803,7 @@
break;
#endif
case OBD_PING:
+ printk("Inside function %s got a hit at case OBD_PING",__func__);
DEBUG_REQ(D_INODE, req, "ping");
rc = target_handle_ping(req);
if (req->rq_export->exp_delayed)
@@ -1811,6 +1817,7 @@
break;
case LDLM_ENQUEUE:
+ printk("\n Inside function %s got a hit at case LDLM_ENQUEUE",__func__);
DEBUG_REQ(D_INODE, req, "enqueue");
OBD_FAIL_RETURN(OBD_FAIL_LDLM_ENQUEUE, 0);
rc = ldlm_handle_enqueue(req, ldlm_server_completion_ast,
Index: lustre/ldlm/ldlm_request.c
===================================================================
--- lustre/ldlm/ldlm_request.c (revision 8279)
+++ lustre/ldlm/ldlm_request.c (working copy)
@@ -581,6 +581,8 @@
int flags, avail, to_free, pack = 0;
struct ldlm_request *dlm = NULL;
struct ptlrpc_request *req;
+ void *str=NULL;
+ char *bufs[4] = {NULL,NULL,NULL,str};
CFS_LIST_HEAD(head);
ENTRY;
@@ -609,8 +611,10 @@
size[bufoff] = ldlm_request_bufsize(pack, opc);
}
+ printk("\n Inside function %s before calling ptlrpc_prep_req",__func__);
+ printk("\n OPC for LDLM_ENQUEUE is %d",opc);
req = ptlrpc_prep_req(class_exp2cliimp(exp), version,
- opc, bufcount, size, NULL);
+ opc, bufcount, size, bufs);
req->rq_export = class_export_get(exp);
if (exp_connect_cancelset(exp) && req) {
if (canceloff) {
@@ -658,10 +662,11 @@
struct ldlm_lock *lock;
struct ldlm_request *body;
struct ldlm_reply *reply;
- __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
+ __u32 size[] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
[DLM_LOCKREQ_OFF] = sizeof(*body),
[DLM_REPLY_REC_OFF] = lvb_len ? lvb_len :
- sizeof(struct ost_lvb) };
+ sizeof(struct ost_lvb),
+ 16};
int is_replay = *flags & LDLM_FL_REPLAY;
int req_passed_in = 1, rc, err;
struct ptlrpc_request *req;
@@ -710,7 +715,7 @@
/* lock not sent to server yet */
if (reqp == NULL || *reqp == NULL) {
- req = ldlm_prep_enqueue_req(exp, 2, size, NULL, 0);
+ req = ldlm_prep_enqueue_req(exp, 4, size, NULL, 0);
if (req == NULL) {
failed_lock_cleanup(ns, lock, lockh, einfo->ei_mode);
LDLM_LOCK_PUT(lock);
Index: lustre/ldlm/ldlm_lockd.c
===================================================================
--- lustre/ldlm/ldlm_lockd.c (revision 8279)
+++ lustre/ldlm/ldlm_lockd.c (working copy)
@@ -997,13 +997,17 @@
struct obd_device *obddev = req->rq_export->exp_obd;
struct ldlm_reply *dlm_rep;
struct ldlm_request *dlm_req;
- __u32 size[3] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
- [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep) };
+ void *str;
+ __u32 size[4] = { [MSG_PTLRPC_BODY_OFF] = sizeof(struct ptlrpc_body),
+ [DLM_LOCKREPLY_OFF] = sizeof(*dlm_rep)
+ };
int rc = 0;
__u32 flags;
ldlm_error_t err = ELDLM_OK;
struct ldlm_lock *lock = NULL;
void *cookie = NULL;
+ char *org = "hello";
+
ENTRY;
LDLM_DEBUG_NOLOCK("server-side enqueue handler START");
@@ -1119,19 +1123,24 @@
* local_lock_enqueue by the policy function. */
cookie = req;
} else {
- int buffers = 2;
+ int buffers = 4;
lock_res_and_lock(lock);
if (lock->l_resource->lr_lvb_len) {
size[DLM_REPLY_REC_OFF] = lock->l_resource->lr_lvb_len;
- buffers = 3;
+ buffers = 4;
}
unlock_res_and_lock(lock);
if (OBD_FAIL_CHECK_ONCE(OBD_FAIL_LDLM_ENQUEUE_EXTENT_ERR))
GOTO(out, rc = -ENOMEM);
+ str = lustre_msg_buf(req->rq_reqmsg, DLM_REPLY_REC_OFF+1, 1);
+ memcpy ( str , org , 7);
+ size[DLM_REPLY_REC_OFF + 1] = 16;
+ printk("\n Inside function %s before calling 1.LUSTRE_PACK_REPLY",__func__);
rc = lustre_pack_reply(req, buffers, size, NULL);
+ printk("\n Inside function %s after calling 1.LUSTRE_PACK_REPLY",__func__);
if (rc)
GOTO(out, rc);
}
@@ -1215,7 +1224,9 @@
out:
req->rq_status = rc ?: err; /* return either error - bug 11190 */
if (!req->rq_packed_final) {
+ printk("\n Inside function %s before calling 2.LUSTRE_PACK_REPLY",__func__);
err = lustre_pack_reply(req, 1, NULL, NULL);
+ printk("\n Inside function %s after calling 2.LUSTRE_PACK_REPLY",__func__);
if (rc == 0)
rc = err;
}