[Lustre-discuss] [bug?] mdc_enter_request() problems

chas williams - CONTRACTOR chas at cmf.nrl.navy.mil
Mon Aug 8 09:03:25 PDT 2011


we have seen a few crashes that look like:

[250696.381575] RIP: 0010:[<ffffffffa0a1f9e4>]  [<ffffffffa0a1f9e4>] mdc_exit_request+0x74/0xb0 [mdc]
...
[250696.381575] Call Trace:
[250696.381575]  [<ffffffffa0a25042>] mdc_intent_getattr_async_interpret+0x82/0x500 [mdc]
[250696.381575]  [<ffffffffa089efd0>] ptlrpc_check_set+0x200/0x1690 [ptlrpc]
[250696.381575]  [<ffffffffa08d3140>] ptlrpcd_check+0x110/0x250 [ptlrpc]

as far as i can tell, the problem arises in mdc_enter_request().
it allocates a struct mdc_cache_waiter on the stack, inserts it into the
cl_cache_waiters list, and then returns.

	int mdc_enter_request(struct client_obd *cli)
	...
		struct mdc_cache_waiter mcw;
	...
			list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
			init_waitqueue_head(&mcw.mcw_waitq);

later, mdc_exit_request() finds this mcw by iterating the list.
since mcw was allocated on the stack, i don't think you can do this.
the stack memory that held mcw might have been reused by the time
mdc_exit_request() gets around to removing it from the list.

	void mdc_exit_request(struct client_obd *cli)
	...
			mcw = list_entry(l, struct mdc_cache_waiter, mcw_entry);
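
to make the lifetime concern concrete, here is a small userspace sketch. it is not the lustre code: the list helpers, struct waiter, wake_first() and enter_with_stack_waiter() are all made-up stand-ins for list_head, mdc_cache_waiter and the exit/enter paths. it only illustrates the rule i would expect to hold, namely that an on-stack node has to be off the list before the function that owns it returns.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (hypothetical copy). */
struct list_node {
	struct list_node *next, *prev;
};

/* Recover the enclosing struct from a pointer to its embedded node,
 * in the spirit of list_entry(). */
#define node_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_node *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_to_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink n and leave it pointing at itself, like list_del_init(). */
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical waiter, loosely modelled on struct mdc_cache_waiter. */
struct waiter {
	struct list_node entry;
	int woken;
};

/* Roughly what an exit path does: unlink the first waiter and wake it. */
static struct waiter *wake_first(struct list_node *waiters)
{
	struct waiter *w;

	if (list_is_empty(waiters))
		return NULL;
	w = node_entry(waiters->next, struct waiter, entry);
	list_unlink(&w->entry);
	w->woken = 1;
	return w;
}

/* Enter path with an on-stack waiter. Returns 1 if woken, 0 if it gave up.
 * Either way the node is off the list before this frame goes away. */
static int enter_with_stack_waiter(struct list_node *waiters, int simulate_wakeup)
{
	struct waiter w = { .woken = 0 };

	list_init(&w.entry);
	list_add_to_tail(&w.entry, waiters);

	/* A real implementation would sleep here; we call the waker directly. */
	if (simulate_wakeup)
		wake_first(waiters);

	/* Interrupted or timed out: unlink ourselves before returning. */
	if (!w.woken)
		list_unlink(&w.entry);

	/* Invariant: nothing on the shared list points into this frame. */
	assert(list_is_empty(&w.entry));
	return w.woken;
}

/* Runs both paths; returns 1 if the shared list ends up empty. */
int demo_waiter_lifetime(void)
{
	struct list_node waiters;
	int woken, gave_up;

	list_init(&waiters);
	woken = enter_with_stack_waiter(&waiters, 1);
	gave_up = enter_with_stack_waiter(&waiters, 0);
	return woken == 1 && gave_up == 0 && list_is_empty(&waiters);
}
```

if an enter path could return with its node still linked, wake_first() would later dereference memory from a dead stack frame, which is the kind of corruption i suspect in the trace above.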


