[Lustre-discuss] [bug?] mdc_enter_request() problems

Oleg Drokin green at whamcloud.com
Mon Aug 8 11:20:52 PDT 2011


Hello!

   I guess this is some 1.8 version, judging by the init_waitqueue_head() call.
   The 2.1 code is notably different here since LU-234 landed: it removes
   mcw_entry from the list on error.
   The patch originates from bug 18213 and is claimed to be a 1.8 port to 2.1, but I don't see anything like it in the 1.8 patch.
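
A minimal userspace sketch of the hazard and of the error-path fix mentioned above (removing mcw_entry from the list on error), for illustration only. It assumes a single-threaded model. The hand-rolled list, `struct client`, `struct waiter`, `enter_request()`, `exit_request()` and the two wait callbacks are hypothetical stand-ins for `struct client_obd`, `struct mdc_cache_waiter`, `mdc_enter_request()` and `mdc_exit_request()`. They are not the actual Lustre code. The sketch shows that a stack-allocated waiter on a shared list is only safe if every return path leaves it unlinked. Either the exit side removed it when granting a slot, or the waiter removes itself on the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list (hypothetical stand-in for list_head). */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static bool list_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and leave it self-linked, similar to list_del_init(). */
static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical analogues of mdc_cache_waiter and client_obd. */
struct waiter {
	struct list_node entry;		/* role of mcw_entry */
	bool granted;			/* set by exit_request() */
};

struct client {
	struct list_node waiters;	/* role of cl_cache_waiters */
	int in_flight;
	int max_in_flight;
};

/* A request finished: release its slot and hand it to the first waiter. */
static void exit_request(struct client *cli)
{
	cli->in_flight--;
	if (!list_empty(&cli->waiters)) {
		struct list_node *n = cli->waiters.next;
		struct waiter *w = (struct waiter *)
			((char *)n - offsetof(struct waiter, entry));

		list_del_node(n);	/* the waiter leaves the list here... */
		w->granted = true;
		cli->in_flight++;
	}
}

static int enter_request(struct client *cli,
			 int (*wait)(struct client *, struct waiter *))
{
	struct waiter w;		/* valid only until we return */
	int rc;

	if (cli->in_flight < cli->max_in_flight) {
		cli->in_flight++;
		return 0;
	}

	w.granted = false;
	list_add_tail_node(&w.entry, &cli->waiters);
	rc = wait(cli, &w);
	if (!w.granted)
		list_del_node(&w.entry); /* ...or here, on the error path */

	/* Never return while the stack node is still on the shared list. */
	assert(w.entry.next == &w.entry);
	return w.granted ? 0 : rc;
}

/* Simulated outcomes of the sleep in enter_request(). */
static int wait_until_granted(struct client *cli, struct waiter *w)
{
	exit_request(cli);		/* another request completes */
	return w->granted ? 0 : -1;
}

static int wait_interrupted(struct client *cli, struct waiter *w)
{
	(void)cli;
	(void)w;
	return -4;			/* stand-in for -EINTR */
}
```

In the real client the sleep would be on a wait queue and the exit side would run in another thread (ptlrpcd, per the trace in the report). The two callbacks here only simulate the "slot granted" and "wait interrupted" outcomes. Without the `list_del_node()` on the error path, the list would keep a pointer into a dead stack frame for the exit side to trip over later.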

Bye,
    Oleg
On Aug 8, 2011, at 2:07 PM, Andreas Dilger wrote:

> On 2011-08-08, at 10:03 AM, chas williams - CONTRACTOR wrote:
>> We have seen a few crashes that look like:
>> 
>> [250696.381575] RIP: 0010:[<ffffffffa0a1f9e4>]  [<ffffffffa0a1f9e4>] mdc_exit_request+0x74/0xb0 [mdc]
>> ...
>> [250696.381575] Call Trace:
>> [250696.381575]  [<ffffffffa0a25042>] mdc_intent_getattr_async_interpret+0x82/0x500 [mdc]
>> [250696.381575]  [<ffffffffa089efd0>] ptlrpc_check_set+0x200/0x1690 [ptlrpc]
>> [250696.381575]  [<ffffffffa08d3140>] ptlrpcd_check+0x110/0x250 [ptlrpc]
>> 
>> and I gather the problem arises from mdc_enter_request(): it
>> allocates an mdc_cache_waiter on the stack, inserts it into the
>> wait list, and then returns.
>> 
>> 	int mdc_enter_request(struct client_obd *cli)
>> 	...
>> 		struct mdc_cache_waiter mcw;
>> 	...
>> 			list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
>> 			init_waitqueue_head(&mcw.mcw_waitq);
>> 
>> Later, mdc_exit_request() finds this mcw by iterating over the list.
>> Since mcw was allocated on the stack, I don't think you can do this:
>> its stack memory might have been reused by the time mdc_exit_request()
>> gets around to removing it.
> 
> What version of Lustre is this?
> 
> Cheers, Andreas
> --
> Andreas Dilger 
> Principal Engineer
> Whamcloud, Inc.
> 
> 
> 

--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.



