[Lustre-discuss] [wc-discuss] System Deadlock

Andreas Dilger adilger at whamcloud.com
Thu Aug 18 10:09:30 PDT 2011


On 2011-08-18, at 7:50 AM, "Roger Spellman" <Roger.Spellman at terascala.com> wrote:

> Andreas, 
> Thanks for your reply.  It was very helpful.  See my responses below.
> 
>>> I am in the process of porting Lustre client 1.8.4 to a recent kernel,
>>> 2.6.38.8.
>> 
>> That is a somewhat unfortunate starting point, since 1.8.6 clients at
>> least work with 2.6.32 kernels.
> 
> I understand.  I started this project before 1.8.6 came out, and I
> wanted to stick with 1.8.4, in case any problems came up with 1.8.6.  As
> soon as I am done with 1.8.4, I will port my patch to 1.8.6.  
> 
>> It's difficult to make any kind of assessment without knowing what changes
>> you have made to the client.  It would be useful if you would submit a
>> series of patches so that we can take a look at your patches.
> 
> My plan was to get it working, then post the patch to anyone who wanted
> it.  That should be pretty soon.  I'm assuming that other people are
> wanting to run Lustre with recent kernels.
> 
>> No, the Linux stack traces are terrible, they just print anything that
>> looks like the address of a kernel or module function.  That includes
>> function addresses that are passed as function parameters, such as
>> callback functions.  It must have hit an interrupt at one point, but I
>> think it is just random garbage on the stack.
> 
> Too bad.  I compiled the Kernel with Frame Pointers, so I hoped that the
> kernel could unwind the stack properly.  

That helps, but AFAIK it isn't 100% correct even then. 

> Now that I know to ignore the Stack Trace, I can instrument the code to
> track down this problem. 

I don't think you need to ignore the stack, just treat it with caution and look for a valid callpath through the listed functions. 
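
A minimal sketch of that kind of triage, assuming the usual x86 trace format where entries the unwinder could not verify are printed with a leading "?" (the sample lines and the function names in them are made up for illustration, not taken from the trace in this thread). It only separates the verified entries from the guessed ones; checking that the remaining functions really call each other still has to be done against the source.

```python
import re

# Matches x86 kernel trace lines such as
#   [<ffffffff8104a000>] ? process_timeout+0x0/0x10
# The optional "? " marks an entry the unwinder could not verify.
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def split_trace(lines):
    """Return (reliable, guessed) symbol lists, in trace order."""
    reliable, guessed = [], []
    for line in lines:
        m = TRACE_RE.search(line)
        if m is None:
            continue  # not a trace entry (e.g. the "Call Trace:" header)
        if m.group("guess"):
            guessed.append(m.group("sym"))
        else:
            reliable.append(m.group("sym"))
    return reliable, guessed


# Hypothetical sample input, for illustration only.
SAMPLE = [
    "Call Trace:",
    " [<ffffffff81501234>] schedule_timeout+0x1a5/0x250",
    " [<ffffffff8104a000>] ? process_timeout+0x0/0x10",
    " [<ffffffffa0301000>] example_wait+0x40/0x90 [example]",
]

if __name__ == "__main__":
    reliable, guessed = split_trace(SAMPLE)
    print("candidate call path:", " <- ".join(reliable))
    print("treat with caution: ", ", ".join(guessed))
```

The "?" entries are typically leftovers such as callback addresses passed as parameters, so the first list is the place to start when looking for a valid callpath.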

Cheers, Andreas

