[Lustre-discuss] [wc-discuss] System Deadlock
Andreas Dilger
adilger at whamcloud.com
Thu Aug 18 10:09:30 PDT 2011
On 2011-08-18, at 7:50 AM, "Roger Spellman" <Roger.Spellman at terascala.com> wrote:
> Andreas,
> Thanks for your reply. It was very helpful. See my responses, below.
>
>>> I am in the process of porting Lustre client 1.8.4 to a recent kernel,
>>> 2.6.38.8.
>>
>> That is a somewhat unfortunate starting point, since 1.8.6 clients at
>> least work with 2.6.32 kernels.
>
> I understand. I started this project before 1.8.6 came out, and I
> wanted to stick with 1.8.4, in case any problems came up with 1.8.6. As
> soon as I am done with 1.8.4, I will port my patch to 1.8.6.
>
>> It's difficult to make any kind of assessment without knowing what changes
>> you have made to the client. It would be useful if you would submit a
>> series of patches so that we can take a look at your patches.
>
> My plan was to get it working, then post the patch to anyone who wanted
> it. That should be pretty soon. I'm assuming that other people are
> wanting to run Lustre with recent kernels.
>
>> No, the Linux stack traces are terrible, they just print anything that
>> looks like the address of a kernel or module function. That includes
>> function addresses that are passed as function parameters, such as
>> callback functions. It must have hit an interrupt at one point, but I
>> think it is just random garbage on the stack.
>
> Too bad. I compiled the Kernel with Frame Pointers, so I hoped that the
> kernel could unwind the stack properly.
That helps, but AFAIK it isn't 100% correct even then.
> Now that I know to ignore the Stack Trace, I can instrument the code to
> track down this problem.
I don't think you need to ignore the stack, just treat it with caution and look for a valid callpath through the listed functions.
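As a rough illustration of "look for a valid callpath", here is a minimal sketch in Python. It assumes you have the function names from the trace (innermost first, as printed) and a caller-to-callee map that you built by hand from reading the source. All function names and the map itself are hypothetical, not taken from Lustre or the kernel. It keeps the longest chain in which every entry is actually called by the next entry kept, and drops the rest as probable stale stack contents.

```python
def plausible_callpath(trace, calls):
    """Return the longest chain of trace entries forming a valid call path.

    trace: function names as printed in the stack dump, innermost first.
    calls: dict mapping caller -> set of callees (hand-built, an assumption).
    """
    if not trace:
        return []
    # best[i] is the longest valid chain that starts at trace[0] and ends
    # at trace[i], or None if trace[i] cannot be reached by a valid chain.
    best = [None] * len(trace)
    best[0] = [trace[0]]
    for i in range(1, len(trace)):
        for j in range(i):
            # trace[i] is an outer frame, so it must call trace[j].
            if best[j] is not None and trace[j] in calls.get(trace[i], ()):
                if best[i] is None or len(best[j]) + 1 > len(best[i]):
                    best[i] = best[j] + [trace[i]]
    return max((b for b in best if b is not None), key=len)


# Hypothetical trace: "timer_callback" is a leftover function pointer on
# the stack, not a real frame in this call chain.
trace = ["do_lock", "timer_callback", "enqueue", "sys_entry"]
calls = {
    "enqueue": {"do_lock"},
    "sys_entry": {"enqueue"},
}
print(plausible_callpath(trace, calls))
```

Entries that fall out of the chain are not proven to be garbage. They are only the ones that need a second look against the source.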
Cheers, Andreas