[Lustre-discuss] inode weirdness

Stuart Midgley sdm900 at gmail.com
Fri Sep 4 08:52:49 PDT 2009


I'm sorry, Oleg, but I suspect I will never be able to run this test.

* I don't have a reproducer.  At the time I had this problem, I  
started about 200 jobs simultaneously and about 50 failed with this  
problem.  I reran those jobs and they worked just fine.
* I will never get a chance to make the FS quiet.  We have way too much  
production work on.

If I do get time to fiddle about and reproduce this problem I'll  
create a bug.

-- 
Dr Stuart Midgley
sdm900 at gmail.com



On 04/09/2009, at 11:46 PM, Oleg Drokin wrote:

> Hello!
>
> On Sep 4, 2009, at 11:31 AM, Stuart Midgley wrote:
>
>> The file was created on the same node it was accessed from.
>
> Hm, interesting.
>
>> The error isn't permanent.  When the job crashed, I went and  
>> started investigating and the file was fine.
>
> I think I remember a bug like this that shadow(@sun.com) worked on.
> It turned out to be bug 17545, which has somewhat different symptoms,  
> though.
>
>> No, the file is never unlinked.
>> How do I go about getting a Lustre log?
>
> Make the system (MDS-wise) as idle as possible (ideally, only the node
> with problems should be doing anything on Lustre).
> On the MDS and the client, do a cat /proc/sys/lnet/debug and remember
> the value.
> Run echo -1 >/proc/sys/lnet/debug on both the MDS and the client.
> Run lctl dk >/dev/null to clear the debug buffer.
> Run your reproducer and, immediately after the error happens, run
> lctl dk >/tmp/lustre.log on both the MDS and client nodes.
> Then restore the /proc/sys/lnet/debug values on the nodes to what
> they were.
>
> Thanks.
>
> Bye,
>    Oleg
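The capture steps Oleg lists can be wrapped in a few shell functions so they are run the same way on every node. This is a minimal sketch, not a Lustre-provided tool: the function names debug_start, debug_dump and debug_restore, the SAVE_FILE path, and the DEBUG_FILE/LCTL overrides are hypothetical. It assumes it is sourced as root on each node (MDS and client), that lctl is in PATH, and that the value read from /proc/sys/lnet/debug can be written back unchanged to restore it.

```shell
# Hypothetical helpers sketching the debug-log capture steps above.
# Assumptions: sourced as root on each node; DEBUG_FILE, LCTL and SAVE_FILE
# are illustrative overrides (e.g. for a dry run), not Lustre settings.

DEBUG_FILE=${DEBUG_FILE:-/proc/sys/lnet/debug}
LCTL=${LCTL:-lctl}
SAVE_FILE=${SAVE_FILE:-/tmp/lnet_debug.saved}

# Remember the current debug mask, enable all debug flags (-1),
# then discard whatever is already in the debug buffer.
debug_start() {
    cat "$DEBUG_FILE" > "$SAVE_FILE" || return 1
    echo -1 > "$DEBUG_FILE" || return 1
    "$LCTL" dk > /dev/null
}

# Run immediately after the error appears: dump the debug buffer to a file
# (default /tmp/lustre.log, as in the instructions above).
debug_dump() {
    "$LCTL" dk > "${1:-/tmp/lustre.log}"
}

# Put back the debug mask saved by debug_start.
debug_restore() {
    cat "$SAVE_FILE" > "$DEBUG_FILE"
}
```

Usage: call debug_start on both the MDS and the client, run the reproducer on the client, call debug_dump on both nodes as soon as the error shows up, then call debug_restore. The mask is saved to a file rather than a shell variable so the restore still works from a different shell session.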



