[Lustre-discuss] inode weirdness
Stuart Midgley
sdm900 at gmail.com
Fri Sep 4 08:52:49 PDT 2009
I'm sorry Oleg, but I suspect I will never be able to run this test.
* I don't have a reproducer. At the time I had this problem, I
started about 200 jobs simultaneously and about 50 failed with this
problem. I reran those jobs and they worked just fine.
* I will never get a chance to make the FS quiet. We have way too much
production work on it.
If I do get time to fiddle about and reproduce this problem I'll
create a bug.
--
Dr Stuart Midgley
sdm900 at gmail.com
On 04/09/2009, at 11:46 PM, Oleg Drokin wrote:
> Hello!
>
> On Sep 4, 2009, at 11:31 AM, Stuart Midgley wrote:
>
>> The file was created on the same node it was accessed from.
>
> Hm, interesting.
>
>> The error isn't permanent. When the job crashed, I went and
>> started investigating and the file was fine.
>
> I think I remember a bug like this that shadow(@sun.com) worked on.
> Turned out it is bug 17545 which has somewhat different symptoms,
> though.
>
>> No, the file is never unlinked.
>> How do I go about getting a lustre log?
>
> Make the system (MDS-wise) as idle as possible (ideally only the
> node with problems should be doing anything on Lustre). Then:
>
> 1. On the MDS and the client, run cat /proc/sys/lnet/debug and
>    remember the value.
> 2. Run echo -1 >/proc/sys/lnet/debug on both the MDS and the client.
> 3. Run lctl dk >/dev/null to clear the debug buffer.
> 4. Run your reproducer and, immediately after the error happens, run
>    lctl dk >/tmp/lustre.log on both the MDS and the client.
> 5. Restore the /proc/sys/lnet/debug values on both nodes to what
>    they were.
>
> Thanks.
>
> Bye,
> Oleg
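
For anyone who does get a quiet window, the capture steps Oleg describes could be wrapped up roughly as below. This is only a sketch: the function name capture_lustre_log and the DEBUG_FILE and LCTL variables are made up for illustration, it assumes that writing back whatever was read from /proc/sys/lnet/debug restores the previous mask, and it still has to be run as root on both the MDS and the client.

```shell
#!/bin/sh
# Sketch of the debug-log capture procedure quoted above.
# DEBUG_FILE and LCTL are hypothetical overrides; the defaults are the
# path and command named in the thread.
DEBUG_FILE="${DEBUG_FILE:-/proc/sys/lnet/debug}"
LCTL="${LCTL:-lctl}"

# capture_lustre_log OUTFILE COMMAND [ARGS...]
# Runs COMMAND with full Lustre debugging enabled and dumps the kernel
# debug buffer to OUTFILE right after COMMAND finishes.
capture_lustre_log() {
    out="$1"
    shift

    # Remember the current debug mask so it can be restored afterwards.
    saved=$(cat "$DEBUG_FILE") || return 1

    # Enable all debug flags.
    echo -1 > "$DEBUG_FILE" || return 1

    # Throw away whatever is already sitting in the debug buffer.
    "$LCTL" dk > /dev/null

    # Run the reproducer and keep its exit status.
    "$@"
    status=$?

    # Dump the debug buffer immediately after the reproducer exits.
    "$LCTL" dk > "$out"

    # Put the debug mask back to what it was (assumes the value read
    # earlier is accepted unchanged on write).
    echo "$saved" > "$DEBUG_FILE"

    return "$status"
}
```

It would be called as something like capture_lustre_log /tmp/lustre.log ./my_job, where ./my_job stands in for whatever reproducer triggers the error. It does nothing to quiet the filesystem, so the idle-MDS requirement still applies.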