[Lustre-discuss] lock callback timer expired, lock on destroyed export, locks stolen, busy with active RPCs, operation 400 on unconnected MDS

Oleg Drokin oleg.drokin at oracle.com
Mon May 3 11:30:09 PDT 2010


Hello!

On May 3, 2010, at 11:49 AM, Thomas Roth wrote:
> We found a user job submission script that probably caused all this by
> starting
> - several hundred (900) jobs simultaneously
> - all of them opening one and the same file for batch system errors and
> one and the same file for its output.

You should probably keep an eye on developments in bug 20373, which should
help avert this kind of problem for the use case you describe.
The existing "good" patch there should help somewhat, and the other patch
under development will help some more once it is completed.
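
Independently of those patches, a possible workaround is to give every job
its own output and error file, so that several hundred jobs do not all open
the same two files at once. This is an assumption for illustration, not
something discussed in the bug. A minimal sketch follows; the JOB_ID
environment variable, the "logs" directory and the file naming are all
hypothetical and would need to be adapted to the batch system in use.

```python
import os
from pathlib import Path


def per_job_paths(log_dir, job_id):
    """Return (stdout_path, stderr_path) that are unique to one job."""
    base = Path(log_dir)
    return base / f"job-{job_id}.out", base / f"job-{job_id}.err"


if __name__ == "__main__":
    # JOB_ID is a hypothetical variable name; fall back to the process ID
    # so the sketch still runs outside a batch system.
    job_id = os.environ.get("JOB_ID", str(os.getpid()))
    out_path, err_path = per_job_paths("logs", job_id)
    print(out_path, err_path)
```

With distinct paths per job, each client opens and writes only its own
files, and the outputs can be concatenated afterwards if a single file is
wanted.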

> Still I'd like to learn more about "operation X on unconnected MDS", on
> the net I only found my own question from two years ago.

This means the MDS got a request X from a client that it believes is no longer
connected to it (because the client was evicted, I guess).

Bye,
    Oleg


