[Lustre-discuss] [robinhood-support] robinhood error messages

LEIBOVICI Thomas thomas.leibovici at cea.fr
Wed Nov 24 04:20:32 PST 2010


Hi Thomas,

We have already observed this, typically after the filesystem was blocked
for a while, or after an OSS had crashed.
If a scan operation is stuck for too long (the default timeout is 1 hour),
robinhood tries to cancel its operation on the current directory and
continues with the next one.
Perhaps it did not recover successfully from this cancellation, and you
have been receiving those messages ever since.

To avoid this problem, you can increase the timeout to a very high 
value, to make sure it is never reached (e.g. xxx days).
In that case, robinhood will remain stuck for as long as its current
operation in Lustre is blocked, and it will resume that operation as
soon as Lustre is back.

You can change this timeout by setting the "scan_op_timeout" parameter
in the "FS_Scan" section of the config file.
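A minimal sketch of the corresponding config block, assuming the
robinhood 2.x syntax of named blocks containing "key = value;" entries,
"#" comments, and duration suffixes such as "d" for days. The value 365d
is only an arbitrary illustration of "very high", not a recommended
setting; check the syntax against the config template shipped with your
robinhood version:

```
FS_Scan
{
    # Illustrative value only (assumption): large enough that the timeout
    # is never reached, so robinhood waits for Lustre to come back instead
    # of cancelling the scan of the current directory.
    scan_op_timeout = 365d ;
}
```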

Alternatively, you can keep a reasonable timeout and make robinhood
exit when the filesystem is not responding, by setting
"exit_on_timeout = TRUE" in the same section of the config file.
You can then respawn the robinhood daemon once everything is fixed.
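A similar sketch for this second approach, under the same syntax
assumptions (block layout, "#" comments, "h" suffix for hours). The 1h
value simply mirrors the 1 hour default mentioned above:

```
FS_Scan
{
    # Keep a reasonable timeout (here the 1 hour default)...
    scan_op_timeout = 1h ;
    # ...and make robinhood exit instead of skipping the blocked directory,
    # so the daemon can be restarted once the filesystem is healthy again.
    exit_on_timeout = TRUE ;
}
```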

Best regards,
Thomas LEIBOVICI
CEA/DAM

> A support request from lustre-discuss.
>
> ------------------------------------------------------------------------
>
> Subject:
> [Lustre-discuss] robinhood error messages
> From:
> Thomas Roth <t.roth at gsi.de>
> Date:
> Tue, 23 Nov 2010 20:20:33 +0100
> To:
> lustre-discuss at lists.lustre.org
>
>
> Hi all,
>
> we are running robinhood (v2.2.1) on our 1.8.4 cluster (basically to 
> find out where and who the big space consumers are - no purging).
>
> Robinhood sends me lots and lots of messages (~100/day) of the type
>
>  > ===== FS scan is blocked (/lustre) =====
>  > Date: 2010/11/23 20:05:22
>  > Program: robinhood (pid 4826)
>  > Host: lxb310
>  > Filesystem: /lustre
>  > A thread has been inactive for 3660 sec
>  > while scanning directory /lustre/....
>
> This seems to indicate some trouble accessing certain directories on the
> node where robinhood is running. However, this happens regardless of the
> node, and at the same time we see no issues, slowness, or connectivity
> problems, nor any user complaints of that kind.
>
> So I wonder whether anybody else is using robinhood and has seen similar 
> messages.
>
> Regards,
> Thomas
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>   
> ------------------------------------------------------------------------
>
> _______________________________________________
> robinhood-support mailing list
> robinhood-support at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/robinhood-support
>   
