[Lustre-discuss] [robinhood-support] robinhood error messages

Thomas Roth t.roth at gsi.de
Wed Nov 24 05:00:30 PST 2010


Thank you Thomas.
If these messages mean that robinhood just continues after the timeout, 
it would be nothing to worry about, but I will try to adapt the timeout 
anyhow.
Right now, however, it seems the scan is really stuck: for days now, 
rbh-report -i has been telling me about 612 TB in the filesystem, but 
lfs df says we have 787 TB ;-)
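
For reference, a minimal sketch of how the two settings named in the 
quoted reply below might sit in the config file. Only the section name 
"FS_Scan" and the parameter names "scan_op_timeout" and "exit_on_timeout" 
come from this thread; the block syntax and the "1h" duration notation are 
assumptions based on the usual robinhood 2.x config layout, and 1h is 
simply the default mentioned in the reply, not a recommended value.

```
# Sketch only: syntax assumed, check it against the template config
# shipped with your robinhood version.
FS_Scan
{
    # Time after which a blocked scan operation is cancelled
    # (the reply below states a default of 1 hour).
    scan_op_timeout = 1h ;

    # Make robinhood exit instead of moving on to the next directory
    # when the timeout is reached, so the daemon can be respawned later.
    exit_on_timeout = TRUE ;
}
```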

Btw, whenever I restart the scan, e.g. after a reconfiguration such as 
for the timeout, the logfile fills up with
 > ListMgr | DB query failed in ListMgr_Insert line 340...
and assorted messages, which seem to indicate that the new robinhood 
scan tries to insert something into the DB that is already there, and 
stumbles on this. Or maybe that happens when several robins are running 
simultaneously. I'm not sure whether it is a problem for the scan, but 
it is certainly a problem for the free space on /var, or wherever I 
point the log to ;-)
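
To get a feeling for how much of such a log is the same message repeated, 
something like the following rough sketch could help. It assumes one 
message per line and replaces every run of digits with "N", so that 
timestamps, pids and line numbers do not split identical messages into 
separate entries. The function name summarize_log and the second sample 
line are made up for illustration; only the ListMgr line is taken from 
the log quoted above.

```python
import re
from collections import Counter


def summarize_log(lines):
    """Count log messages, treating lines that differ only in digits as equal."""
    counts = Counter()
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip empty lines
        # Replace every run of digits by "N" so that e.g. "line 340" and
        # "line 341" are counted as the same message.
        key = re.sub(r"\d+", "N", text)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # The first two lines are the message quoted above; the third is a
    # hypothetical placeholder, not real robinhood output.
    sample = [
        "ListMgr | DB query failed in ListMgr_Insert line 340...",
        "ListMgr | DB query failed in ListMgr_Insert line 340...",
        "hypothetical | some other message",
    ]
    # Print the most frequent messages first.
    for message, count in summarize_log(sample).most_common():
        print(count, message)
```

For a real logfile, open it and pass the file object instead of the 
sample list.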

Regards,
Thomas

On 24.11.2010 13:20, LEIBOVICI Thomas wrote:
> Hi Thomas,
>
> We have already seen this, basically after the filesystem was blocked for a
> while, or after an OSS had crashed.
> If it is stuck for too long (default timeout is 1 hour), robinhood tries
> to cancel its operation on the current directory and continues with the next
> one.
> Maybe it didn't recover successfully from this cancellation, and you
> have been receiving those messages ever since.
>
> To avoid this problem, you can increase the timeout to a very high
> value, to make sure it is never reached (e.g. xxx days).
> In that case, robinhood will remain stuck as long as its current
> operation in Lustre is blocked,
> and it will resume the current operation as soon as Lustre is back.
>
> You can change this timeout by setting the "scan_op_timeout" parameter
> in the "FS_Scan" section of the config file.
>
> Alternatively, you can keep a reasonable timeout and make robinhood
> exit when the filesystem is not responding
> by setting "exit_on_timeout = TRUE" in the same section of the config.
> That way you can respawn the robinhood daemon once everything is fixed.
>
> Best regards,
> Thomas LEIBOVICI
> CEA/DAM
>
>  > A support request from lustre-discuss.
>  >
>  > ------------------------------------------------------------------------
>  >
>  > Subject:
>  > [Lustre-discuss] robinhood error messages
>  > From:
>  > Thomas Roth <t.roth at gsi.de>
>  > Date:
>  > Tue, 23 Nov 2010 20:20:33 +0100
>  > To:
>  > lustre-discuss at lists.lustre.org
>  >
>  >
>  > Hi all,
>  >
>  > we are running robinhood (v2.2.1) on our 1.8.4 cluster (basically to
>  > find out where and who the big space consumers are - no purging).
>  >
>  > Robinhood sends me lots and lots of messages (~100/day) of the type
>  >
>  > > ===== FS scan is blocked (/lustre) =====
>  > > Date: 2010/11/23 20:05:22
>  > > Program: robinhood (pid 4826)
>  > > Host: lxb310
>  > > Filesystem: /lustre
>  > > A thread has been inactive for 3660 sec
>  > > while scanning directory /lustre/....
>  >
>  > This seems to indicate some trouble accessing certain directories on the
>  > node where robinhood is running. However, this is independent of the
>  > node, and at the same time we neither see any slowness or
>  > connectivity problems nor get any user complaints of that kind.
>  >
>  > So I wonder whether anybody else is using robinhood and has seen similar
>  > messages.
>  >
>  > Regards,
>  > Thomas
>  > _______________________________________________
>  > Lustre-discuss mailing list
>  > Lustre-discuss at lists.lustre.org
>  > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>  >
>  >
>  > ------------------------------------------------------------------------
>  >
>  >
>  >
>  > _______________________________________________
>  > robinhood-support mailing list
>  > robinhood-support at lists.sourceforge.net
>  > https://lists.sourceforge.net/lists/listinfo/robinhood-support
>  >
>


-- 
--------------------------------------------------------------------
Thomas Roth           IT-HPC-Linux
Location: SB3 1.262   Phone: +49-6159-71 1453


http://twitter.com/gsi_it


