[Lustre-discuss] [robinhood-support] robinhood error messages
Thomas Roth
t.roth at gsi.de
Wed Nov 24 05:00:30 PST 2010
Thank you, Thomas.
If these messages mean that robinhood just continues after the timeout,
it would be nothing to worry about, but I will try to adapt the timeout
anyhow.
Right now, however, the scan seems to be really stuck: for days now,
rbh-report -i has been telling me about 612 TB in the filesystem, but
lfs df says we have 787 TB ;-)
Btw, whenever I restart the scan, e.g. after a reconfiguration such as
for the timeout, I get the logfile full of
> ListMgr | DB query failed in ListMgr_Insert line 340...
and assorted messages, which seem to indicate that the new robinhood
scan tries to put something into the DB that is already there, and
stumbles on this. Or maybe that happens when several robinhood instances
are running simultaneously. I'm not sure whether it is a problem for the
scan; it is, however, a problem for the free space on /var, or wherever
I point the log to ;-)
Regards,
Thomas
On 24.11.2010 13:20, LEIBOVICI Thomas wrote:
> Hi Thomas,
>
> We have already observed this, basically after the filesystem had been
> blocked for a while, or after an OSS had crashed.
> If it is stuck for too long (default timeout is 1 hour), robinhood tries
> to cancel its operation on the current directory and continues with the
> next one.
> Maybe it didn't recover successfully from this cancellation, and you
> have been receiving those messages ever since.
>
> To avoid this problem, you can increase the timeout to a very high
> value, to make sure it is never reached (e.g. xxx days).
> In that case, robinhood will remain stuck as long as its current
> operation in Lustre is blocked,
> and it will resume the current operation as soon as Lustre is back.
>
> You can change this timeout by setting the "scan_op_timeout" parameter
> in the "FS_Scan" section of the config file.
>
> Alternatively, you can keep a reasonable timeout and make robinhood
> exit when the filesystem is not responding
> by setting "exit_on_timeout = TRUE" in the same section of the config.
> That way, you can respawn the robinhood daemon once everything is fixed.
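>
> As a rough sketch of the two options above, the "FS_Scan" section might
> look something like this. The block layout, the "#" comments and the
> "1h" duration notation are assumptions here, so please check them against
> the config template shipped with your robinhood version; only the
> parameter names and the 1 hour default come from the explanation above.
>
> ```
> # Illustrative excerpt only, not a complete config file.
> FS_Scan
> {
>     # Option 1: set the timeout so high that it is never reached.
>     # (value left open on purpose; pick what suits your site)
>     # scan_op_timeout = <very large duration>;
>
>     # Option 2: keep a reasonable timeout (1 hour is the default)
>     # and exit when it is hit, so the daemon can be respawned
>     # once Lustre is responding again.
>     scan_op_timeout = 1h;      # duration notation is an assumption
>     exit_on_timeout = TRUE;
> }
> ```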
>
> Best regards,
> Thomas LEIBOVICI
> CEA/DAM
>
> > A support request from lustre-discuss.
> >
> > ------------------------------------------------------------------------
> >
> > Subject:
> > [Lustre-discuss] robinhood error messages
> > From:
> > Thomas Roth <t.roth at gsi.de>
> > Date:
> > Tue, 23 Nov 2010 20:20:33 +0100
> > To:
> > lustre-discuss at lists.lustre.org
> >
> >
> > Hi all,
> >
> > we are running robinhood (v2.2.1) on our 1.8.4 cluster (basically to
> > find out who the big space consumers are and where - no purging).
> >
> > Robinhood sends me lots and lots of messages (~100/day) of the type
> >
> > > ===== FS scan is blocked (/lustre) =====
> > > Date: 2010/11/23 20:05:22
> > > Program: robinhood (pid 4826)
> > > Host: lxb310
> > > Filesystem: /lustre
> > > A thread has been inactive for 3660 sec
> > > while scanning directory /lustre/....
> >
> > This seems to indicate some trouble accessing certain directories on the
> > node where robinhood is running. However, this is independent of the
> > node, and at the same time we see no slowness or connectivity problems,
> > nor do we get any user complaints of that kind.
> >
> > So I wonder whether anybody else is using robinhood and has seen similar
> > messages.
> >
> > Regards,
> > Thomas
> >
> >
> > ------------------------------------------------------------------------
>
--
--------------------------------------------------------------------
Thomas Roth IT-HPC-Linux
Location: SB3 1.262 Phone: +49-6159-71 1453
http://twitter.com/gsi_it