[Lustre-discuss] [robinhood-support] robinhood error messages

LEIBOVICI Thomas thomas.leibovici at cea.fr
Thu Nov 25 01:07:58 PST 2010


Hello Thomas,

Sorry, I only just saw the email you sent to the robinhood-support 
mailing list; it was held pending admin validation.
About multiple robinhood instances: the documentation says that you can 
split the features across different nodes. For example, the database 
server can run on one machine, the FS scan on another, disk resource 
monitoring and purging on a third, and so on.
However, you must run only a single instance of each feature at any 
given time.

Thomas Roth wrote:
>> Is there a way to "partition" a file system for Robinhood? Tell an
>> instance to only scan certain directories? Because I think the issue is
>> not a really broken data base, but simply a later coming Robin scanning
>> files that were already done?
What exactly do you need? Do you want to speed up the scan by running 
several robinhood instances, or do you only want to scan certain 
directories?
- About speed: robinhood already scans in parallel with multiple 
threads, each one scanning different directories.
So if you want more parallelism, increase the number of scan threads.
- If you need to scan only some parts of the namespace, you can 
ignore directories by specifying "ignore" rules in the configuration 
file (FS_Scan section).
E.g. ignore { path == "/lustre/xyz*" } if you know the path you want to 
ignore, or a negation:
ignore { not ( path == "/lustre/dir1" or path == "/lustre/dir2/subdir*" 
) } if you know the paths you want to scan.
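
To illustrate the two options above, a minimal FS_Scan sketch could look 
like the following. The thread count is only a placeholder, and the 
parameter name nb_threads_scan is an assumption based on the 
configuration template shipped with robinhood, so please check it 
against the template of your installed version:

```
FS_Scan
{
    # assumed parameter name: number of threads scanning in parallel
    nb_threads_scan = 8;

    # do not scan anything whose path matches this pattern
    ignore { path == "/lustre/xyz*" }
}
```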

>>  > > ListMgr | DB query failed in ListMgr_Insert line 340...
>>  > and assorted messages, which seem to indicate that the new robinhood
>>  > scan tries to put something into the DB that is already there, and
>>  > stumbles on this. Or maybe that happens when several robins are
>>  > running simultaneously.
>> Are you running several instances for scanning the same filesystem??
>
> Well, yes, tried that also. Actually I was under the impression that 
> this is a feature of Robinhood - of course, now that I am looking for 
> this in the documentation I can't find it.
>
> But these errors from the DB definitely did arise first when I 
> restarted robinhood anew after some changes (location of log file, 
> debug level, ...) in the config file. But since there was no change in 
> the robinhood version, I did not empty the database. After this 
> restart, I immediately got a lot of
> > 2010/11/04 11:27:45 robinhood[1489/4]: EntryProc | Error 3 
> performing database operation.
> > 2010/11/04 11:27:45 robinhood[1489/8]: ListMgr | DB query failed in 
> ListMgr_Insert line 340: pk='54051386:6D286C', code=3: Duplicate entry 
> '54051386:6D286C' for key 1
>
> I suppose this is something that should not happen when one is feeding 
> a database?
Yes, these errors seem to be caused by several feeders inserting into 
the database concurrently. This is not a sane setup, and the database 
content may now be inconsistent.
So I recommend that you stop all running instances, clear the database 
content (command "rbh-config empty_db"), and then start only a single 
instance for scanning.

Best regards,
Thomas.



