[Lustre-devel] Erratum about indexes in robinhood DB

Tue Oct 11 11:42:41 PDT 2011

Hi,

Yes, there is a fixed patch which adds some info to the changelog:
UID, GID, NID. It is not a problem to pack it with the other (changed)
inode attributes. It was not sent upstream because an issue was
found whose fix has not landed yet.
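
To illustrate why this matters for a consumer such as robinhood, here
is a minimal Python sketch. It is not the real Lustre changelog format
or API: the record layout, the field names and apply_record() are all
hypothetical. It only shows that a record which already carries the
new UID/GID can be applied directly, while one that does not forces a
'stat' round trip.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ChangelogRecord:
    """Hypothetical record layout, NOT the real Lustre changelog format."""
    fid: str
    op: str                      # e.g. "SATTR" for an attribute change
    uid: Optional[int] = None    # set only if the record packs the new UID
    gid: Optional[int] = None    # set only if the record packs the new GID


def apply_record(rec: ChangelogRecord,
                 db: Dict[str, Tuple[int, int]],
                 stat_fn: Callable[[str], Tuple[int, int]]) -> bool:
    """Store (uid, gid) for rec.fid; return True if a stat was needed."""
    if rec.uid is not None and rec.gid is not None:
        # The record already tells us what changed: no stat required.
        db[rec.fid] = (rec.uid, rec.gid)
        return False
    # Fallback: ask the filesystem for the current attributes.
    db[rec.fid] = stat_fn(rec.fid)
    return True


# Usage with a fake stat function that records which fids it was asked for.
stat_calls = []


def fake_stat(fid: str) -> Tuple[int, int]:
    stat_calls.append(fid)
    return (1000, 1000)


db: Dict[str, Tuple[int, int]] = {}
apply_record(ChangelogRecord("0x1", "SATTR", uid=500, gid=500), db, fake_stat)
apply_record(ChangelogRecord("0x2", "SATTR"), db, fake_stat)
```

Only the second record, which carries no attributes, triggers a call
to the (fake) stat function.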

On Oct 11, 2011, at 9:12 PM, Nathan Rutman wrote:

> We actually already did some of that for a one-off. We didn't push the changes upstream because
> there were some ugly layering violations involved.  Vitaly, do you remember the details?
> 
> 
> On Oct 11, 2011, at 6:04 AM, Eric Barton wrote:
> 
>> Thomas,
>> 
>> Interesting point about changelog entries requiring a 'stat'.
>> 
>> Nathan, what's your take on making changelogs tell you what has
>> changed - even if only on "easy" changes?
>> 
>>        Cheers,
>>                 Eric
>> 
>>> -----Original Message-----
>>> From: LEIBOVICI Thomas [mailto:thomas.leibovici at cea.fr]
>>> Sent: 06 October 2011 12:02 PM
>>> To: Eric Barton
>>> Cc: lustre-devel at lists.lustre.org
>>> Subject: Re: Erratum about indexes in robinhood DB
>>> 
>>> Hello Eric,
>>> 
>>> With a fast enough feeder, the ingest rate robinhood can currently
>>> sustain is between 50,000/sec and 100,000/sec
>>> (depending on insert/update/remove ratio) with a basic MySQL DB stored
>>> on a local disk.
>>> This can certainly still be improved with MySQL tunings and/or better HW
>>> and/or enterprise class DB,
>>> but for now, we find it is easily high enough for reading an MDT
>>> changelog stream on a petaflop-scale system.
>>> 
>>> This rate is actually lower when processing Lustre MDT changelogs (but I
>>> have no measurement) because of "stat" operations to get file attributes
>>> (unfortunately, changelogs do not give the new value of what has just
>>> changed, e.g. the new uid for a chown operation, the new size & mtime
>>> with an mtime event...).
>>> SOM will probably improve that point, but it could be a good idea to add
>>> more info in changelogs.
>>> 
>>> Handling chglogs from multiple MDTs is indeed a very interesting point
>>> to address.
>>> The main issue is the database scaling in terms of operation rate,
>>> volume and entry location.
>>> A solution could be using an existing clustered DB engine (MySQL
>>> cluster, NOSQL DBs...),
>>> thus we are going to take a look at the different alternatives and see
>>> if they could match the need.
>>> For that, it would be interesting to know how records will be split
>>> into the multiple changelog streams:
>>> is a given fid always reported by the same stream? what about the parent
>>> fid (like in create/unlink operations)?
>>> If you have a document about DNE design, I think it would give a more
>>> precise idea about
>>> which events and fids are supposed to be reported by each MDT.
>>> 
>>> Thanks,
>>> Thomas
>>> 
>>> Eric Barton wrote:
>>>> Thomas,
>>>> 
>>>> Thanks a lot and I hope you don't mind me cc-ing lustre-devel as this
>>>> seems to be of general interest.
>>>> 
>>>> Do you have a feel (or measurements :) for the rate at which a changelog
>>>> can be ingested into robinhood?  And I'm wondering about DNE and multiple
>>>> changelogs coming from multiple MDTs.  I'd be very interested to know if
>>>> you've thought about this and have views on what the maximum ingest rate
>>>> could be and whether there will be issues coordinating/merging events
>>>> across multiple feeds.
>>>> 
>>>>        Cheers,
>>>>                 Eric
>>>> 
>>>> Eric Barton
>>>> CTO Whamcloud, Inc.
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: LEIBOVICI Thomas [mailto:thomas.leibovici at cea.fr]
>>>>> Sent: 29 September 2011 10:04 AM
>>>>> To: Eric Barton
>>>>> Subject: Erratum about indexes in robinhood DB
>>>>> 
>>>>> Hello Eric,
>>>>> 
>>>>> Re-thinking about your question on indexes in robinhood DB, my answer
>>>>> was incomplete.
>>>>> Actually, there are indexes on user/group/type/status, but they are not
>>>>> on the main table:
>>>>> 
>>>>> 1) As I told you, on the main table (the one that lists all FS entries),
>>>>> there are as few indexes as possible (just fid as primary key, and
>>>>> parent fid)
>>>>> in order to preserve a good insert/update rate on this table whatever
>>>>> the FS size (the deeper the DB index trees, the slower those requests).
>>>>> 
>>>>> 2) There is a secondary table where robinhood maintains aggregated
>>>>> statistics like number of entries and volume per
>>>>> user/group/type/(hsm)status, and which is updated on the fly.
>>>>> This one has indexes on almost all its fields, which makes it possible
>>>>> to get instantaneous stats per user, etc. without penalizing the
>>>>> insert/update rate on the main table.
>>>>> Indexes on this secondary table are less expensive, given that the set
>>>>> of users is much more restricted than the number of entries.
>>>>> 
>>>>> This time you have a more complete answer.
>>>>> 
>>>>> Best regards
>>>>> Thomas
>>>>> 
>>>> 
>>>> 
>> 
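
The two-table layout Thomas describes in the original message (a
lightly indexed main table plus a small, heavily keyed statistics
table updated on the fly) can be sketched as follows. This is only an
illustration: it uses Python's built-in sqlite3 rather than MySQL, and
the table names, column names and insert_entry() are assumptions, not
robinhood's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Main table: one row per FS entry, as few indexes as possible
-- (primary key on fid, plus one index on parent_fid).
CREATE TABLE entries (
    fid        TEXT PRIMARY KEY,
    parent_fid TEXT,
    owner      TEXT,
    grp        TEXT,
    ftype      TEXT,
    size       INTEGER
);
CREATE INDEX entries_parent ON entries(parent_fid);

-- Secondary table: one row per (owner, group, type) combination,
-- so it stays small however many entries the filesystem holds.
CREATE TABLE acct_stat (
    owner      TEXT,
    grp        TEXT,
    ftype      TEXT,
    nb_entries INTEGER NOT NULL DEFAULT 0,
    volume     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (owner, grp, ftype)
);
""")


def insert_entry(conn, fid, parent_fid, owner, grp, ftype, size):
    """Insert one entry and update the aggregated stats in one transaction."""
    with conn:
        conn.execute("INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?)",
                     (fid, parent_fid, owner, grp, ftype, size))
        # Create the stats row on first sight, then increment it.
        conn.execute("INSERT OR IGNORE INTO acct_stat (owner, grp, ftype) "
                     "VALUES (?, ?, ?)", (owner, grp, ftype))
        conn.execute("UPDATE acct_stat "
                     "SET nb_entries = nb_entries + 1, volume = volume + ? "
                     "WHERE owner = ? AND grp = ? AND ftype = ?",
                     (size, owner, grp, ftype))


insert_entry(conn, "0x1", "0x0", "alice", "users", "file", 4096)
insert_entry(conn, "0x2", "0x0", "alice", "users", "file", 1024)
insert_entry(conn, "0x3", "0x0", "bob", "users", "dir", 0)

# Per-user stats come from the small table, never from a scan of entries.
per_user = conn.execute(
    "SELECT owner, SUM(nb_entries), SUM(volume) "
    "FROM acct_stat GROUP BY owner ORDER BY owner").fetchall()
```

The per-user query only touches acct_stat, whose row count is bounded
by the number of distinct owner/group/type combinations, so the main
table pays for just two index updates per insert.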

--
Vitaly