[Lustre-devel] Global generic database

Nikita Danilov Nikita.Danilov at Sun.COM
Fri Feb 15 12:40:24 PST 2008

Peter J Braam writes:
 > Hmm ... here are my thoughts.

Can we use our existing directory/lookup/read/write mechanism to
implement this database? That is, imagine that clients somehow get a
special fid (DB_FID) representing a directory that is not visible through
the normal namespace (this can be implemented as a /DB directory on the
MDS local file system, alongside the /ROOT directory). Typical use of
that would be something along the lines of

int db_value_get(const char *key, void *buf, size_t count)
{
        /* pseudocode: topdir is resolved once from DB_FID */
        static struct dt_object *topdir;

        if (topdir == NULL)
                topdir = object_by_fid(DB_FID);
        int fd = lookup(topdir, key);
        return read(fd, buf, count);
}

db_value_get("filesets.FOO.policy", buf, BUFSIZE);
db_value_get("pools.BAR.width", &pool_width, sizeof pool_width);


The main advantage of this approach is, of course, that all the code is
already here; moreover...

 > 1. The word scalable is missing below.

fixed through the standard means: CMD, placement policies, split
directories, pdirops locking,

 > 2. Any database that relates to file system policies and file system 
 > objects (HSM?) should be a separate mechanism coupled to the file 
 > system, so that you can pick up the server disks and the policies.

achieved automatically (if I understand the issue correctly),

 > 3. I think all updates to the database should be made on the server, and 
 > the use cases should be restricted (e.g. this is for relatively small 
 > databases).
 > 4. Imho pools belong in the configuration log. 
 > 5. Fileset attributes belong with the file system (see 2) - either these 
 > are implemented as special directory files and/or EA's (does the design 
 > specify the purpose and items that need to be stored in databases?).


 > > Needs to be:
 > > 1. Fast. We need to cache database entries locally, which also means 

hopefully fast. :-) Caching is already here,

 > > having them under locks.
 > >     a. local caching

already here,

 > >     b. locks

already here,

 > > 2. Generic.  Store any kind of data, not limited to 8k page boundaries, etc.

already here,

 > > 3. Transactional.  Power loss doesn't lead to inconsistent state.

already here,

 > > 4. Recoverable. Client changes are replayed if need be.

already here,

 > > 5. Remotely accessible, from a client or other servers.

already here.

Plus, we can allow clients to mount DB_FID as a separate file system, so
that the usual tools can be used to maintain the database.
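
If DB_FID were mounted as a separate file system, the same key/value
access could be done from user space with plain POSIX calls. The
following is only a rough sketch under that assumption: the db_dir
argument (the mount point) and the db_value_set() helper are hypothetical
names introduced for illustration, and nothing here is existing Lustre
code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the value stored under `key` in the database directory `db_dir`
 * (hypothetical mount point of DB_FID). Each key is one file.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t db_value_get(const char *db_dir, const char *key,
                     void *buf, size_t count)
{
        ssize_t n;
        int dirfd, fd;

        assert(db_dir != NULL && key != NULL);
        dirfd = open(db_dir, O_RDONLY | O_DIRECTORY);
        if (dirfd < 0)
                return -1;
        fd = openat(dirfd, key, O_RDONLY);      /* the "lookup" step */
        close(dirfd);
        if (fd < 0)
                return -1;
        n = read(fd, buf, count);
        close(fd);
        return n;
}

/*
 * Store `count` bytes from `buf` under `key`, creating or truncating
 * the file. Hypothetical counterpart of db_value_get().
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t db_value_set(const char *db_dir, const char *key,
                     const void *buf, size_t count)
{
        ssize_t n;
        int dirfd, fd;

        assert(db_dir != NULL && key != NULL);
        dirfd = open(db_dir, O_RDONLY | O_DIRECTORY);
        if (dirfd < 0)
                return -1;
        fd = openat(dirfd, key, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        close(dirfd);
        if (fd < 0)
                return -1;
        n = write(fd, buf, count);
        close(fd);
        return n;
}
```

With something like this, db_value_get("/mnt/db", "pools.BAR.width",
&pool_width, sizeof pool_width) would mirror the call shown earlier,
where /mnt/db is again just an assumed mount point. Caching, locking and
recovery would come from the file system itself rather than from this
code.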

