[lustre-devel] modern precreate
nrutman at gmail.com
Fri Jan 8 11:44:26 PST 2021
Riffing on something Andreas said in a lustre-discuss thread, I'm
hoping someone can correct my understanding of how precreate works.
Old days:
MDS would ask each OST for a set of precreated objects via an MDT->OST
RPC. These had to be cleaned up during recovery, hence a cap. They
were used up as the MDS assigned them to layouts, so the MDS had to go
back and get more, even for 0-length files.
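A toy model of the pool-based scheme just described, as I understand it. This is only a sketch: FakeOST, PrecreatePool, and their methods are names made up for illustration, not Lustre code, and the batch size of 32 is just an example value.

```python
from collections import deque


class FakeOST:
    """Hypothetical stand-in for an OST that hands out object IDs."""

    def __init__(self):
        self.next_id = 1
        self.create_rpcs = 0

    def precreate(self, count):
        # Models one MDT->OST create RPC returning `count` new object IDs.
        self.create_rpcs += 1
        ids = list(range(self.next_id, self.next_id + count))
        self.next_id += count
        return ids


class PrecreatePool:
    """Hypothetical MDS-side pool of precreated objects for one OST."""

    def __init__(self, ost, create_count=32):
        self.ost = ost
        self.create_count = create_count
        self.pool = deque()

    def assign(self):
        # Refill when empty. Every file layout consumes one object,
        # even if the file never receives any data.
        if not self.pool:
            self.pool.extend(self.ost.precreate(self.create_count))
        return self.pool.popleft()

    def recovery_cleanup(self):
        # Unused precreated objects are destroyed during recovery, so
        # the size of the pool bounds how many can be destroyed here.
        orphans = list(self.pool)
        self.pool.clear()
        return orphans


ost = FakeOST()
pool = PrecreatePool(ost, create_count=32)
assigned = [pool.assign() for _ in range(100)]  # 100 zero-length files
orphans = pool.recovery_cleanup()
print(ost.create_rpcs, len(orphans))  # 4 RPCs, 28 unused objects
```

In this model the create rate is tied to how quickly the pool can be refilled, and the batch size trades fewer RPCs against more objects to destroy at recovery.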
Modern days, Lustre 2.5+:
MDT doesn't hold a pool of OST objects but instead takes an OST FID
range from a FLD server. Each MD object is mapped to an eventual OST
object by this FID. The OST side just holds a small number of
anonymous objects and assigns the FID to an object when any operation
is executed without an existing FID->inode mapping on the OST. There
is no more precreate RPC necessary, since OSTs maintain their own pool
of anonymous objects, only use them up when data is actually written,
and can create more when running low. There is no recovery cleanup
needed on the OSTs.
In this case, there should be no performance difference between create
and mknod except for the FLD operation, and the number of OSTs should
not matter for create rates.
Is my understanding wrong? It clearly must be, since Andreas is still
talking about OST_CREATE RPCs and recovery implications, and we do see
a performance difference between mknod and creating files with layouts.
([lustre-discuss] Improving file create performance with larger create_count)
The max_create_count is between 32 and 20000 (for protocol recovery
reasons, since unused precreated objects are destroyed during
recovery, and we put a cap on how many objects could be destroyed to
avoid badness in case of a bug) so this is already at the maximum.
You should be able to increase the create_count to 20000 as well.
However, this value is "auto tuned" based on how long it takes the OSS
to create the requested objects. If the OST_CREATE RPC takes too long
then the MDS will ask for fewer objects next time.
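A rough sketch of this kind of auto-tuning. The 32 and 20000 bounds come from the text above; everything else is an assumption for illustration and is not taken from the Lustre source: the function name next_create_count, the 1-second threshold, and the halve-when-slow, double-when-fast policy.

```python
MIN_CREATE_COUNT = 32      # lower bound quoted above
MAX_CREATE_COUNT = 20000   # upper bound quoted above


def next_create_count(current, rpc_seconds, slow_threshold=1.0):
    """Return the batch size to request in the next create RPC.

    Assumed policy (hypothetical): halve after a slow RPC, double
    after a fast one, then clamp to the allowed range.
    """
    if rpc_seconds > slow_threshold:
        proposed = current // 2
    else:
        proposed = current * 2
    return max(MIN_CREATE_COUNT, min(MAX_CREATE_COUNT, proposed))


count = 32
history = []
for rpc_seconds in [0.1, 0.1, 0.1, 2.5, 0.1]:
    count = next_create_count(count, rpc_seconds)
    history.append(count)
print(history)  # [64, 128, 256, 128, 256]
```

Under a policy like this, setting create_count to the maximum only helps as long as the OSS can create that many objects quickly enough; otherwise the requested batch size shrinks again.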
> * Is there a theoretical down side to pre-creating more objects? (MDS or OSS memory usage? Longer mount times? slower e2fsck?)
> A bit slower e2fsck, but compared to the total filesystem size this is minor. The biggest issue is that the old precreated objects will be destroyed during MDS-OSS recovery and new ones created.