[Lustre-devel] Oleg and Eric - Supporting >512 OSTs for Striping
Oleg Drokin
Oleg.Drokin at Sun.COM
Thu Feb 5 15:18:39 PST 2009
Hello!
Adding Lustre-devel to CC.
On Feb 5, 2009, at 5:31 PM, Andreas Dilger wrote:
> it is probably worthwhile to do a code audit to see if there are many/any
> "for each stripe" kind of operations that could be avoided for such widely
> striped files. Common operations like lov_merge_lvb() and lov_adjust_kms()
> will become very expensive, and could possibly be optimized in some cases.
I suspect there are enough of them.
When I worked on slow small I/O, I noticed that we call merge_lvb quite
often and needlessly. For example, we call it on every partial page update,
on refresh_ap (when sending a write RPC, so for every page), on every
ll_readahead call (which means for every page read), every time we do a
glimpse, every time after enqueueing an extent lock (even if cached), and
on every read syscall.
I had a plan for fixing it, but it turned out to be more complicated than
I thought, and in the end it was not the main culprit at the time.
Basically, what we need to do is store an up-to-date merged lvb somewhere
in the inode and update it after every enqueue or lock cancel.
This is only relevant to the b1_x codebase. I see that in HEAD, with the
new I/O rewrite code, the number of calls to merge_lvb is dramatically
lower (only for glimpses), though potentially some CPU could be saved by
merging only after changes have actually occurred.
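To make the cached merged lvb idea concrete, here is a rough sketch in
plain C. It is not Lustre code: struct stripe_lvb, struct merged_cache,
full_merge(), cache_invalidate() and cache_get() are all hypothetical
names, and the merge rule (largest size, summed blocks, newest mtime) is
a simplifying assumption. It also uses a lazy variant: instead of
re-merging on every enqueue or lock cancel, it only marks the cached copy
stale and re-merges on the next read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-stripe lock value block (not Lustre's). */
struct stripe_lvb {
    uint64_t size;   /* size as seen through this stripe, in file offsets */
    uint64_t blocks; /* blocks allocated on this stripe's object */
    uint64_t mtime;  /* last modification time reported by this stripe */
};

/* Hypothetical per-inode cache of the merged value. */
struct merged_cache {
    struct stripe_lvb merged; /* result of the last full merge */
    bool valid;               /* false until merged, or after invalidation */
    unsigned long merges;     /* counts full merges, for illustration only */
};

/* The expensive O(stripe_count) walk that we want to run less often. */
static void full_merge(const struct stripe_lvb *s, size_t n,
                       struct stripe_lvb *out)
{
    assert(out != NULL);
    out->size = 0;
    out->blocks = 0;
    out->mtime = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].size > out->size)
            out->size = s[i].size;      /* assumed rule: largest size */
        out->blocks += s[i].blocks;     /* assumed rule: sum of blocks */
        if (s[i].mtime > out->mtime)
            out->mtime = s[i].mtime;    /* assumed rule: newest mtime */
    }
}

/* Call after every enqueue or lock cancel that may change a stripe lvb. */
static void cache_invalidate(struct merged_cache *c)
{
    c->valid = false;
}

/* Hot-path lookup: walks all stripes only if the cache is stale. */
static const struct stripe_lvb *cache_get(struct merged_cache *c,
                                          const struct stripe_lvb *s,
                                          size_t n)
{
    if (!c->valid) {
        full_merge(s, n, &c->merged);
        c->valid = true;
        c->merges++;
    }
    return &c->merged;
}
```

With a zero-initialized struct merged_cache, the per-page and per-syscall
paths would call cache_get() and pay for the stripe walk only once per
change, while the enqueue and cancel paths would call cache_invalidate().
Real code would also need locking around the cache, which this sketch
leaves out.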
> Similarly, we might consider to do MDS-originated object destroys for
> such files (or all files) instead of sending huge RPC with cookies to
> the client (~84kB reply). These could be batched on unlink commit, and
> would also avoid the "inodes with destroyed objects" bug previously
> discussed.
Do you think of this only as a way to cut the maximum RPC reply size on
the MDS?
Bye,
Oleg