[lustre-devel] RFC: Spill device for Lustre OSD

Oleg Drokin green at whamcloud.com
Tue Nov 4 15:45:35 PST 2025


On Tue, 2025-11-04 at 23:37 +0000, Day, Timothy wrote:
> > > If we can implement a no-transaction osd-vfs, that would expose a
> > > lot of flexibility for other reasons as well. Possibly the osd-vfs
> > > could implement a journal or other logging layer internally to make
> > > up for lack of transactions, whether initially or at a later stage?
> > 
> > That's actually an interesting idea, but probably not very practical.
> > The journal itself is probably not really feasible with the VFS API
> > alone, because you need to touch the journal together with whatever
> > you are modifying, unless you update the journal first and then
> > everything else, but that is likely going to be slow due to all the
> > overhead? That's probably why all the journaling filesystems hide the
> > journal inside themselves, out of reach of the VFS API.
> > Of course the VFS API could probably be extended eventually if there's
> > a good justification, but who knows how long it'll take and what the
> > final agreed-upon implementation would actually look like.
> 
> Databases do this kind of journaling and they use normal filesystem
> APIs. And they can do this with a similar performance profile as a
> Lustre OSS or MDS. OSD is pretty much a database on top of a normal
> filesystem. So I think it's possible.

I think databases do it by pushing pretty much everything through the
journal?
I guess the alternative is rename semantics, where you write the new
version to a temp file and then rename it over the old one, but again
that's of limited usefulness in our case?
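
To make the rename approach concrete, here is a minimal userspace sketch
in Python. It is not Lustre or osd code; atomic_replace is a hypothetical
helper name, and it assumes a POSIX system where rename within a single
filesystem is atomic and a directory can be opened and fsync()ed.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` with `data` via temp file + rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the rename is not atomic (or fails outright).
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # new contents are on disk before the rename
        os.replace(tmp, path)     # readers see either the old or the new file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Note that this only makes the replacement of one file atomic. An update
that spans several files or objects gets no atomicity from it.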

> If we had a no-atomic-transaction osd-vfs, we could perhaps use it
> for stand-alone MGS. Before ending each write transaction, the osd-vfs
> could fsync() the whole filesystem. This isn't feasible for MDS or
> OSS, of course. But performance demands on MGS are low enough (I
> suspect) that this would work.
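
As a rough userspace illustration of the "sync the whole filesystem
before ending each write transaction" idea quoted above, here is a small
Python sketch. write_transaction and sync_filesystem are hypothetical
names, not osd API. It assumes Linux, where syncfs(2) is reachable
through the C library, and falls back to os.sync() (which flushes all
filesystems) if the symbol is not found.

```python
import ctypes
import os
from contextlib import contextmanager

# Assumption: on Linux, CDLL(None) exposes the C library's symbols.
_libc = ctypes.CDLL(None, use_errno=True)
_syncfs = getattr(_libc, "syncfs", None)


def sync_filesystem(path):
    """Flush the filesystem that contains `path` to stable storage."""
    if _syncfs is None:
        os.sync()  # fallback: flushes every mounted filesystem
        return
    fd = os.open(path, os.O_RDONLY)
    try:
        if _syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)


@contextmanager
def write_transaction(root):
    """Run the caller's updates under `root`, then sync before returning.

    There is no rollback: if the body raises, or the machine crashes
    midway, whatever was already written stays as it is.
    """
    yield
    sync_filesystem(root)  # only reached when the body finished cleanly
```

Usage would look like "with write_transaction('/mnt/mgs'): ..." with the
updates in the body, where the mount point is made up for the example.
This gives durability once the block exits, but not atomicity, and the
cost of a full flush per transaction is why it could only fit a
low-traffic target.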

But why would we want a special OSD type that's only suitable for MGS?



