[lustre-devel] RFC: Spill device for Lustre OSD
Oleg Drokin
green at whamcloud.com
Mon Nov 3 16:15:32 PST 2025
On Mon, 2025-11-03 at 16:04 -0800, Jinshan Xiong wrote:
>
>
> > On Nov 3, 2025, at 15:14, Oleg Drokin <green at whamcloud.com> wrote:
> >
> > On Mon, 2025-11-03 at 21:59 +0000, Day, Timothy via lustre-devel
> > wrote:
> > >
> > >
> > > This begs the question: if we're already doing this work to
> > > support writing Lustre objects to any arbitrary filesystem via VFS
> > > and we're only intending to support OSTs with this proposal, why
> > > not implement an OST-only VFS OSD and handle tiering in the
> > > filesystem layer?
> >
> > The problem with pure VFS is it does not actually provide us what we
> > want.
> > So OSD talks to the underlying FS via VFS + some more stuff (we do
> > have the hidden mount for ldiskfs after all).
> > The "more stuff" is things like expanded transaction boundaries
> > beyond what POSIX requires so we can update more than one thing.
> > If Linux VFS provided all these abilities we would not need to
> > really know much about the underlying disk fs I suspect.
> >
> > But currently it's just a way to add OSTs, not move objects
> > laterally from one OST to another and hence this proposal I imagine -
> > where OSTs would grow "warts" for less wanted data.
> >
> > I am not sure it's a much better idea than the already existing HSM
> > capabilities we have that would allow you to have "offline" objects
> > that would be pulled back in when used, but are otherwise just
> > visible in the metadata only.
> > The underlying capabilities are pretty rich esp. if we also take
> > into account the eventual WBC stuff.
>
> The major problem of current HSM is that it has to have dedicated
> clients to move data. Also, scanning the entire Lustre file system
This (dedicated client) is an implementation detail. It could be
improved in many ways, and the effort spent on this would bring great
benefit to everyone.
> takes a very long time, so it resorts to databases in order to make
> correct decisions about which files should be released. By that time,
> the two systems will be out of sync. That makes it practically
> unusable.
This again is an implementation detail, not even hardcoded anywhere.
How do you plan for the OST to know what stuff is not used without
resorting to some database or scan? Now take this method and make it
report "upstream" where currently HSM implementations resort to
databases or scans.
Rereading your proposal, I see that this particular detail is not
covered and it's just assumed that "infrequently accessed data" would
be somehow known.
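
To make the "report upstream" idea concrete, here is a minimal sketch,
assuming each OST keeps a last-access timestamp per object and a central
policy engine collects the idle lists. AccessTracker, report_upstream and
the idle_after threshold are made-up names for illustration only, not
existing Lustre interfaces.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AccessTracker:
    """Hypothetical per-OST access tracker (not a Lustre API)."""
    idle_after: float                      # seconds without access before an object counts as inactive
    last_access: dict = field(default_factory=dict)

    def touch(self, object_id, now=None):
        # Record an access; a real OST would do this on the I/O path.
        self.last_access[object_id] = time.time() if now is None else now

    def inactive(self, now=None):
        # Return the objects not accessed for at least idle_after seconds.
        now = time.time() if now is None else now
        return sorted(oid for oid, t in self.last_access.items()
                      if now - t >= self.idle_after)


def report_upstream(trackers, now=None):
    """Collect per-OST inactive lists into one report for a central manager."""
    return {ost: tracker.inactive(now) for ost, tracker in trackers.items()}
```

The point of the sketch is only that whatever the OST uses to decide
"infrequently accessed" can feed a centrally managed mover just as well
as a local spill device.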
>
> >
> > If the argument is "but OSTs know best what stuff is used" (which I
> > am not sure I buy, after all before you could use something off OSTs
> > you need to open a file I would hope) even then OSTs could just
> > signal a list of "inactive objects" that then a higher level system
> > would take care of by relocating somewhere more sensical and
> > changing the layout to indicate those objects now live elsewhere.
> >
> > The plus here is you don't need to attach this "Wart" to every OST
> > and configure it everywhere and such, but rather have a central
> > location that is centrally managed.
> >