[lustre-devel] RFC: Spill device for Lustre OSD
Oleg Drokin
green at whamcloud.com
Mon Nov 3 15:14:48 PST 2025
On Mon, 2025-11-03 at 21:59 +0000, Day, Timothy via lustre-devel wrote:
>
>
> This begs the question: if we're already doing this work to support
> writing Lustre objects to any arbitrary filesystem via VFS and we're
> only intending to support OSTs with this proposal, why not implement
> an OST-only VFS OSD and handle tiering in the filesystem layer?
The problem with pure VFS is that it does not actually give us what we
want.
The OSD talks to the underlying FS via VFS plus some more stuff (we do
have the hidden mount for ldiskfs, after all).
The "more stuff" is things like transaction boundaries expanded beyond
what POSIX requires, so that we can update more than one thing in a
single transaction.
If the Linux VFS provided all these abilities, I suspect we would not
really need to know much about the underlying disk FS.
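To make the "update more than one thing" point concrete, here is a toy
sketch in Python. It is not the Lustre OSD API: ToyStore, ToyTransaction
and their methods are made-up names, and the store is just in-memory
dicts. It only shows the property being asked for, namely that several
updates (object data plus an attribute) either all land or none do.

```python
# Toy model only: not the Lustre OSD API. All names are hypothetical.
import copy


class ToyStore:
    """In-memory stand-in for an on-disk filesystem."""

    def __init__(self):
        self.objects = {}   # object id -> bytes
        self.attrs = {}     # object id -> dict of attributes

    def transaction(self):
        return ToyTransaction(self)


class ToyTransaction:
    """Groups several updates so they land together or not at all."""

    def __init__(self, store):
        self.store = store
        self.updates = []

    def write(self, oid, data):
        self.updates.append(("write", oid, data))

    def set_attr(self, oid, key, value):
        self.updates.append(("attr", oid, (key, value)))

    def commit(self):
        # Apply to copies first; swap them in only if every update succeeds.
        objects = dict(self.store.objects)
        attrs = copy.deepcopy(self.store.attrs)
        for kind, oid, payload in self.updates:
            if kind == "write":
                if not isinstance(payload, bytes):
                    raise TypeError("object data must be bytes")
                objects[oid] = payload
            else:
                key, value = payload
                attrs.setdefault(oid, {})[key] = value
        self.store.objects, self.store.attrs = objects, attrs


store = ToyStore()
tx = store.transaction()
tx.write("obj1", b"hello")
tx.set_attr("obj1", "size", 5)
tx.commit()

# A transaction with one bad update leaves the store untouched.
bad = store.transaction()
bad.set_attr("obj1", "size", 99)
bad.write("obj1", "not bytes")
try:
    bad.commit()
except TypeError:
    pass
```

A plain sequence of POSIX calls gives no such grouping, which is the gap
the extra OSD machinery fills.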
But currently it's just a way to add OSTs, not to move objects laterally
from one OST to another, hence this proposal, I imagine, where OSTs
would grow "warts" for less-wanted data.
I am not sure it's a much better idea than the HSM capabilities we
already have, which allow you to have "offline" objects that are pulled
back in when used but are otherwise visible in the metadata only.
The underlying capabilities are pretty rich, especially if we also take
into account the eventual WBC stuff.
If the argument is "but OSTs know best what stuff is used" (which I am
not sure I buy; after all, before you can use something off an OST you
need to open a file, I would hope), even then OSTs could just signal a
list of "inactive objects" that a higher-level system would then take
care of by relocating them somewhere more sensible and changing the
layout to indicate those objects now live elsewhere.
The plus here is that you don't need to attach this "wart" to every OST
and configure it everywhere and such, but rather have a single location
that is centrally managed.
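A rough sketch of that flow, in Python and purely as a toy: OSTs only
report which objects look inactive, and one central manager decides where
they go and updates the layout. ToyOST, ToyManager, the "cold-pool"
target and the idle threshold are all invented for illustration; none of
this is existing Lustre code, and the actual data copy is left out.

```python
# Toy model only: not Lustre code. All names and thresholds are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyOST:
    name: str
    last_access: dict = field(default_factory=dict)  # object id -> timestamp

    def inactive_objects(self, now, max_idle):
        """Report objects not touched for more than max_idle seconds."""
        return sorted(
            oid for oid, ts in self.last_access.items() if now - ts > max_idle
        )


@dataclass
class ToyManager:
    """Central service that owns relocation policy and layout updates."""
    cold_target: str
    layout: dict = field(default_factory=dict)  # object id -> current location

    def sweep(self, osts, now, max_idle):
        moved = []
        for ost in osts:
            for oid in ost.inactive_objects(now, max_idle):
                # A real system would copy the data before switching the layout.
                del ost.last_access[oid]
                self.layout[oid] = self.cold_target
                moved.append(oid)
        return moved


ost0 = ToyOST("OST0000", {"a": 100, "b": 990})
ost1 = ToyOST("OST0001", {"c": 10})
mgr = ToyManager("cold-pool", {"a": "OST0000", "b": "OST0000", "c": "OST0001"})
moved = mgr.sweep([ost0, ost1], now=1000, max_idle=600)
```

The OSTs carry no relocation policy and no extra attached device; the
policy and the cold target are configured once, in the manager.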