[lustre-devel] RFC: Spill device for Lustre OSD

Jinshan Xiong jinshanx at google.com
Tue Nov 4 16:28:05 PST 2025


On Tue, Nov 4, 2025 at 4:20 PM Andreas Dilger via lustre-devel <
lustre-devel at lists.lustre.org> wrote:

> On Nov 4, 2025, at 4:42 PM, Jinshan Xiong wrote:
> > On Nov 4, 2025, at 15:07, Andreas Dilger <adilger at ddn.com> wrote:
> >> Timothy Day <timday at amazon.com> wrote:
> >>> Overall, I think the concept is interesting. It reminds me of how
> >>> Bcachefs handles multi-device support. Each device can be
> >>> designated as holding metadata or data replicas. And you
> >>> can control the promotion and migration between different
> >>> targets (all managed by a migration daemon). But this design is
> >>> too limited, IMHO. If we're going to accept the additional complexity
> >>> in the OSD, the solution has to be extensible. What if I want to
> >>> replicate to multiple targets? What if I want more than two tiers?
> >>> What if I want to transparently migrate data from one spill device to
> >>> another? We don't need this for the initial implementation, sure.
> >>> But these seem like natural extensions.
> >
> > It’s possible to extend the design to have multiple spill devices in the
> > OSD; you could have two spill devices and mirror them, or use RAID-0 to
> > make a larger device. I don’t see anything in the design that would
> > prevent that.
> >
> >> This is essentially replicating Lustre file layouts in the end, which
> >> was my original suggestion - to use FLR and/or PCC-RO foreign
> >> mirror layouts for this, even if it is not directly accessible from
> >> clients.  That avoids reimplementing tools/formats that already
> >> exist in Lustre today for relatively little benefit.
> >
> > One of the goals is to avoid needing a file system-level scanner, which
> > is undesirable. Otherwise, we could just use FLR-based tiered storage.
>
> I don't see how the layout xattr format is related to filesystem-level
> scanning?  This is "just" the xattr format *STORED ON THE OST OBJECT*,
> and it could be handled by direct device-level (OST) scan tools as well.
>

I see. I initially thought you proposed the spill device would function as
a mirror and that the existing software stack would handle it.

Are we going to store the spilled object's status in the layout? It sounds
difficult because you will have to initiate a layout change from the OSTs.
I think we still need to store such an xattr locally in the OST object.


>
> This would allow mirrors (FLR), multiple spill devices (multiple FLR
> mirrors), concatenated devices (PFL), etc.  It just avoids adding new
> formats that need to be parsed, printed, etc.
>
> Cheers, Andreas