[lustre-devel] RFC: Spill device for Lustre OSD

Jinshan Xiong jinshanx at google.com
Tue Nov 4 09:51:44 PST 2025


On Tue, Nov 4, 2025 at 8:37 AM Andreas Dilger <adilger at ddn.com> wrote:

> On Nov 3, 2025, at 18:58, Oleg Drokin via lustre-devel <
> lustre-devel at lists.lustre.org> wrote:
>
>
> On Mon, 2025-11-03 at 16:33 -0800, Jinshan Xiong wrote:
>
> I guess users won't have 1PB OSTs, will they?
>
>
> There probably are already? NASA has a known 0.5P OST configuration:
>
> https://www.nas.nasa.gov/hecc/support/kb/lustre-progressive-file-layout-(pfl)-with-ssd-and-hdd-pools_680.html#:~:text=The%20available%20SSD%20space%20in%20each%20filesystem,decimal%20(far%20right)%20labels%20of%20each%20OST
>
>
> In order to maximize rebuild performance for declustered parity RAID,
> there are OSTs in production with 90x20TB HDDs = 1.4 PB today,
> and requests to have even larger OSTs.  We've done a bunch of work
> to improve huge ldiskfs OST performance, including the hybrid OST
> patches like https://review.whamcloud.com/51625 ("LU-16750 ldiskfs:
> optimize metadata allocation for hybrid LUNs"), but there could still
> be further improvements in supporting such large OSTs.
>

Since those OSTs are already HDD-based, this feature won't apply to them;
spilling makes no sense for HDD-backed storage.

For SSD OSTs, cloud users prefer to spread the drives across multiple servers
so the aggregate disk bandwidth can actually be delivered; bandwidth is usually
their first priority. Once that is solved, they want larger capacity so they
don't have to reload their working sets each time.
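
To make the bandwidth argument concrete, here is a rough back-of-the-envelope
sketch; the per-drive and per-NIC throughput figures are assumptions chosen for
illustration, not numbers from this thread:

    # Rough model: deliverable read bandwidth of a flash pool is capped by
    # min(total drive bandwidth, total server NIC bandwidth).
    DRIVE_GBPS = 6.0    # assumed per-NVMe sequential read, GB/s
    NIC_GBPS = 25.0     # assumed usable network bandwidth per OSS, GB/s

    def pool_bandwidth(num_drives, num_servers):
        """Bandwidth when num_drives SSDs are spread over num_servers OSSes."""
        disk_bw = num_drives * DRIVE_GBPS
        net_bw = num_servers * NIC_GBPS
        return min(disk_bw, net_bw)

    # 24 NVMe drives behind a single OSS: the NIC, not the flash, is the limit.
    print(pool_bandwidth(24, 1))   # -> 25.0 GB/s
    # The same 24 drives spread over 6 OSSes: the flash bandwidth is usable.
    print(pool_bandwidth(24, 6))   # -> 144.0 GB/s

In this model a narrow pool is limited by the server NICs rather than the
flash, which is why spreading the SSDs wide comes first and capacity becomes
the next concern only afterwards.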



>
> Cheers, Andreas
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud/DDN
>