[lustre-discuss] Overstriping setting
Åke Sandgren
ake.sandgren at umu.se
Mon Dec 22 23:12:45 PST 2025
Ah ok, didn't read through the discussion well enough it appears...
________________________________________
From: Andreas Dilger <adilger at thelustrecollective.com>
Sent: Tuesday, December 23, 2025 8:09
To: Åke Sandgren
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] Overstriping setting
Hi Åke,
I'm not arguing against overstriping itself. Definitely for shared file workloads, having more objects/locks can improve performance.
The question is whether, e.g., 2 stripes on each of 100 OSTs is faster than 1 stripe on each of 200 OSTs, not whether it is faster than 1 stripe on each of 100 OSTs...
Cheers, Andreas
> On Dec 22, 2025, at 23:37, Åke Sandgren via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>
> Hi!
>
> That logic only applies when the OSTs are made up of single disks. If they are LUNs behind a RAID controller, or otherwise consist of multiple physical disks, then overstriping can indeed result in higher performance. We've seen this when overstriping on our DDN-based Lustre: up to 4x overstriping gave a more or less linear increase. Those OSTs are 8+2 RAID6-ish. I never tried 8x overstriping because 4x was enough for our purposes.
> Also, when testing we did 4x per OST over all 8 OSTs, so 32 stripes on 8 OSTs.
>
> ________________________________________
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Andreas Dilger via lustre-discuss <lustre-discuss at lists.lustre.org>
> Sent: Tuesday, December 23, 2025 1:47
> To: Wei-Keng Liao
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] Overstriping setting
>
> I don't think that using 3 stripes per OST is ever going to be
> faster than using 3 separate OSTs, especially if the OSTs are
> HDD based instead of flash. Even with NVMe OSTs, there is still
> contention on the block device queue (elevator, queue depth, etc.)
>
> With separate OSTs, there are more resources available that
> can be leveraged with less contention. Consider DLM lock server
> resources such as the DLM lock hash, or OST filesystem resources
> like block allocators. With separate OSTs, those can be used
> with less contention compared to having 3 objects sharing the
> same resources.
>
> Also, using more OSTs (when warranted) will distribute space
> usage more evenly across devices.
>
> That said, there is some benefit to potentially leaving a few
> OSTs out of the allocation, if that aligns with the application.
> That allows the MDS to skip OSTs that are full or busy, instead
> of trying to always allocate objects from all of the OSTs.
>
> However, there isn't an easy way to overstripe, say, 900 stripes
> evenly across 300 of the 370 OSTs, instead of 3 stripes on 160 of
> the 370 OSTs and 2 stripes on 210 of the OSTs. It _might_ be good
> to do this if it shows better performance, but I think even then
> the uneven loading would still be better than only using 300 OSTs.
>
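For reference, the arithmetic behind the 900-stripe example can be checked with a short sketch. The helper name spread_stripes is hypothetical, and it only assumes stripes are spread as evenly as possible over the chosen OSTs; it does not model how the MDS actually allocates objects.

```python
def spread_stripes(stripe_count, ost_count):
    """Spread stripe_count stripes as evenly as possible over ost_count OSTs.

    Returns a dict mapping stripes-per-OST to the number of OSTs that
    carry that many stripes. Illustrative arithmetic only (assumption:
    even spreading), not the MDS allocation algorithm.
    """
    base, extra = divmod(stripe_count, ost_count)
    layout = {}
    if extra:
        # 'extra' OSTs have to take one more stripe than the rest
        layout[base + 1] = extra
    if ost_count - extra:
        layout[base] = ost_count - extra
    return layout


# 900 stripes over all 370 OSTs: uneven, 3 on some OSTs and 2 on the others
print(spread_stripes(900, 370))  # {3: 160, 2: 210}

# 900 stripes over only 300 OSTs: exactly 3 stripes per OST
print(spread_stripes(900, 300))  # {3: 300}
```

The first call reproduces the 160 x 3 + 210 x 2 = 900 split described above; the second is the even layout that is not easy to request directly.
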
> Cheers, Andreas
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org