[lustre-discuss] Overstriping setting
Wei-Keng Liao
wkliao at northwestern.edu
Mon Dec 22 12:16:05 PST 2025
Hi, Andreas
Sorry if I did not make my question clear in the first place.
I am testing the overstriping feature and have observed a decent performance
improvement. Enabling overstriping on only a subset of OSTs is
just an experiment of mine. I am thinking that for medium-size applications
it may be better to use only a subset of OSTs than all of them. This
is based on the complexity of network communication
between the compute nodes and the OSS nodes.
For example, on Perlmutter at NERSC, there are a total of 370 OSTs.
If an application runs on, say, 100 compute nodes with 128 MPI processes
per node, I guess 100 OSTs is a good number to use, and overstriping them
with 3 stripes per OST would perform better than using 300 OSTs with no
overstriping. Will this be the case?
I will also run some experiments there to see.
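The two layouts compared above can be tallied with a little arithmetic. The sketch below is only an illustration: the node-to-OST pair count assumes that every compute node accesses every OST holding part of the file (an upper bound, not a measurement), and the function name summary is made up for this example. It says nothing about which layout is actually faster.

```shell
#!/bin/sh
# Job size taken from the example above.
nodes=100
procs_per_node=128

# summary LABEL OSTS STRIPES_PER_OST
# Prints the stripe count, the number of distinct OSTs, and an upper bound
# on node-to-OST pairs (assumption: every node touches every OST of the file).
summary() {
    stripes=$(( $2 * $3 ))
    pairs=$(( nodes * $2 ))
    procs=$(( nodes * procs_per_node ))
    echo "$1: stripes=$stripes osts=$2 node_ost_pairs=$pairs procs=$procs"
}

summary overstriped 100 3
summary plain 300 1
```

Both layouts end up with 300 stripes; under the stated assumption they differ only in how many distinct OSTs each node may need to talk to.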
Wei-keng
On Dec 22, 2025, at 1:29 PM, Andreas Dilger <adilger at thelustrecollective.com> wrote:
Your first email was not clear that you are trying to overstripe
the file on a subset of OSTs. When the MDS is selecting the OSTs
for a file, it will always try to put each stripe on a different
OST if possible (subject to limitations of the OST pool and free
space on OSTs), before overstriping. There isn't any benefit to
overstriping a file when there are unused OSTs available, except
for synthetic test workloads. In your previous email thread you
mentioned the filesystem has 160 OSTs, so an 8-stripe file will
always prefer to use 8 different OSTs.
Overstriping is not different than regular striping, in that you
either need to use an OST pool, or specify the OST indexes to
limit the allocation to a subset of OSTs.
In your example, the "-C 8" is not more than the number of OSTs,
so the overstriping flag is cleared from the layout, since each
of the 8 stripes is on a different OST. This is true whether
you use "lfs setstripe" or "llapi_layout_*()" calls.
Using "-c 4 -C 8" is not different than just "-C 8", since the
first stripe count is overwritten by the second stripe count.
If this is just for testing bandwidth or similar, then it should
be enough to specify "-o M-N,M-N[,...]" for your tests. If there
is a good *production* reason to overstripe when there are more
OSTs available, then I would be interested to hear what that is.
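For such tests, the repeated range list passed to '-o' can be generated rather than typed by hand. A minimal sketch, assuming the OST indices form one contiguous range as in the '-o 10-13,10-13' command quoted later in this thread, and assuming '-C' alone is enough since '-c' is overwritten by it. The helpers ost_list and setstripe_cmd are hypothetical names, and the lfs command is only printed, not executed.

```shell
#!/bin/sh
# ost_list FIRST LAST PER_OST
# Repeats the range FIRST-LAST PER_OST times, comma-separated.
ost_list() {
    list=""
    i=0
    while [ "$i" -lt "$3" ]; do
        list="${list:+$list,}$1-$2"
        i=$((i + 1))
    done
    echo "$list"
}

# setstripe_cmd FIRST LAST PER_OST FILE
# Prints (does not run) an lfs setstripe command for that layout.
setstripe_cmd() {
    count=$(( ($2 - $1 + 1) * $3 ))
    echo "lfs setstripe -C $count -o $(ost_list "$1" "$2" "$3") $4"
}

# 4 OSTs (10 to 13), 2 stripes per OST
setstripe_cmd 10 13 2 '$SCRATCH/dummy'
```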
Cheers, Andreas
On Dec 22, 2025, at 10:59, Wei-Keng Liao <wkliao at northwestern.edu> wrote:
Hi, Andreas
The lfs-setstripe man page for option '-C' indicates only negative values
can be used, and the file will be striped over all available OSTs. However,
my wish is to stripe a file over only a subset of the available OSTs.
Is it possible to achieve that?
I just tried the two commands below without the '-o' option. My intent
is to create a file with a stripe count of 8 over 4 OSTs, but both
ended up with the same result: no overstriping.
% lfs setstripe -c 4 -C 8 $SCRATCH/dummy
% lfs setstripe -C 8 $SCRATCH/dummy
% lfs getstripe $SCRATCH/dummy
/pscratch/sd/w/wkliao/dummy
lmm_stripe_count: 8
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 168
lmm_pool: original
obdidx       objid       objid          group
   168    19587711   0x12ae27f   0x368000041f
   169    19224808   0x12558e8   0x36c0000428
   170    19783691   0x12de00b   0x3700000413
   171    20429006   0x137b8ce   0x3740000419
   172    19633677   0x12b960d   0x3780000421
   173    20027491   0x1319863   0x37c0000402
   174    19912786   0x12fd852   0x3800000401
   175    20862151   0x13e54c7   0x3840000418
As for using the llapi_layout APIs, I am doing the following. It seems
that I am missing some API call to set the number of overstripes or the number
of stripes per OST, as these calls do not produce an overstriped layout.
struct llapi_layout *layout = llapi_layout_alloc();
err = llapi_layout_pattern_set(layout, LLAPI_LAYOUT_OVERSTRIPING);
err = llapi_layout_stripe_count_set(layout, 8);
fd = llapi_layout_file_create(path, O_CREAT|O_RDWR, 0660, layout);
I found the only way to achieve overstriping is to call
err = llapi_layout_ost_index_set(layout, stripe_number, ost_index);
However, I must pick the values for argument 'ost_index'.
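One way to pick those values is to cycle through the chosen OSTs, which mirrors the ordering implied by '-o 10-13,10-13'. The sketch below only prints the (stripe_number, ost_index) pairs that would be passed to llapi_layout_ost_index_set(); it does not call the Lustre API, stripe_map is a hypothetical name, and a contiguous OST range is assumed. It still hard-codes the OSTs, so it does not let the MDS choose them.

```shell
#!/bin/sh
# stripe_map FIRST_OST NUM_OSTS STRIPE_COUNT
# Prints one "stripe_number ost_index" pair per line, wrapping around
# the OST range once every NUM_OSTS stripes.
stripe_map() {
    s=0
    while [ "$s" -lt "$3" ]; do
        echo "$s $(( $1 + s % $2 ))"
        s=$((s + 1))
    done
}

# 8 stripes over OSTs 10 to 13, i.e. 2 stripes per OST
stripe_map 10 4 8
```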
Wei-keng
On Dec 22, 2025, at 4:32 AM, Andreas Dilger <adilger at thelustrecollective.com> wrote:
You should be able to use "-C N" to overstripe a file without specifying the OST indexes with "-o ...".
For handling this via llapi_layout commands, I believe it is necessary to set llapi_layout_pattern_set(LLAPI_LAYOUT_OVERSTRIPING) flag on the component, and then specify a stripe count > OSTCOUNT. I see this isn't documented in the llapi_layout_pattern_set(3) man page (along with LLAPI_LAYOUT_FOREIGN), so please file a Jira ticket for this (and ideally also submit a patch to the man page).
The flag will be cleared if the stripe count <= OSTCOUNT, for improved compatibility with older clients that do not understand overstriping (though that is unlikely these days).
The patch https://review.whamcloud.com/54192 ("LU-16938 utils: setstripe overstripe multiple OST count") along with a few follow-on fixes in Lustre 2.16+ also allows specifying:
lfs setstripe -C -N ... FILE|DIR
(or llapi equivalent) to create 'N' stripes per OST for the file, instead of having to know the exact OST count, if that is more convenient.
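The stripe-count rules described in this message can be summarised in a few lines. This is a sketch of the behaviour as stated here, not of Lustre's code: the function names are hypothetical, OSTCOUNT is passed in by hand (160 is the OST count mentioned elsewhere in this thread), and the pattern rule covers only the case where the MDS chooses the OSTs, i.e. no '-o'.

```shell
#!/bin/sh
# effective_stripes C_VALUE OSTCOUNT
# "-C N" asks for N stripes in total; "-C -N" asks for N stripes per OST.
effective_stripes() {
    if [ "$1" -lt 0 ]; then
        echo $(( (0 - $1) * $2 ))
    else
        echo "$1"
    fi
}

# layout_pattern STRIPES OSTCOUNT
# The overstriping flag is kept only when the stripes exceed OSTCOUNT.
layout_pattern() {
    if [ "$1" -gt "$2" ]; then
        echo "raid0,overstriped"
    else
        echo "raid0"
    fi
}

layout_pattern "$(effective_stripes 8 160)" 160    # "-C 8" with 160 OSTs
layout_pattern "$(effective_stripes -2 160)" 160   # "-C -2" with 160 OSTs
```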
Cheers, Andreas
On Dec 20, 2025, at 18:52, Wei-Keng Liao via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
When setting overstriping for a new file, is it possible to let
the MDS choose the OST indices?
I was able to use the lfs command to set overstriping for a new file.
For example, to overstripe a file over 4 OSTs with 2 stripes per OST,
I am using this command:
% lfs setstripe -c 4 -C 8 -o 10-13,10-13 $SCRATCH/dummy
% lfs getstripe $SCRATCH/dummy | grep lmm
lmm_stripe_count: 8
lmm_stripe_size: 1048576
lmm_pattern: raid0,overstriped
lmm_layout_gen: 0
lmm_stripe_offset: 10
lmm_pool: original
My understanding is that, without overstriping, the OSTs are by default
selected by the Lustre MDS based on some policy (maybe OST
usage). I wonder if this can also apply to overstriping, i.e. using
the lfs command options '-c' and '-C' without the option '-o'.
I am also wondering how this can be achieved using the Lustre user
C APIs, when making calls to llapi_layout_ost_index_set().
Wei-keng
Andreas Dilger
Principal Lustre Architect
adilger at thelustrecollective.com