<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">
<div>Hi, Andreas</div>
<div><br>
</div>
<div>Sorry if I did not make my question clear in the first place.</div>
<div><br>
</div>
<div>I am testing the overstriping feature and have observed a decent performance</div>
<div>improvement. Enabling overstriping on only a subset of OSTs is</div>
<div>just an experiment of mine. I am thinking that for medium-size applications</div>
<div>it may be better to use only a subset of the OSTs rather than all of them. This</div>
<div>is based on the complexity of the network communication</div>
<div>between the compute nodes and the OSS nodes.</div>
<div><br>
</div>
<div>For example, on Perlmutter at NERSC, there are a total of 370 OSTs.</div>
<div>If an application runs on, say, 100 compute nodes with 128 MPI processes</div>
<div>per node, I guess using 100 OSTs is a good number, and overstriping them</div>
<div>with 3 stripes per OST would perform better than using 300 OSTs with no</div>
<div>overstriping. Will this be the case?</div>
<div><br>
</div>
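<div>A minimal sketch of how the 100-OST, 3-stripes-per-OST layout could be requested for such an experiment. It assumes that the "-o M-N,M-N" form quoted below is accepted together with "-C", and that OST indexes 0-99 form a usable contiguous range on the target filesystem. Both are assumptions, and the helper name overstripe_cmd is hypothetical. It only prints the command and does not run it:</div>
<pre>
```shell
# Print (do not run) an "lfs setstripe" command that places PER_OST stripes
# on each OST in the contiguous index range FIRST..LAST.
# Assumption: the OST indexes are contiguous and all of them are usable.
overstripe_cmd() {
    first=$1
    last=$2
    per_ost=$3
    path=$4
    num_osts=$(( last - first + 1 ))
    total=$(( num_osts * per_ost ))   # total stripe count passed to -C
    list=""
    i=0
    # Repeat the index range once per stripe wanted on each OST.
    while [ "$i" -lt "$per_ost" ]; do
        if [ -n "$list" ]; then
            list="$list,"
        fi
        list="${list}${first}-${last}"
        i=$(( i + 1 ))
    done
    echo "lfs setstripe -C $total -o $list $path"
}

# The 100 OSTs x 3 stripes case from the example above.
overstripe_cmd 0 99 3 '$SCRATCH/dummy'
```
</pre>
<div>Calling it as "overstripe_cmd 10 13 2 FILE" reproduces the "-C 8 -o 10-13,10-13" arguments used earlier in this thread.</div>
<div><br>
</div>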
<div>I will also run some experiments there to see.</div>
<div><br>
</div>
<div>
<div>Wei-keng</div>
</div>
<div><br>
<blockquote type="cite">
<div>On Dec 22, 2025, at 1:29 PM, Andreas Dilger &lt;adilger@thelustrecollective.com&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div>
<div>Your first email was not clear that you are trying to overstripe<br>
the file on a subset of OSTs. When the MDS is selecting the OSTs<br>
for a file, it will always try to put each stripe on a different<br>
OST if possible (subject to limitations of the OST pool and free<br>
space on OSTs), before overstriping. There isn't any benefit to<br>
overstriping a file when there are unused OSTs available, except<br>
for synthetic test workloads. In your previous email thread you<br>
mentioned the filesystem has 160 OSTs, so an 8-stripe file will<br>
always prefer to use 8 different OSTs.<br>
<br>
Overstriping is not different than regular striping, in that you<br>
either need to use an OST pool, or specify the OST indexes to<br>
limit the allocation to a subset of OSTs.<br>
<br>
In your example, the "-C 8" is not more than the number of OSTs,<br>
so the overstriping flag is cleared from the layout, since each<br>
of the 8 stripes is on a different OST. This is true whether<br>
you use "lfs setstripe" or "llapi_layout_*()" calls.<br>
<br>
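A minimal sketch of how one might check whether the overstriping flag survived on a file. It assumes the "lmm_pattern" line of "lfs getstripe" reads "raid0,overstriped" for an overstriped layout and plain "raid0" otherwise, as in the outputs quoted further down this thread. The helper name is_overstriped is hypothetical:<br>
<pre>
```shell
# Succeeds (exit status 0) if "lfs getstripe" output read from stdin reports
# an overstriped layout, i.e. an lmm_pattern line containing "overstriped".
is_overstriped() {
    awk '/^ *lmm_pattern:/ { if ($0 ~ /overstriped/) found = 1 }
         END { exit found ? 0 : 1 }'
}

# Usage on a real file (requires a Lustre client):
#   lfs getstripe "$SCRATCH/dummy" | is_overstriped
# Here the function is fed a canned sample instead.
printf 'lmm_stripe_count: 8\nlmm_pattern: raid0,overstriped\n' | is_overstriped
echo "exit status: $?"
```
</pre>
<br>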
Using "-c 4 -C 8" is not different than just "-C 8", since the<br>
first stripe count is overwritten by the second stripe count.<br>
<br>
If this is just for testing bandwidth or similar, then it should<br>
be enough to specify "-o M-N,M-N[,...]" for your tests. If there<br>
is a good *production* reason to overstripe when there are more<br>
OSTs available, then I would be interested to hear what that is.<br>
<br>
Cheers, Andreas<br>
<br>
<blockquote type="cite">On Dec 22, 2025, at 10:59, Wei-Keng Liao &lt;wkliao@northwestern.edu&gt; wrote:<br>
<br>
Hi, Andreas<br>
<br>
The lfs-setstripe man page for option '-C' indicates only negative values<br>
can be used, and the file will be striped over all available OSTs. However,<br>
my wish is to stripe a file over only a subset of the available OSTs.<br>
Is it possible to achieve that?<br>
<br>
I just now tried the two commands below without the '-o' option. My intent<br>
is to create a file with a stripe count of 8 over 4 OSTs. But they both<br>
ended up with the same result of no overstriping.<br>
<br>
% lfs setstripe -c 4 -C 8 $SCRATCH/dummy<br>
% lfs setstripe -C 8 $SCRATCH/dummy<br>
<br>
% lfs getstripe $SCRATCH/dummy<br>
/pscratch/sd/w/wkliao/dummy<br>
lmm_stripe_count: 8<br>
lmm_stripe_size: 1048576<br>
lmm_pattern: raid0<br>
lmm_layout_gen: 0<br>
lmm_stripe_offset: 168<br>
lmm_pool: original<br>
obdidx objid objid group<br>
168 19587711 0x12ae27f 0x368000041f<br>
169 19224808 0x12558e8 0x36c0000428<br>
170 19783691 0x12de00b 0x3700000413<br>
171 20429006 0x137b8ce 0x3740000419<br>
172 19633677 0x12b960d 0x3780000421<br>
173 20027491 0x1319863 0x37c0000402<br>
174 19912786 0x12fd852 0x3800000401<br>
175 20862151 0x13e54c7 0x3840000418<br>
<br>
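To see how many stripes landed on each OST, a small sketch that tallies the obdidx column of a listing like the one above. It assumes the four-column table format shown here, and the helper name stripes_per_ost is hypothetical:<br>
<pre>
```shell
# Count how many stripes (objects) each OST holds, reading "lfs getstripe"
# output from stdin. Table rows are taken to be lines with exactly four
# fields whose first field is a decimal OST index; other lines are ignored.
stripes_per_ost() {
    awk 'NF == 4 { if ($1 ~ /^[0-9]+$/) count[$1]++ }
         END { for (ost in count) print ost, count[ost] }' | sort -n
}

# Usage on a real file (requires a Lustre client):
#   lfs getstripe "$SCRATCH/dummy" | stripes_per_ost
# Canned sample with two stripes on OST 10 and one on OST 11:
printf 'obdidx objid objid group\n10 1 0x1 0\n11 2 0x2 0\n10 3 0x3 0\n' | stripes_per_ost
```
</pre>
For the listing above, each of OSTs 168 to 175 appears once, which matches the "no overstriping" result.<br>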
<br>
As for using the llapi_layout APIs, I am doing the following. It seems like<br>
I am missing some API call to set the number of overstripes or the number of stripes<br>
per OST, as these calls do not produce an overstriped layout.<br>
<br>
struct llapi_layout *layout = llapi_layout_alloc();<br>
err = llapi_layout_pattern_set(layout, LLAPI_LAYOUT_OVERSTRIPING);<br>
err = llapi_layout_stripe_count_set(layout, 8);<br>
fd = llapi_layout_file_create(path, O_CREAT|O_RDWR, 0660, layout);<br>
<br>
I found the only way to achieve overstriping is to call<br>
err = llapi_layout_ost_index_set(layout, stripe_number, ost_index);<br>
However, I must pick the values for argument 'ost_index'.<br>
<br>
<br>
Wei-keng<br>
<br>
<blockquote type="cite">On Dec 22, 2025, at 4:32 AM, Andreas Dilger &lt;adilger@thelustrecollective.com&gt; wrote:<br>
<br>
You should be able to use "-C N" to overstripe a file without specifying the OST indexes with "-o ...".
<br>
<br>
For handling this via llapi_layout commands, I believe it is necessary to set llapi_layout_pattern_set(LLAPI_LAYOUT_OVERSTRIPING) flag on the component, and then specify a stripe count > OSTCOUNT. I see this isn't documented in the llapi_layout_pattern_set(3)
man page (along with LLAPI_LAYOUT_FOREIGN), so please file a Jira ticket for this (and ideally also submit a patch to the man page).<br>
<br>
The flag will be cleared if the stripe count &lt;= OSTCOUNT, for improved compatibility with older clients that do not understand overstriping (though that is unlikely these days).<br>
<br>
The patch https://review.whamcloud.com/54192 ("LU-16938 utils: setstripe overstripe multiple OST count")
along with a few follow-on fixes in Lustre 2.16+ also allows specifying:<br>
<br>
lfs setstripe -C -N ... FILE|DIR<br>
<br>
(or llapi equivalent) to create 'N' stripes per OST for the file, instead of having to know the exact OST count, if that is more convenient.<br>
<br>
Cheers, Andreas<br>
<br>
<blockquote type="cite">On Dec 20, 2025, at 18:52, Wei-Keng Liao via lustre-discuss &lt;lustre-discuss@lists.lustre.org&gt; wrote:<br>
<br>
When setting the overstriping for a new file, is it possible to let<br>
the MDS choose the OST indices?<br>
<br>
I was able to use the lfs command to set overstriping for a new file.<br>
For example, to overstripe a file over 4 OSTs with 2 stripes per OST,<br>
I am using this command:<br>
<br>
% lfs setstripe -c 4 -C 8 -o 10-13,10-13 $SCRATCH/dummy<br>
<br>
% lfs getstripe $SCRATCH/dummy | grep lmm<br>
lmm_stripe_count: 8<br>
lmm_stripe_size: 1048576<br>
lmm_pattern: raid0,overstriped<br>
lmm_layout_gen: 0<br>
lmm_stripe_offset: 10<br>
lmm_pool: original<br>
<br>
My understanding is that, without overstriping, the default is that<br>
the OSTs are selected by the Lustre MDS based on some policy (maybe OST<br>
usage). I wonder if this can also apply to overstriping, i.e. using<br>
the lfs command options '-c' and '-C' without the option '-o'.<br>
<br>
I am also wondering how this can be achieved using the Lustre user<br>
C APIs, when making calls to llapi_layout_ost_index_set().<br>
<br>
<br>
Wei-keng<br>
<br>
_______________________________________________<br>
lustre-discuss mailing list<br>
lustre-discuss@lists.lustre.org<br>
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<br>
Andreas Dilger<br>
Principal Lustre Architect<br>
adilger@thelustrecollective.com<br>
<br>
<br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</body>
</html>