[lustre-discuss] Overstriping setting
Andreas Dilger
adilger at thelustrecollective.com
Mon Dec 22 11:29:42 PST 2025
Your first email did not make clear that you are trying to overstripe
the file on a subset of OSTs. When the MDS is selecting the OSTs
for a file, it will always try to put each stripe on a different
OST if possible (subject to limitations of the OST pool and free
space on OSTs), before overstriping. There isn't any benefit to
overstriping a file when there are unused OSTs available, except
for synthetic test workloads. In your previous email thread you
mentioned the filesystem has 160 OSTs, so an 8-stripe file will
always prefer to use 8 different OSTs.
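A minimal sketch of the placement preference described above, assuming a simple round-robin model over a list of candidate OST "slots" (for example the members of a pool). This is a hypothetical illustration, not the MDS allocator, which also weighs free space and pool membership; `assign_stripes()` and its parameters are invented for this example.

```c
#include <assert.h>

/*
 * Hypothetical model, not the Lustre allocator: candidate OSTs are numbered
 * as slots 0..ost_count-1, and stripe i goes to slot (first + i) % ost_count.
 * Every slot receives one stripe before any slot receives a second, so a
 * file only becomes overstriped once stripe_count exceeds ost_count.
 * Fills slot_of_stripe[0..stripe_count-1] and returns the maximum number of
 * stripes placed on any single slot.
 */
static int assign_stripes(int stripe_count, int ost_count, int first,
                          int *slot_of_stripe)
{
    assert(stripe_count > 0 && ost_count > 0);
    assert(first >= 0 && first < ost_count);
    for (int i = 0; i < stripe_count; i++)
        slot_of_stripe[i] = (first + i) % ost_count;
    /* Round-robin spreads stripes evenly: ceil(stripe_count / ost_count). */
    return (stripe_count + ost_count - 1) / ost_count;
}
```

With 160 candidate OSTs, an 8-stripe file gets at most one stripe per OST; only when the candidate set shrinks to 4 OSTs (a pool or explicit list) does each OST receive two stripes.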
Overstriping is no different from regular striping, in that you
either need to use an OST pool, or specify the OST indexes to
limit the allocation to a subset of OSTs.
In your example, the "-C 8" is not more than the number of OSTs,
so the overstriping flag is cleared from the layout, since each
of the 8 stripes is on a different OST. This is true whether
you use "lfs setstripe" or "llapi_layout_*()" calls.
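The rule above can be expressed as a check on the resulting stripe placement. The sketch below is hypothetical and inferred from the two `lfs getstripe` outputs later in this thread (OSTs 168-175 reported as plain "raid0", the explicit list 10-13,10-13 reported as "raid0,overstriped"), not taken from Lustre source; `has_repeated_ost()` is an invented name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check, inferred from the getstripe outputs in this thread:
 * the overstriping flag is only meaningful (and kept) when at least one
 * OST index appears more than once among the file's stripes.
 */
static bool has_repeated_ost(const int *ost_idx, int count)
{
    assert(count > 0);
    for (int i = 0; i < count; i++)
        for (int j = i + 1; j < count; j++)
            if (ost_idx[i] == ost_idx[j])
                return true;   /* two stripes share an OST */
    return false;              /* every stripe on a distinct OST */
}
```

For the "-C 8" example, the eight objects landed on OSTs 168 through 175, so no index repeats and the layout is stored as plain raid0.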
Using "-c 4 -C 8" is no different from just "-C 8", since the
first stripe count is overwritten by the second stripe count.
If this is just for testing bandwidth or similar, then it should
be enough to specify "-o M-N,M-N[,...]" for your tests. If there
is a good *production* reason to overstripe when there are more
OSTs available, then I would be interested to hear what that is.
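For test setups that need a program to produce the "-o M-N,M-N[,...]" argument, here is a small hypothetical helper (the name `build_ost_list()` and its parameters are assumptions for illustration). It repeats a contiguous OST range once per desired stripe per OST, so the resulting stripe count is `nosts * per_ost`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build an lfs setstripe "-o" argument that repeats
 * the OST range first..first+nosts-1 once for each stripe wanted per OST,
 * e.g. (10, 4, 2) -> "10-13,10-13" (8 stripes over OSTs 10-13).
 * Returns 0 on success, -1 on invalid input or if buf is too small.
 */
static int build_ost_list(int first, int nosts, int per_ost,
                          char *buf, size_t len)
{
    size_t used = 0;

    if (first < 0 || nosts < 1 || per_ost < 1 || buf == NULL || len == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < per_ost; i++) {
        const char *sep = (i == 0) ? "" : ",";
        /* A single OST is written as a bare index instead of "M-M". */
        int n = (nosts == 1)
            ? snprintf(buf + used, len - used, "%s%d", sep, first)
            : snprintf(buf + used, len - used, "%s%d-%d", sep, first,
                       first + nosts - 1);
        if (n < 0 || (size_t)n >= len - used)
            return -1;     /* output would be truncated */
        used += (size_t)n;
    }
    assert(strlen(buf) == used);
    return 0;
}
```

For the case in this thread, `build_ost_list(10, 4, 2, ...)` yields "10-13,10-13", the same list used in the original `-o 10-13,10-13` command that produced a "raid0,overstriped" layout.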
Cheers, Andreas
> On Dec 22, 2025, at 10:59, Wei-Keng Liao <wkliao at northwestern.edu> wrote:
>
> Hi, Andreas
>
> The lfs-setstripe man page for option '-C' indicates only negative values
> can be used, and the file will be striped over all available OSTs. However,
> my wish is to stripe a file over only a subset of the available OSTs.
> Is it possible to achieve that?
>
> I just now tried the two commands below without the '-o' option. My intent
> is to create a file with a stripe count of 8 over 4 OSTs. But both
> ended up with the same result: no overstriping.
>
> % lfs setstripe -c 4 -C 8 $SCRATCH/dummy
> % lfs setstripe -C 8 $SCRATCH/dummy
>
> % lfs getstripe $SCRATCH/dummy
> /pscratch/sd/w/wkliao/dummy
> lmm_stripe_count: 8
> lmm_stripe_size: 1048576
> lmm_pattern: raid0
> lmm_layout_gen: 0
> lmm_stripe_offset: 168
> lmm_pool: original
> obdidx objid objid group
> 168 19587711 0x12ae27f 0x368000041f
> 169 19224808 0x12558e8 0x36c0000428
> 170 19783691 0x12de00b 0x3700000413
> 171 20429006 0x137b8ce 0x3740000419
> 172 19633677 0x12b960d 0x3780000421
> 173 20027491 0x1319863 0x37c0000402
> 174 19912786 0x12fd852 0x3800000401
> 175 20862151 0x13e54c7 0x3840000418
>
>
> As for using llapi_layout APIs, I am doing the following. It seems I am
> missing some API call to set the number of overstripes or the number of
> stripes per OST, as these calls do not produce an overstriped layout.
>
> struct llapi_layout *layout = llapi_layout_alloc();
> err = llapi_layout_pattern_set(layout, LLAPI_LAYOUT_OVERSTRIPING);
> err = llapi_layout_stripe_count_set(layout, 8);
> fd = llapi_layout_file_create(path, O_CREAT|O_RDWR, 0660, layout);
>
> I found the only way to achieve overstriping is to call
> err = llapi_layout_ost_index_set(layout, stripe_number, ost_index);
> However, I must pick the values for argument 'ost_index'.
>
>
> Wei-keng
>
>> On Dec 22, 2025, at 4:32 AM, Andreas Dilger <adilger at thelustrecollective.com> wrote:
>>
>> You should be able to use "-C N" to overstripe a file without specifying the OST indexes with "-o ...".
>>
>> For handling this via llapi_layout commands, I believe it is necessary to set the LLAPI_LAYOUT_OVERSTRIPING flag on the component with llapi_layout_pattern_set(), and then specify a stripe count > OSTCOUNT. I see this isn't documented in the llapi_layout_pattern_set(3) man page (along with LLAPI_LAYOUT_FOREIGN), so please file a Jira ticket for this (and ideally also submit a patch to the man page).
>>
>> The flag will be cleared if the stripe count <= OSTCOUNT, for improved compatibility with older clients that do not understand overstriping (though that is unlikely these days).
>>
>> The patch https://review.whamcloud.com/54192 ("LU-16938 utils: setstripe overstripe multiple OST count") along with a few follow-on fixes in Lustre 2.16+ also allows specifying:
>>
>> lfs setstripe -C -N ... FILE|DIR
>>
>> (or llapi equivalent) to create 'N' stripes per OST for the file, instead of having to know the exact OST count, if that is more convenient.
>>
>> Cheers, Andreas
>>
>>> On Dec 20, 2025, at 18:52, Wei-Keng Liao via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>>>
>>> When setting the overstriping for a new file, is it possible to let
>>> the MDS choose the OST indices?
>>>
>>> I was able to use the lfs command to set overstriping for a new file.
>>> For example, to overstripe a file over 4 OSTs with 2 stripes per OST,
>>> I am using this command:
>>>
>>> % lfs setstripe -c 4 -C 8 -o 10-13,10-13 $SCRATCH/dummy
>>>
>>> % lfs getstripe $SCRATCH/dummy | grep lmm
>>> lmm_stripe_count: 8
>>> lmm_stripe_size: 1048576
>>> lmm_pattern: raid0,overstriped
>>> lmm_layout_gen: 0
>>> lmm_stripe_offset: 10
>>> lmm_pool: original
>>>
>>> My understanding is that, without overstriping, the default is that
>>> the OSTs are selected by the Lustre MDS based on some policy (maybe OST
>>> usage). I wonder if this can also apply to overstriping, i.e. using
>>> lfs command options '-c' and '-C' without option '-o'.
>>>
>>> I am also wondering how this can be achieved using the Lustre user
>>> C APIs, when making calls to llapi_layout_ost_index_set().
>>>
>>>
>>> Wei-keng
>>>
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
Andreas Dilger
Principal Lustre Architect
adilger at thelustrecollective.com