[lustre-discuss] Overstriping setting

Wei-Keng Liao wkliao at northwestern.edu
Mon Dec 22 12:16:05 PST 2025


Hi, Andreas

Sorry if I did not make my question clear in the first place.

I am testing the overstriping feature and have observed a decent
performance improvement. Enabling overstriping on only a subset of OSTs
is just an experiment of mine. My thinking is that for medium-size
applications it may be better to use only a subset of the OSTs rather
than all of them, considering the complexity of network communication
between the compute nodes and the OSS nodes.

For example, Perlmutter at NERSC has a total of 370 OSTs. If an
application runs on, say, 100 compute nodes with 128 MPI processes
per node, my guess is that 100 OSTs is a good number to use, and that
overstriping them with 3 stripes per OST would perform better than
striping over 300 OSTs with no overstriping. Will this be the case?
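
The two layouts in this question can be put side by side with a small
sketch. It assumes, purely for illustration, that every compute node
ends up exchanging data with every OST that holds part of the file. The
function names are hypothetical, and the result is only a count of
node/OST pairs, not a performance prediction.

```c
/* Hypothetical helper: number of (compute node, OST) pairs that may
 * communicate, ASSUMING every node touches every OST used by the file. */
unsigned long node_ost_pairs(unsigned long nodes, unsigned long osts)
{
    return nodes * osts;
}

/* Total number of stripes (objects) in the layout. */
unsigned long total_stripes(unsigned long osts, unsigned long stripes_per_ost)
{
    return osts * stripes_per_ost;
}
```

For the example above, 100 nodes and 100 OSTs give 10,000 pairs, against
30,000 pairs for 300 OSTs, while both layouts have 300 stripes in total.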

I will also run some experiments there to see.

Wei-keng

On Dec 22, 2025, at 1:29 PM, Andreas Dilger <adilger at thelustrecollective.com> wrote:

Your first email was not clear that you are trying to overstripe
the file on a subset of OSTs.  When the MDS is selecting the OSTs
for a file, it will always try to put each stripe on a different
OST if possible (subject to limitations of the OST pool and free
space on OSTs), before overstriping.  There isn't any benefit to
overstriping a file when there are unused OSTs available, except
for synthetic test workloads.  In your previous email thread you
mentioned the filesystem has 160 OSTs, so an 8-stripe file will
always prefer to use 8 different OSTs.

Overstriping is not different than regular striping, in that you
either need to use an OST pool, or specify the OST indexes to
limit the allocation to a subset of OSTs.

In your example, the "-C 8" is not more than the number of OSTs,
so the overstriping flag is cleared from the layout, since each
of the 8 stripes is on a different OST.  This is true whether
you use "lfs setstripe" or "llapi_layout_*()" calls.
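
The rule described above can be written down as a tiny model. This is
only a sketch of the behavior as stated in this thread, not code from
the Lustre tree. The function name and the usable_osts parameter (the
number of OSTs the MDS can choose from, after pool and free-space
limits) are made up for illustration.

```c
#include <stdbool.h>

/* Model of the rule described in this thread (NOT Lustre source code):
 * the layout stays marked as overstriped only when more stripes are
 * requested than there are OSTs to place them on. */
bool layout_stays_overstriped(unsigned int stripe_count,
                              unsigned int usable_osts)
{
    return stripe_count > usable_osts;
}
```

With the numbers from this thread, 8 stripes against 160 OSTs gives
false, which matches the plain raid0 pattern in the getstripe output.
Under the same model, 8 stripes restricted to 4 OSTs would keep the flag.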

Using "-c 4 -C 8" is not different than just "-C 8", since the
first stripe count is overwritten by the second stripe count.

If this is just for testing bandwidth or similar, then it should
be enough to specify "-o M-N,M-N[,...]" for your tests.  If there
is a good *production* reason to overstripe when there are more
OSTs available, then I would be interested to hear what that is.
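
For test runs like this, the repeated-range argument for "-o" can be
generated rather than typed by hand. The helper below is a hypothetical
sketch (its name and parameters are not part of any Lustre tool). It
only formats a string such as the 10-13,10-13 used in the first message
of this thread, for a contiguous range of OST indexes.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write the OST range repeated stripes_per_ost
 * times, separated by commas, into buf (e.g. "10-13,10-13").
 * A single OST is written as a plain index (e.g. "5,5,5").
 * Returns 0 on success, -1 on invalid arguments or if buf is too small. */
int build_ost_list(char *buf, size_t len, unsigned int first_ost,
                   unsigned int num_osts, unsigned int stripes_per_ost)
{
    size_t used = 0;

    if (buf == NULL || len == 0 || num_osts == 0 || stripes_per_ost == 0)
        return -1;

    buf[0] = '\0';
    for (unsigned int i = 0; i < stripes_per_ost; i++) {
        const char *sep = (i == 0) ? "" : ",";
        int n;

        if (num_osts == 1)
            n = snprintf(buf + used, len - used, "%s%u", sep, first_ost);
        else
            n = snprintf(buf + used, len - used, "%s%u-%u", sep,
                         first_ost, first_ost + num_osts - 1);

        if (n < 0 || (size_t)n >= len - used)
            return -1;              /* output would be truncated */
        used += (size_t)n;
    }
    return 0;
}
```

Calling it with first_ost = 10, num_osts = 4 and stripes_per_ost = 2
fills the buffer with 10-13,10-13.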

Cheers, Andreas

On Dec 22, 2025, at 10:59, Wei-Keng Liao <wkliao at northwestern.edu> wrote:

Hi, Andreas

The lfs-setstripe man page for option '-C' indicates only negative values
can be used, and the file will be striped over all available OSTs. However,
my wish is to stripe a file over only a subset of the available OSTs.
Is it possible to achieve that?

I just now tried the two commands below without the '-o' option. My
intent is to create a file with a stripe count of 8 over 4 OSTs, but
both ended up with the same result: no overstriping.

% lfs setstripe -c 4 -C 8 $SCRATCH/dummy
% lfs setstripe -C 8 $SCRATCH/dummy

% lfs getstripe $SCRATCH/dummy
/pscratch/sd/w/wkliao/dummy
lmm_stripe_count:  8
lmm_stripe_size:   1048576
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 168
lmm_pool:          original
obdidx      objid       objid          group
   168   19587711   0x12ae27f   0x368000041f
   169   19224808   0x12558e8   0x36c0000428
   170   19783691   0x12de00b   0x3700000413
   171   20429006   0x137b8ce   0x3740000419
   172   19633677   0x12b960d   0x3780000421
   173   20027491   0x1319863   0x37c0000402
   174   19912786   0x12fd852   0x3800000401
   175   20862151   0x13e54c7   0x3840000418


As for the llapi_layout APIs, I am doing the following. It seems I am
missing some API call to set the number of overstripes or the number of
stripes per OST, as the calls below do not produce an overstriped layout.

  struct llapi_layout *layout = llapi_layout_alloc();
  err = llapi_layout_pattern_set(layout, LLAPI_LAYOUT_OVERSTRIPING);
  err = llapi_layout_stripe_count_set(layout, 8);
  fd = llapi_layout_file_create(path, O_CREAT|O_RDWR, 0660, layout);

I found that the only way to achieve overstriping is to call
  err = llapi_layout_ost_index_set(layout, stripe_number, ost_index);
However, I must then pick the values for the 'ost_index' argument myself.
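
If the OST indexes do have to be chosen by hand, one simple choice is to
cycle through a contiguous range of OSTs. The helper below is a
hypothetical sketch, not part of the llapi. It only computes the
ost_index value that could be passed to llapi_layout_ost_index_set() for
each stripe_number, and is kept free of Lustre headers so that it
compiles anywhere.

```c
#include <stdint.h>

/* Hypothetical helper: OST index for a given stripe when the stripes are
 * laid out round-robin over num_osts consecutive OSTs starting at
 * first_ost. Returns UINT64_MAX if num_osts is 0. */
uint64_t ost_for_stripe(unsigned int stripe_number, uint64_t first_ost,
                        unsigned int num_osts)
{
    if (num_osts == 0)
        return UINT64_MAX;
    return first_ost + (stripe_number % num_osts);
}

/* Intended use (untested sketch, requires <lustre/lustreapi.h>):
 *
 *   for (i = 0; i < stripe_count; i++)
 *       err = llapi_layout_ost_index_set(layout, i,
 *                                        ost_for_stripe(i, 10, 4));
 */
```

With first_ost = 10, num_osts = 4 and 8 stripes, this yields
10, 11, 12, 13, 10, 11, 12, 13, the same order as "-o 10-13,10-13".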


Wei-keng

On Dec 22, 2025, at 4:32 AM, Andreas Dilger <adilger at thelustrecollective.com> wrote:

You should be able to use "-C N" to overstripe a file without specifying the OST indexes with "-o ...".

For handling this via llapi_layout commands, I believe it is necessary to set the LLAPI_LAYOUT_OVERSTRIPING flag on the component with llapi_layout_pattern_set(), and then specify a stripe count > OSTCOUNT.  I see this isn't documented in the llapi_layout_pattern_set(3) man page (along with LLAPI_LAYOUT_FOREIGN), so please file a Jira ticket for this (and ideally also submit a patch to the man page).

The flag will be cleared if the stripe count <= OSTCOUNT, for improved compatibility with older clients that do not understand overstriping (though that is unlikely these days).

The patch https://review.whamcloud.com/54192 ("LU-16938 utils: setstripe overstripe multiple OST count") along with a few follow-on fixes in Lustre 2.16+ also allows specifying:

  lfs setstripe -C -N ... FILE|DIR

(or llapi equivalent) to create 'N' stripes per OST for the file, instead of having to know the exact OST count, if that is more convenient.
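
The arithmetic behind "-C -N" as described above can be sketched as
follows. This is an illustrative model only: the function name is
invented, and any upper limit Lustre places on the stripe count is
ignored. A positive value is taken as the stripe count itself, and a
negative value -N as N stripes on each available OST.

```c
/* Illustrative model (NOT Lustre source code) of the "-C" argument:
 *   c_arg > 0  -> stripe count is c_arg
 *   c_arg < 0  -> |c_arg| stripes on each of ost_count OSTs
 * Returns 0 for c_arg == 0, which this sketch treats as "not specified". */
unsigned long overstripe_total(long c_arg, unsigned long ost_count)
{
    if (c_arg > 0)
        return (unsigned long)c_arg;
    if (c_arg < 0)
        /* Negate in unsigned arithmetic so that LONG_MIN is handled too. */
        return (0UL - (unsigned long)c_arg) * ost_count;
    return 0;
}
```

Under this model, "-C -2" on 4 available OSTs gives 8 stripes, without
the caller having to know the OST count in advance.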

Cheers, Andreas

On Dec 20, 2025, at 18:52, Wei-Keng Liao via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

When setting overstriping for a new file, is it possible to let
the MDS choose the OST indices?

I was able to use the lfs command to set overstriping for a new file.
For example, to overstripe a file over 4 OSTs with 2 stripes per OST,
I am using this command:

%  lfs setstripe -c 4 -C 8 -o 10-13,10-13 $SCRATCH/dummy

%  lfs getstripe $SCRATCH/dummy | grep lmm
lmm_stripe_count:  8
lmm_stripe_size:   1048576
lmm_pattern:       raid0,overstriped
lmm_layout_gen:    0
lmm_stripe_offset: 10
lmm_pool:          original

My understanding is that without overstriping, the default is for the
OSTs to be selected by the Lustre MDS based on some policy (maybe OST
usage). I wonder if this can also apply to overstriping, i.e. using
the lfs command options '-c' and '-C' without the '-o' option.

I am also wondering how this can be achieved using the Lustre user
C APIs, when making calls to llapi_layout_ost_index_set().


Wei-keng

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Andreas Dilger
Principal Lustre Architect
adilger at thelustrecollective.com



