[Lustre-discuss] "obdidx" ordering in "lfs getstripe"

Kevin Van Maren KVanMaren at fusionio.com
Tue Feb 14 05:27:27 PST 2012

On Feb 14, 2012, at 12:13 AM, Jack David wrote:

> On Thu, Feb 9, 2012 at 8:18 PM, Andreas Dilger <adilger at whamcloud.com> wrote:
>> On 2012-02-09, at 6:20 AM, Jack David wrote:
>>> In the output of "lfs getstripe <filename> | <dirname>", the obdidx
>>> denotes the OST index (I assume).
>>> Consider the following output:
>>> lmm_stripe_count:   2
>>> lmm_stripe_size:    1048576
>>> lmm_stripe_offset:  1
>>>       obdidx           objid          objid            group
>>>            1               2            0x2                0
>>>            0               3            0x3                0
>>> where I have a setup consisting of two OSTs. If I have more than two
>>> OSTs, is it possible that I get the obdidx values out of order? Or the
>>> obdidx values will always be linear?
>>> For example, in above output, the values are linear (like 1, 0 - and
>>> this pattern will be repeated while storing the data I assume). If I
>>> have 4 OSTs, can the values be non-linear? Something like 2,0,1,3 or
>>> 2,1,3,0 (or any pattern for that matter)??
>> Typically the ordering will be linear, but this depends on a number of
>> different factors:
>> - what order the OSTs were created in:  without --index=N the OST order
>>  depends on the order in which they were first mounted, so using --index
>>  is always recommended, and will be mandatory in the future
>> - the distribution of OSTs among OSS nodes:  the MDS object allocator
>>  will normally select one OST from each OSS before allocating another
>>  object from a different OST on the same OSS
> Thanks for this information.
>> - the space available on each OST:  when OST free space is imbalanced
>>  the OSTs will be selected in part based on how full they are
> I have a question here. Let's say I have 4 OSTs, but the Lustre client
> issues a write request which can be accommodated by any single OST
> (e.g. the write request is 512 bytes and stripe_size is 1MB). In this
> case, how will the data be stored? Will the MDS maintain the index of
> the next OST which should serve the request?

I think you are still confused about how it works.  The OSTs are selected
_when the file is created_.  The striping is a static map of file offset to OST.
For example, if the stripe count = 2 and the stripe size = 1MB, then
0-1MB goes to the first OST, 1-2MB goes to the second, 2-3MB goes back to the
first, etc.  So a 512-byte write lands on whichever OST its file offset maps
to; the MDS does not pick an OST per write.
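
To make the static map concrete, here is a rough Python sketch of the
round-robin (RAID-0 style) arithmetic.  This is an illustration, not Lustre
code: the function name stripe_for_offset is made up, and the layout list is
just the obdidx column from the getstripe output quoted above.

```python
MB = 1048576  # stripe size from the quoted getstripe output

def stripe_for_offset(offset, stripe_size, stripe_count):
    """Map a file byte offset to (stripe index, offset within that stripe's object).

    Hypothetical helper: plain round-robin arithmetic, assumed to match the
    simple striping described above.
    """
    stripe_number = offset // stripe_size          # which stripe-sized chunk of the file
    stripe_index = stripe_number % stripe_count    # which entry in the file's OST list
    # Each object holds every stripe_count-th chunk, packed back to back.
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return stripe_index, object_offset

# obdidx values in the order lfs getstripe listed them (fixed at file creation)
layout = [1, 0]

for off in (0, 512, 1 * MB, 2 * MB, 3 * MB + 10):
    idx, obj_off = stripe_for_offset(off, MB, len(layout))
    print(f"file offset {off:>8} -> OST index {layout[idx]}, object offset {obj_off}")
```

With this layout, a 512-byte write at file offset 0 goes entirely to the OST
with obdidx 1, and nothing about the other OSTs is consulted.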

The free space impacts _which_ OSTs are selected when a file is created;
it does NOT impact where data is written once a file is created.  So if an OST
fills up, every file with a stripe on that OST will be unable to grow whenever
the growth is to an offset that maps to that OST.
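
A simplified sketch of that consequence, again in Python and again with
made-up names (osts_touched, write_would_fail).  It only models the
offset-to-OST map; it assumes every write needs new space and ignores
everything else a real client and OST do.

```python
MB = 1048576

def osts_touched(offset, length, stripe_size, layout):
    """Return the set of OST indices a write of `length` bytes at `offset` maps to."""
    if length <= 0:
        return set()
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    if last - first + 1 >= len(layout):
        return set(layout)  # the write spans at least one full round of stripes
    return {layout[n % len(layout)] for n in range(first, last + 1)}

def write_would_fail(offset, length, stripe_size, layout, full_osts):
    """True if any part of the write maps to an OST that is out of space.

    Hypothetical model: treats every write as needing new space on its OST.
    """
    return bool(osts_touched(offset, length, stripe_size, layout) & set(full_osts))

layout = [1, 0]   # OSTs chosen at file creation; never changes afterwards
full = {0}        # suppose the OST with index 0 has filled up

print(write_would_fail(0, 512, MB, layout, full))      # maps only to OST 1
print(write_would_fail(1 * MB, 512, MB, layout, full)) # maps to the full OST 0
print(write_would_fail(0, 2 * MB, MB, layout, full))   # spans both OSTs
```

The free space on the other OSTs in the filesystem does not help here,
because the file's layout was fixed when it was created.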


