[Lustre-discuss] "obdidx" ordering in "lfs getstripe"

Jack David jd6589 at gmail.com
Mon Feb 13 23:13:37 PST 2012


On Thu, Feb 9, 2012 at 8:18 PM, Andreas Dilger <adilger at whamcloud.com> wrote:
> On 2012-02-09, at 6:20 AM, Jack David wrote:
>> In the output of "lfs getstripe <filename> | <dirname>", the obdidx
>> denotes the OST index (I assume).
>>
>> Consider the following output:
>>
>> lmm_stripe_count:   2
>> lmm_stripe_size:    1048576
>> lmm_stripe_offset:  1
>>       obdidx           objid          objid            group
>>            1               2            0x2                0
>>            0               3            0x3                0
>>
>> where I have a setup consisting of two OSTs. If I have more than two
>> OSTs, is it possible that I get the obdidx values out of order? Or will
>> the obdidx values always be linear?
>>
>> For example, in the above output, the values are linear (like 1, 0 - and
>> this pattern will be repeated while storing the data, I assume). If I
>> have 4 OSTs, can the values be non-linear? Something like 2,0,1,3 or
>> 2,1,3,0 (or any pattern for that matter)?
>
> Typically the ordering will be linear, but this depends on a number of
> different factors:
> - what order the OSTs were created in:  without --index=N the OST order
>  depends on the order in which they were first mounted, so using --index
>  is always recommended, and will be mandatory in the future
> - the distribution of OSTs among OSS nodes:  the MDS object allocator
>  will normally select one OST from each OSS before allocating another
>  object from a different OST on the same OSS

Thanks for this information.

> - the space available on each OST:  when OST free space is imbalanced
>  the OSTs will be selected in part based on how full they are

I have a question here. Let's say I have 4 OSTs, but the Lustre client
issues a write request that can be accommodated by any single OST
(e.g. the write request is 512 bytes and stripe_size is 1MB). In this
case, how will the data be stored? Will the MDS maintain the index of
the next OST that should serve the request?

>
>> My assumption on how the data is stored on OSTs:
>> Based upon the values of obdidx, each OST will store a stripe_size
>> worth of data into the objid (a file under the ldiskfs volume of that
>> OST) in rotation. So if I get the obdidx like 2,1,3,0 and stripe_size
>> is 1MB, then the data will be stored in the following order:
>>
>> 1st MB: 2nd OST
>> 2nd MB: 1st OST
>> 3rd MB: 3rd OST
>> 4th MB: 0th OST
>> 5th MB: 2nd OST (Again - repeating the pattern)
>> 6th MB: 1st OST
>>
>> Is this understanding correct? I hope I am clear on my question.
>
> Correct.  The data is strictly round-robin on the objects once they
> are allocated to a file.
>
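
To make the confirmed round-robin behaviour concrete, here is a minimal Python sketch of how a file offset would map onto the objects listed by obdidx. This is only an illustration, not Lustre code: the function name stripe_location and its arguments are made up for this example, and it assumes a plain striped layout where the object order is fixed once the file is created.

```python
def stripe_location(offset, stripe_size, obdidx_order):
    """Map a file byte offset to (OST index, byte offset inside that OST's object).

    Hypothetical helper for illustration; obdidx_order is the list of obdidx
    values in the order printed by "lfs getstripe".
    """
    stripe_count = len(obdidx_order)
    stripe_number = offset // stripe_size        # which stripe_size chunk of the file
    slot = stripe_number % stripe_count          # position in the round-robin order
    full_rounds = stripe_number // stripe_count  # completed passes over all objects
    object_offset = full_rounds * stripe_size + offset % stripe_size
    return obdidx_order[slot], object_offset


MB = 1048576
order = [2, 1, 3, 0]  # the example obdidx ordering from the question

# Show where the start of each of the first six megabytes would land.
for n in range(6):
    ost, obj_off = stripe_location(n * MB, MB, order)
    print(f"MB {n + 1}: OST index {ost}, object offset {obj_off}")
```

Under these assumptions, the 5th MB lands on OST index 2 again, at an offset of one stripe_size inside that object, which matches the repeating pattern described in the question.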

Thanks again,
J

> Cheers, Andreas
> --
> Andreas Dilger                       Whamcloud, Inc.
> Principal Engineer                   http://www.whamcloud.com/

-- 
J
