[Lustre-discuss] "obdidx" ordering in "lfs getstripe"
Jack David
jd6589 at gmail.com
Mon Feb 13 23:13:37 PST 2012
On Thu, Feb 9, 2012 at 8:18 PM, Andreas Dilger <adilger at whamcloud.com> wrote:
> On 2012-02-09, at 6:20 AM, Jack David wrote:
>> In the output of "lfs getstripe <filename> | <dirname>", the obdidx
>> denotes the OST index (I assume).
>>
>> Consider the following output:
>>
>> lmm_stripe_count: 2
>> lmm_stripe_size: 1048576
>> lmm_stripe_offset: 1
>>      obdidx       objid      objid      group
>>           1           2        0x2          0
>>           0           3        0x3          0
>>
>> where I have a setup consisting of two OSTs. If I have more than two
>> OSTs, is it possible that I get the obdidx values out of order? Or will
>> the obdidx values always be linear?
>>
>> For example, in the above output, the values are linear (1, 0, and I
>> assume this pattern is repeated while storing the data). If I have
>> 4 OSTs, can the values be non-linear? Something like 2,0,1,3 or
>> 2,1,3,0 (or any pattern, for that matter)?
>
> Typically the ordering will be linear, but this depends on a number of
> different factors:
> - what order the OSTs were created in: without --index=N the OST order
> depends on the order in which they were first mounted, so using --index
> is always recommended, and will be mandatory in the future
> - the distribution of OSTs among OSS nodes: the MDS object allocator
> will normally select one OST from each OSS before allocating another
> object from a different OST on the same OSS
Thanks for this information.
> - the space available on each OST: when OST free space is imbalanced
> the OSTs will be selected in part based on how full they are
I have a question here. Let's say I have 4 OSTs, but the Lustre client
issues a write request that can be accommodated by a single OST (e.g.
the write request is 512 bytes and stripe_size is 1MB). In this case,
how will the data be stored? Will the MDS maintain the index of the
next OST that should serve the request?
>
>> My assumption on how the data is stored on OSTs:
>> Based upon the values of obdidx, each OST will store a stripe_size
>> worth of data into the objid (a file under the ldiskfs volume of that
>> OST) in rotation. So if I get the obdidx like 2,1,3,0 and stripe_size
>> is 1MB, then the data will be stored in the following order:
>>
>> 1st MB: 2nd OST
>> 2nd MB: 1st OST
>> 3rd MB: 3rd OST
>> 4th MB: 0th OST
>> 5th MB: 2nd OST (Again - repeating the pattern)
>> 6th MB: 1st OST
>>
>> Is this understanding correct? I hope I am clear on my question.
>
> Correct. The data is strictly round-robin on the objects once they
> are allocated to a file.
>
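The round-robin placement confirmed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Lustre code: the function name `locate` and its parameters are hypothetical, and it assumes a plain striped layout in which the obdidx list is fixed once the objects have been allocated to the file.

```python
MB = 1048576  # 1 MB stripe_size, as in the getstripe output


def locate(offset, stripe_size, obdidx_order):
    """Map a file byte offset to (OST index, offset inside that OST's object).

    obdidx_order is the list of OST indices in the order that
    "lfs getstripe" prints them (hypothetical helper, for illustration).
    """
    stripe_count = len(obdidx_order)
    stripe_number = offset // stripe_size        # which stripe-sized chunk of the file
    slot = stripe_number % stripe_count          # position in the obdidx list
    full_rounds = stripe_number // stripe_count  # chunks already placed in this object
    object_offset = full_rounds * stripe_size + offset % stripe_size
    return obdidx_order[slot], object_offset


if __name__ == "__main__":
    order = [2, 1, 3, 0]  # the example ordering from the question
    for n in range(6):
        ost, obj_off = locate(n * MB, MB, order)
        print(f"MB {n + 1}: OST index {ost}, object offset {obj_off}")
```

Under this model, a write smaller than stripe_size (for example 512 bytes at file offset 0) falls entirely inside one stripe and so lands in a single object, here the one on OST index 2. The placement follows from the file offset and the layout already assigned to the file, so no per-write "next OST" counter appears in the calculation.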
Thanks again,
J
> Cheers, Andreas
> --
> Andreas Dilger Whamcloud, Inc.
> Principal Engineer http://www.whamcloud.com/
>
--
J