[Lustre-discuss] "obdidx" ordering in "lfs getstripe"

Jack David jd6589 at gmail.com
Tue Feb 14 05:51:28 PST 2012

On Tue, Feb 14, 2012 at 6:57 PM, Kevin Van Maren <KVanMaren at fusionio.com> wrote:
> On Feb 14, 2012, at 12:13 AM, Jack David wrote:
>> On Thu, Feb 9, 2012 at 8:18 PM, Andreas Dilger <adilger at whamcloud.com> wrote:
>>> On 2012-02-09, at 6:20 AM, Jack David wrote:
>>>> In the output of "lfs getstripe <filename> | <dirname>", the obdidx
>>>> denotes the OST index (I assume).
>>>> Consider the following output:
>>>> lmm_stripe_count:   2
>>>> lmm_stripe_size:    1048576
>>>> lmm_stripe_offset:  1
>>>>       obdidx           objid          objid            group
>>>>            1               2            0x2                0
>>>>            0               3            0x3                0
>>>> where I have a setup consisting of two OSTs. If I have more than two
>>>> OSTs, is it possible that I get the obdidx values out of order? Or the
>>>> obdidx values will always be linear?
>>>> For example, in above output, the values are linear (like 1, 0 - and
>>>> this pattern will be repeated while storing the data I assume). If I
>>>> have 4 OSTs, can the values be non-linear? Something like 2,0,1,3 or
>>>> 2,1,3,0 (or any pattern for that matter)??
>>> Typically the ordering will be linear, but this depends on a number of
>>> different factors:
>>> - what order the OSTs were created in:  without --index=N the OST order
>>>  depends on the order in which they were first mounted, so using --index
>>>  is always recommended, and will be mandatory in the future
>>> - the distribution of OSTs among OSS nodes:  the MDS object allocator
>>>  will normally select one OST from each OSS before allocating another
>>>  object from a different OST on the same OSS
>> Thanks for this information.
>>> - the space available on each OST:  when OST free space is imbalanced
>>>  the OSTs will be selected in part based on how full they are
>> I have a doubt here. Let's say I have 4 OSTs, but the Lustre client is
>> issuing a write request which can be accommodated by any single OST
>> (e.g. the write request is 512 bytes and stripe_size is 1MB). In this
>> case, how will the data be stored? Will the MDS maintain the index of
>> the next OST which should serve the request?
> I think you are still confused about how it works.  The OSTs are selected
> _when the file is created_.  The striping is a static map of offset to OST.
> For example, if the stripe count = 2, and the stripe size = 1MB, then
> 0-1MB goes to the first OST, 1-2MB goes to the second, 2-3 goes to the first, etc.
I understand that, but I was curious whether the Lustre client keeps
track of which is the _next_ OST where an IO request should go. I
do not know who decides the stripe_size at the time of file
creation (the default is 1MB, according to the lfs setstripe man page),
so I assumed the client is not bothered about that. But if the client
generates write requests that are not a multiple of stripe_size,
multiple write requests can be stored on one OST (e.g. if the stripe
size is 1MB, then 2048 sequential requests of 512 bytes fill one stripe
on OST1, the next 2048 requests go to OST2, and so on).
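
To make the static map concrete, here is a minimal sketch in Python of
how a byte offset would map to an OST under plain round-robin striping
as Kevin describes it. The function name ost_for_offset and the tuple it
returns are illustrative only and are not part of any Lustre API; the
layout [1, 0] and the 1MB stripe size are taken from the getstripe
output quoted above.

```python
def ost_for_offset(offset, stripe_size, layout):
    """Map a file byte offset to (obdidx, offset inside that OST's object).

    layout is the obdidx column of "lfs getstripe", in the order printed.
    Assumes simple round-robin striping fixed at file creation time.
    """
    stripe_no = offset // stripe_size          # which stripe-sized chunk of the file
    slot = stripe_no % len(layout)             # position in the layout, round-robin
    # Full stripes already stored on this OST, plus the remainder in this stripe.
    object_offset = (stripe_no // len(layout)) * stripe_size + offset % stripe_size
    return layout[slot], object_offset


STRIPE_SIZE = 1048576   # lmm_stripe_size from the output above
LAYOUT = [1, 0]         # obdidx values from the output above

if __name__ == "__main__":
    # Sequential 512-byte writes: request number -> (obdidx, object offset)
    for req in (0, 2047, 2048, 4095, 4096):
        print(req, ost_for_offset(req * 512, STRIPE_SIZE, LAYOUT))
```

In this sketch nothing has to remember a "next" OST: the target follows
from the file offset and the layout alone. With the layout above,
sequential 512-byte requests 0 through 2047 map to obdidx 1 and requests
2048 through 4095 map to obdidx 0.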

Actually, I am trying to understand how I can leverage the pNFS file
layout semantics (where the client communicates with the Data Servers
directly once the layout is supplied by the Meta Data Server) with the
Lustre filesystem, and that is the source of these questions.

> The free space impacts _which_ OSTs are selected when a file is created,
> it does NOT impact where data is written once a file is created.  So if an OST
> fills up, every file that resides on that OST will be unable to grow if the growth is
> to an offset that maps to that OST.

Good to know that.
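
As a rough illustration of that last point, the sketch below checks
whether a write to a given byte range would touch a particular OST under
the same static round-robin map. The name write_touches_ost and its
parameters are hypothetical, and this only models the offset-to-OST
mapping; it does not model how Lustre actually reports or handles a
full OST.

```python
def write_touches_ost(offset, length, stripe_size, layout, obdidx):
    """Return True if any byte in [offset, offset + length) maps to obdidx.

    layout is the obdidx column of "lfs getstripe", in the order printed.
    Assumes simple round-robin striping fixed at file creation time.
    """
    if length <= 0:
        return False
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    # After len(layout) consecutive stripes every OST in the layout has
    # been visited, so there is no need to scan further than that.
    last = min(last, first + len(layout) - 1)
    return any(layout[s % len(layout)] == obdidx for s in range(first, last + 1))


MB = 1048576
LAYOUT = [1, 0]  # obdidx values from the getstripe output earlier in the thread

if __name__ == "__main__":
    # Suppose the OST with obdidx 0 is full.
    print(write_touches_ost(0, 512, MB, LAYOUT, 0))       # first stripe only
    print(write_touches_ost(MB, 512, MB, LAYOUT, 0))      # second stripe
    print(write_touches_ost(MB - 256, 512, MB, LAYOUT, 0))  # crosses the boundary
```

Under these assumptions, a file striped over [1, 0] can still be written
in the ranges that map to obdidx 1, while any range that reaches into a
stripe on obdidx 0 is affected.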

> Kevin