[Lustre-discuss] how the lustre distribute data among disks within one OST

Jaln valiantljk at gmail.com
Sat Jun 15 23:02:39 PDT 2013


Hi Andreas,

Thanks a lot,

>this can all be pipelined by the client, which sends up to 8 RPCs
>concurrently
>for each OST.

Could you please explain a little more about what "this can all be pipelined
by the client" means?
How does the client pipeline it?
Do you mean pipelining across multiple processes?

Thanks,

Jaln
________________
Jialin Liu
Ph.D. student
TTU&LBNL
http://www.myweb.ttu.edu/jialliu/

On Fri, Jun 14, 2013 at 2:47 PM, Dilger, Andreas
<andreas.dilger at intel.com> wrote:

> On 2013/13/06 6:36 PM, "Jaln" <valiantljk at gmail.com> wrote:
>
> >Thank you Chris, I'm sort of clear now.
> >In my question, stripe 0,4 means one process wants to access stripe 0 and
> >4 at the same time.
> >there is another process wants to access both stripe 0 and 2,
>
> Just to clarify the Lustre terminology here, if there are only 2 OSTs
> involved,
> there will only be two stripes, with index "0" and "1" (each with an
> arbitrary
> object ID), one on each OST.  In your case, each one will be an object of
> 3MB
> in size.
>
> >even though stripe 0, 2, 4 are in the same place (one file),
> >but their offsets are different, i.e., 0 and 2 are contiguous,
> >while from 0 to 4 there is a gap.
>
> Right, this is no different than an application reading from megabytes 0,1
> or 0,2
> from a local disk filesystem.  There will be a seek in the middle, unless
> the
> client, OSS, or RAID/disk decide to do readahead on the file or object.
> If the
> file is <= 2MB in size (llite.*.max_read_ahead_whole_mb tunable), Lustre
> will just prefetch the whole file on first access.
>
> >So my concern is, will the two processes have different I/O cost?
> >In other words, accessing 0 and 4 would take longer time than accessing 0
> >and 2.
>
> Sure, one seek per MB accessed (<= 10ms), but this is relatively close
> compared
> to the network transfer time (10ms per MB for 1GigE, 1ms per MB for
> 10GigE), and
> this can all be pipelined by the client, which sends up to 8 RPCs
> concurrently
> for each OST.
>
> Cheers, Andreas
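
To make the numbers above concrete, here is a rough back-of-the-envelope sketch. It is only a toy model, not how the Lustre client is implemented: it assumes one disk that serializes seeks, one network link that serializes transfers, 1MB per RPC, and the figures quoted above (10ms seek, 10ms or 1ms per MB on the wire, up to 8 RPCs in flight). The function name estimate_ms and its parameters are made up for this example.

```python
def estimate_ms(n_rpcs, seek_ms, xfer_ms, in_flight=8):
    """Rough time to complete n_rpcs 1MB RPCs against one OST.

    Toy model (an assumption, not Lustre's real behavior): each RPC
    costs one seek plus one network transfer, and up to `in_flight`
    RPCs may overlap. With overlap, a new RPC completes every
    max(seek, transfer, (seek + transfer) / in_flight) milliseconds.
    """
    if n_rpcs <= 0:
        return 0.0
    per_rpc = seek_ms + xfer_ms
    # Steady-state interval between completions: limited by the slower
    # stage, or by the in-flight limit when that is the bottleneck.
    period = max(seek_ms, xfer_ms, per_rpc / in_flight)
    # First RPC pays the full latency; the rest follow one period apart.
    return per_rpc + (n_rpcs - 1) * period


if __name__ == "__main__":
    for label, xfer in (("1GigE", 10.0), ("10GigE", 1.0)):
        serial = estimate_ms(8, 10.0, xfer, in_flight=1)
        piped = estimate_ms(8, 10.0, xfer, in_flight=8)
        print(f"{label}: one at a time {serial} ms, pipelined {piped} ms")
```

With in_flight=1 the model reduces to paying seek plus transfer for every RPC in turn, so the gap between the two printed numbers is the benefit attributed to keeping several RPCs outstanding.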
>
> >On Thu, Jun 13, 2013 at 5:23 PM, Christopher J. Morrone
> ><morrone2 at llnl.gov> wrote:
> >
> >In that case, it is the question part that I do not understand. :)  What
> >is "stripe 0,4", why could it be "closer" than "stripe 0,2"?  In your
> >example, 0, 2, and 4 are all in the same place.
> >
> >If your file is striped over 2 OSTs, then essentially what happens behind
> >the scenes is that there are two files, one on each OST.  But Lustre
> >hides that from you, as a user.  Lustre basically does modulo operations
> >to translate a file offset from the file that
> >it presents to the user, into which OST and offset into said OST's file
> >to use.
> >
> >Does that help at all?
> >
> >Chris
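
The modulo translation described above can be sketched as follows. This is only an illustration of RAID-0 style round-robin striping, not Lustre's actual code. The function name locate and its parameters are made up for the example; the 1MB stripe size and 2 OSTs come from the 6MB example in this thread.

```python
MB = 1 << 20


def locate(file_offset, stripe_size=MB, stripe_count=2):
    """Map a file offset to (stripe index, offset inside that object).

    Sketch of round-robin striping arithmetic; hypothetical helper,
    not a Lustre API.
    """
    chunk = file_offset // stripe_size        # which 1MB piece of the file
    stripe_index = chunk % stripe_count       # which object (one per OST)
    # Full rounds already written to this object, plus the position
    # inside the current piece.
    object_offset = (chunk // stripe_count) * stripe_size \
        + file_offset % stripe_size
    return stripe_index, object_offset


if __name__ == "__main__":
    for piece in range(6):
        index, offset = locate(piece * MB)
        print(f"file MB {piece} -> object {index}, object offset {offset // MB} MB")
```

In this sketch, file megabytes 0, 2, and 4 all land in the same object at object offsets 0MB, 1MB, and 2MB, so reading 0 and 2 touches adjacent ranges of that object while reading 0 and 4 skips 1MB in between.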
> >
> >
> >On 06/13/2013 02:58 PM, Jaln wrote:
> >
> >Oh, I mean there is one file, for example 6 MB, the stripe size is 1MB,
> >and only 2 OST,
> >then the file will be divided into 6 stripes, denoted as stripe
> >0,1,2,3,4,5.
> >the distribution on the 2 OST  would be stripe 0,2,4 on OST0, stripe
> >1,3,5 on OST1.
> >
> >Jaln
> >
> >
> >On Thu, Jun 13, 2013 at 2:54 PM, Christopher J. Morrone
> ><morrone2 at llnl.gov> wrote:
> >
> >    I think you may be confused about what a stripe is in Lustre.  If
> >    there are only 2 OST, then you can only stripe a file across 2.
> >
> >    Or maybe I don't understand your terminology.  I don't know what you
> >    mean by "0,4" and "0,2".
> >
> >
> >    On 06/13/2013 02:38 PM, Jaln wrote:
> >
> >        if I have 6 stripes, 2 OST, using round-robin striping,
> >        stripe 0,2,4 will be on OST0,
> >        stripe 1,3,5 will be on OST1,
> >        Do you guys have any idea about what will be the difference of
> >        accessing
> >        stripe 0,4 vs stripe 0,2?
> >        stripe 0, 2 seems to be closer than 0,4, or the lustre will do
> >        some intelligent work?
> >
> >        Jaln
> >
> >
> >        On Thu, Jun 13, 2013 at 10:22 AM, Christopher J. Morrone
> >        <morrone2 at llnl.gov> wrote:
> >
> >             On 06/13/2013 05:19 AM, E.S. Rosenberg wrote:
> >              > On Thu, Jun 13, 2013 at 3:09 AM, Christopher J. Morrone
> >              > <morrone2 at llnl.gov> wrote:
> >              >> Lustre does not manage the individual disks.  It sits
> >        on top of a
> >              >> filesystem, either ldiskfs(basically ext4) or zfs (as
> >        of Lustre
> >             2.4).
> >              > Is ZFS the recommended fs, or just an option?
> >              > Doesn't ZFS suffer major performance drawbacks on linux
> >        due to it
> >              > living in userspace?
> >              > Thanks,
> >              > Eli
> >
> >             LLNL (Brian Behlendorf) ported ZFS natively to Linux.  We
> >        are not using
> >             the FUSE (userspace) version.  You can find it at:
> >
> >        http://zfsonlinux.org
> >
> >             ZFS is one of the two backend filesystem options for
> >        Lustre, as of
> >             Lustre 2.4.  2.4 is the first Lustre release that fully
> >        supports using
> >             ZFS.  Here at LLNL we are using it on our newest, and
> >        largest at 55PB,
> >             filesystem.
> >
> >             Chris
> >
>
> Cheers, Andreas
> --
> Andreas Dilger
>
> Lustre Software Architect
> Intel High Performance Data Division
>
>
>


-- 

Genius only means hard-working all one's life