[Lustre-discuss] how the lustre distribute data among disks within one OST
Jaln
valiantljk at gmail.com
Sat Jun 15 23:02:39 PDT 2013
Hi Andreas,
Thanks a lot,
>this can all be pipelined by the client, which sends up to 8 RPCs
>concurrently
>for each OST.
Can you please explain a little more about why "this can all be pipelined by
the client"?
How does the client pipeline it?
Do you mean pipelining the multiple processes?
Thanks,
Jaln
________________
Jialin Liu
Ph.D. student
TTU&LBNL
http://www.myweb.ttu.edu/jialliu/
On Fri, Jun 14, 2013 at 2:47 PM, Dilger, Andreas
<andreas.dilger at intel.com> wrote:
> On 2013/13/06 6:36 PM, "Jaln" <valiantljk at gmail.com> wrote:
>
> >Thank you Chris, I'm sort of clear now.
> >In my question, "stripe 0,4" means one process wants to access stripes 0 and
> >4 at the same time.
> >Another process wants to access both stripes 0 and 2.
>
> Just to clarify the Lustre terminology here, if there are only 2 OSTs
> involved, there will only be two stripes, with index "0" and "1" (each
> with an arbitrary object ID), one on each OST. In your case, each one will
> be an object of 3MB in size.
>
> >even though stripe 0, 2, 4 are in the same place (one file),
> >but their offsets are different, i.e., 0 and 2 are contiguous,
> >while from 0 to 4 there is a gap.
>
> Right, this is no different than an application reading from megabytes 0,1
> or 0,2 from a local disk filesystem. There will be a seek in the middle,
> unless the client, OSS, or RAID/disk decide to do readahead on the file or
> object. If the file is <= 2MB in size (llite.*.max_read_ahead_whole_mb
> tunable), Lustre will just prefetch the whole file on first access.
>
> >So my concern is, will the two processes have different I/O costs?
> >In other words, would accessing 0 and 4 take longer than accessing 0
> >and 2?
>
> Sure, one seek per MB accessed (<= 10ms), but this is comparable to the
> network transfer time (10ms per MB for 1GigE, 1ms per MB for 10GigE), and
> this can all be pipelined by the client, which sends up to 8 RPCs
> concurrently for each OST.
>
> Cheers, Andreas
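
[Editorial sketch] The point above about overlapping seek time with network transfer can be illustrated with a toy model. It is only a back-of-the-envelope estimate, not a description of how the Lustre client is implemented: it assumes each RPC goes through two stages (one disk seek, then one network transfer), that each stage serves one RPC at a time, and that every RPC costs the same. The function names and the two-stage model are assumptions made for illustration; only the 10ms seek, the 10ms and 1ms per MB transfer times, and the limit of 8 concurrent RPCs come from the message.

```python
def serial_ms(n_rpcs, seek_ms, xfer_ms):
    """Total time if each RPC must finish before the next one starts."""
    return n_rpcs * (seek_ms + xfer_ms)


def pipelined_ms(n_rpcs, seek_ms, xfer_ms, in_flight=8):
    """Toy two-stage pipeline: the seek of one RPC overlaps the transfer of another.

    Assumption: with at least 2 RPCs in flight, both stages stay busy, so after
    the first RPC each additional one costs only the slower of the two stages.
    """
    if n_rpcs <= 0:
        return 0
    if in_flight < 2:
        return serial_ms(n_rpcs, seek_ms, xfer_ms)
    return seek_ms + xfer_ms + (n_rpcs - 1) * max(seek_ms, xfer_ms)


if __name__ == "__main__":
    # Six 1MB RPCs, 10ms seek each; 10ms/MB for 1GigE, 1ms/MB for 10GigE.
    for label, xfer in (("1GigE", 10), ("10GigE", 1)):
        print(label, "serial:", serial_ms(6, 10, xfer), "ms,",
              "pipelined:", pipelined_ms(6, 10, xfer), "ms")
```

In this model the benefit of pipelining is largest when seek and transfer times are similar, as with 1GigE. Real servers also reorder and merge requests and do readahead, which the sketch ignores.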
>
> >On Thu, Jun 13, 2013 at 5:23 PM, Christopher J. Morrone
> ><morrone2 at llnl.gov> wrote:
> >
> >In that case, it is the question part that I do not understand. :) What
> >is "stripe 0,4", and why could it be "closer" than "stripe 0,2"? In your
> >example, 0, 2, and 4 are all in the same place.
> >
> >If your file is striped over 2 OSTs, then essentially what happens behind
> >the scenes is that there are two files, one on each OST. But Lustre
> >hides that from you, as a user. Lustre basically does modulo operations
> >to translate a file offset from the file that it presents to the user
> >into which OST, and which offset into said OST's file, to use.
> >
> >Does that help at all?
> >
> >Chris
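
[Editorial sketch] The modulo translation described above can be written out for the 6MB file, 1MB stripe size, 2 OST example from this thread. This is a minimal illustration of plain round-robin (RAID-0 style) striping, not Lustre's actual code; the function name `locate` and its parameters are hypothetical.

```python
MB = 1 << 20


def locate(offset, stripe_size=MB, stripe_count=2):
    """Map a file offset to (stripe index, offset inside that stripe's object)."""
    chunk = offset // stripe_size        # which stripe_size-sized chunk of the file
    stripe_index = chunk % stripe_count  # round-robin over the stripes, one per OST
    row = chunk // stripe_count          # number of full rounds before this chunk
    object_offset = row * stripe_size + offset % stripe_size
    return stripe_index, object_offset


if __name__ == "__main__":
    # Walk the six 1MB chunks of the example file.
    for chunk in range(6):
        idx, obj_off = locate(chunk * MB)
        print(f"file MB {chunk} -> stripe {idx}, object offset {obj_off // MB} MB")
```

Under these assumptions, file megabytes 0, 2 and 4 all map to stripe 0 at object offsets 0, 1 and 2 MB, so they sit next to each other inside one 3MB object on the same OST.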
> >
> >
> >On 06/13/2013 02:58 PM, Jaln wrote:
> >
> >Oh, I mean there is one file, for example 6 MB, the stripe size is 1MB,
> >and there are only 2 OSTs.
> >Then the file will be divided into 6 stripes, denoted as stripes
> >0,1,2,3,4,5.
> >The distribution on the 2 OSTs would be stripes 0,2,4 on OST0 and stripes
> >1,3,5 on OST1.
> >
> >Jaln
> >
> >
> >On Thu, Jun 13, 2013 at 2:54 PM, Christopher J. Morrone
> ><morrone2 at llnl.gov> wrote:
> >
> > I think you may be confused about what a stripe is in Lustre. If
> > there are only 2 OSTs, then you can only stripe a file across 2.
> >
> > Or maybe I don't understand your terminology. I don't know what you
> > mean by "0,4" and "0,2".
> >
> >
> > On 06/13/2013 02:38 PM, Jaln wrote:
> >
> > If I have 6 stripes and 2 OSTs, using round-robin striping,
> > stripes 0,2,4 will be on OST0 and
> > stripes 1,3,5 will be on OST1.
> > Do you have any idea what the difference will be between accessing
> > stripes 0,4 and stripes 0,2?
> > Stripes 0,2 seem to be closer than 0,4, or will Lustre do
> > some intelligent work?
> >
> > Jaln
> >
> >
> > On Thu, Jun 13, 2013 at 10:22 AM, Christopher J. Morrone
> > <morrone2 at llnl.gov> wrote:
> >
> > On 06/13/2013 05:19 AM, E.S. Rosenberg wrote:
> > > On Thu, Jun 13, 2013 at 3:09 AM, Christopher J. Morrone
> > > <morrone2 at llnl.gov> wrote:
> > >> Lustre does not manage the individual disks. It sits on top of a
> > >> filesystem, either ldiskfs (basically ext4) or zfs (as of Lustre
> > >> 2.4).
> > > Is ZFS the recommended fs, or just an option?
> > > Doesn't ZFS suffer major performance drawbacks on Linux due to it
> > > living in userspace?
> > > Thanks,
> > > Eli
> >
> > LLNL (Brian Behlendorf) ported ZFS natively to Linux. We are not using
> > the FUSE (userspace) version. You can find it at:
> >
> > http://zfsonlinux.org
> >
> > ZFS is one of the two backend filesystem options for Lustre, as of
> > Lustre 2.4. 2.4 is the first Lustre release that fully supports using
> > ZFS. Here at LLNL we are using it on our newest, and largest at 55PB,
> > filesystem.
> >
> > Chris
> >
>
> Cheers, Andreas
> --
> Andreas Dilger
>
> Lustre Software Architect
> Intel High Performance Data Division
>
>
>
--
Genius only means hard-working all one's life