[lustre-discuss] Lustre on ZFS poor direct I/O performance
Riccardo Veraldi
Riccardo.Veraldi at cnaf.infn.it
Sat Oct 15 22:01:47 PDT 2016
On 14/10/16 14:38, Dilger, Andreas wrote:
>
> John, with newer Lustre clients it is possible for multiple threads to
> submit non-overlapping writes concurrently (also not conflicting
> within a single page), see LU-1669 for details.
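For illustration, concurrent non-overlapping O_DIRECT writes from user
space might look something like the sketch below; the file path and
sizes are made-up examples, and each dd writes its own 1 GiB region of
the same file:

    # four writers, each on a distinct, non-overlapping 1 GiB region
    for i in 0 1 2 3; do
        dd if=/dev/zero of=/lustre/testfile bs=1M count=1024 \
           seek=$((i * 1024)) oflag=direct conv=notrunc &
    done
    wait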
>
> Even so, O_DIRECT writes need to be synchronous to disk on the OSS, as
> Patrick reports, because if the OSS fails before the write is on disk
> there is no cached copy of the data on the client that can be used to
> resend the RPC.
>
> The problem is that the ZFS OSD has very long transaction commit times
> for synchronous writes because it does not yet have support for the
> ZIL. Using buffered writes, or having very large O_DIRECT writes
> (e.g. 40MB or larger) and large RPCs (4MB, or up to 16MB in 2.9.0) to
> amortize the sync overhead may be beneficial if you really want to use
> O_DIRECT.
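If O_DIRECT really is required, a rough sketch of this kind of tuning
might look as follows; the values are only illustrative, and
max_pages_per_rpc assumes 4 KiB pages (1024 pages = 4 MB RPCs, 4096
pages = 16 MB RPCs where the servers support it):

    # on the client, raise the bulk RPC size
    lctl set_param osc.*.max_pages_per_rpc=1024

    # then issue very large O_DIRECT writes so the per-sync cost is
    # amortized over much more data (path and size are made up)
    dd if=/dev/zero of=/lustre/testfile bs=40M count=256 oflag=direct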
>
> Riccardo,
>
> The other potential issue is that you have 20 OSTs on a single OSS,
> which isn't going to have very good performance. Spreading the OSTs
> across multiple OSS nodes is going to improve your performance
> significantly when there are multiple clients writing, as there will
> be N times the OSS network bandwidth, N times the CPU, N times the
> RAM. It only makes sense to have 20 OSTs/OSS if your workload is only
> a single client and you want the maximum possible capacity for a given
> cost.
>
Hello Andreas,
Each OST has a separate VDEV and a separate zpool.
Thank you.
> Is each OST a separate VDEV and separate zpool, or are they a single
> zpool? Separate zpools have less overhead for maximum performance,
> but only one VDEV per zpool means that metadata ditto blocks are
> written twice per RAID-Z2 VDEV, which isn't very efficient. Having at
> least 3 VDEVs per zpool is better in this regard.
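For comparison, one zpool built from three RAID-Z2 VDEVs, as suggested
above, might be created along these lines (pool and device names are
placeholders, not the real configuration):

    # a single zpool backing one OST, made of 3 x 8-disk RAID-Z2 VDEVs
    zpool create ost0pool \
        raidz2 sda sdb sdc sdd sde sdf sdg sdh \
        raidz2 sdi sdj sdk sdl sdm sdn sdo sdp \
        raidz2 sdq sdr sds sdt sdu sdv sdw sdx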
>
> Cheers, Andreas
>
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel High Performance Data Division
>
> On 2016/10/14, 15:22, "John Bauer" <bauerj at iodoctors.com> wrote:
>
> Patrick
>
> I thought at one time there was an inode lock held for the duration of
> a direct I/O read or write, so that even if one had multiple
> application threads writing direct, only one was "in flight" at a
> time. Has that changed?
>
> John
>
> Sent from my iPhone
>
>
> On Oct 14, 2016, at 3:16 PM, Patrick Farrell <paf at cray.com> wrote:
>
> Sorry, I phrased one thing wrong:
> I said "transferring to the network", but the write actually blocks
> until the client has received confirmation that the data was received
> successfully, I believe.
>
> In any case, only one I/O (per thread) can be outstanding at a
> time with direct I/O.
>
> ------------------------------------------------------------------------
>
> *From:* lustre-discuss <lustre-discuss-bounces at lists.lustre.org>
> on behalf of Patrick Farrell <paf at cray.com>
> *Sent:* Friday, October 14, 2016 3:12:22 PM
> *To:* Riccardo Veraldi; lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre on ZFS poor direct I/O performance
>
> Riccardo,
>
> While the difference is extreme, direct I/O write performance will
> always be poor. Direct I/O writes cannot be asynchronous, since
> they don't use the page cache. This means Lustre cannot return
> from one write (and start the next) until it has finished
> transferring the data to the network.
>
> This means you can only have one I/O in flight at a time. Good
> write performance from Lustre (or any network filesystem) depends
> on keeping a lot of data in flight at once.
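As a hedged illustration of keeping data in flight with buffered I/O,
the number of concurrent RPCs a client may keep outstanding per OST can
be inspected and raised; the values below are examples, not
recommendations:

    # how many concurrent RPCs each client OSC may keep in flight per OST
    lctl get_param osc.*.max_rpcs_in_flight

    # allow more RPCs in flight and more dirty cache per OSC
    lctl set_param osc.*.max_rpcs_in_flight=16
    lctl set_param osc.*.max_dirty_mb=512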
>
> What sort of direct write performance were you hoping for? It will
> never match that 800 MB/s from one thread you see with buffered I/O.
>
> - Patrick
>
> ------------------------------------------------------------------------
>
> *From:* lustre-discuss <lustre-discuss-bounces at lists.lustre.org>
> on behalf of Riccardo Veraldi <Riccardo.Veraldi at cnaf.infn.it>
> *Sent:* Friday, October 14, 2016 2:22:32 PM
> *To:* lustre-discuss at lists.lustre.org
> *Subject:* [lustre-discuss] Lustre on ZFS poor direct I/O performance
>
> Hello,
>
> I would like to know how I can improve the performance of my Lustre cluster.
>
> I have 1 MDS and 1 OSS with 20 OSTs defined.
>
> Each OST is an 8-disk RAIDZ2.
>
> Single-process write performance is around 800 MB/s with buffered I/O.
> However, if I force direct I/O, for example using oflag=direct in dd,
> write performance drops as low as 8 MB/s with a 1 MB block size, and
> each write takes about 120 ms.
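The tests were along these lines (the file path is a placeholder); note
that one 1 MiB write every ~120 ms works out to roughly 8 MB/s:

    # direct I/O: ~8 MB/s observed
    dd if=/dev/zero of=/lustre/testfile bs=1M count=1000 oflag=direct

    # buffered I/O for comparison: ~800 MB/s observed
    dd if=/dev/zero of=/lustre/testfile bs=1M count=1000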
>
> I used these ZFS settings
>
> options zfs zfs_prefetch_disable=1
> options zfs zfs_txg_history=120
> options zfs metaslab_debug_unload=1
>
> I am quite worried about the low performance.
>
> Any hints or suggestions that may help me improve the situation?
>
>
> thank you
>
>
> Rick
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>