[lustre-discuss] Lustre on ZFS poor direct I/O performance

Fri Oct 14 13:12:22 PDT 2016

Riccardo,

While the difference is extreme, direct I/O write performance will always be poor.  Direct I/O writes cannot be asynchronous, since they don't use the page cache.  This means Lustre cannot return from one write (and start the next) until it has finished transferring the data to the network.

This means you can only have one I/O in flight at a time.  Good write performance from Lustre (or any network filesystem) depends on keeping a lot of data in flight at once.
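As a rough back-of-the-envelope check, the numbers from the original message (1 MB writes, about 120 ms each) already account for the observed rate. The sketch below assumes a deliberately simple model in which per-write latency stays constant no matter how many writes are outstanding; the function names are only illustrative and are not part of Lustre or ZFS.

```python
import math


def throughput_mb_per_s(block_mb, latency_ms, in_flight=1):
    """Throughput if `in_flight` writes of `block_mb` MB each complete every `latency_ms` ms.

    Assumes latency does not change with the number of outstanding writes.
    """
    return in_flight * block_mb * 1000 / latency_ms


def in_flight_needed(target_mb_per_s, block_mb, latency_ms):
    """Smallest number of concurrent writes that reaches `target_mb_per_s` under the same model."""
    return math.ceil(target_mb_per_s * latency_ms / (1000 * block_mb))


if __name__ == "__main__":
    # One 1 MB direct write at a time, about 120 ms per write.
    print(f"one write in flight: {throughput_mb_per_s(1, 120):.1f} MB/s")
    # Concurrent 1 MB writes needed to reach the buffered figure of 800 MB/s.
    print(f"writes in flight for 800 MB/s: {in_flight_needed(800, 1, 120)}")
```

With a single write in flight this gives roughly 8 MB/s, which matches what dd reports with oflag=direct. Under the same assumption, reaching the buffered rate would take on the order of a hundred 1 MB writes in flight at once, which a single synchronous writer cannot provide.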

What sort of direct write performance were you hoping for?  It will never match that 800 MB/s from one thread you see with buffered I/O.

- Patrick

________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Riccardo Veraldi <Riccardo.Veraldi at cnaf.infn.it>
Sent: Friday, October 14, 2016 2:22:32 PM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Lustre on ZFS poor direct I/O performance

Hello,

I would like to know how I can improve the situation of my Lustre cluster.

I have 1 MDS and 1 OSS with 20 OSTs defined.

Each OST is an 8-disk RAIDZ2.

Single-process write performance is around 800 MB/s.

However, if I force direct I/O, for example using oflag=direct in dd, write
performance drops as low as 8 MB/s with a 1 MB block size, and each write
takes about 120 ms.

I used these ZFS settings:

options zfs zfs_prefetch_disable=1
options zfs zfs_txg_history=120
options zfs metaslab_debug_unload=1

I am quite worried about the low performance.

Any hints or suggestions that may help me improve the situation?

Thank you,

Rick

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org