[lustre-discuss] Lustre 2.10.3 on ZFS - slow read performance
Alex Vodeyko
alex.vodeyko at gmail.com
Tue Mar 27 12:55:23 PDT 2018
Hi,
I'm setting up a new Lustre test system with the following hw config:
- 2x servers (dual E5-2650v3, 128GB RAM), one MGS/MDS, one OSS
- 1x HGST 4U60G2 JBOD with 60x 10TB HUH721010AL5204 drives (4k
physical, 512-byte logical sector size), connected to the OSS via an
LSI 9300-8e
Software: Lustre 2.10.3 servers/clients (CentOS 7.4), ZFS 0.7.5 and 0.7.7.
Initially I planned to use either two zpools with three 8+2 raidz2
vdevs each, or one zpool with six 8+2 vdevs.
I created the zpool with (d0..d29 are vdev aliases for the 30 disks):

  zpool create -o multihost=on -o cachefile=none -o ashift=12 \
    -O canmount=off -O recordsize=1024K -O compression=off \
    l2oss1 raidz2 d0..d9 raidz2 d10..d19 raidz2 d20..d29
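For completeness, the pool and dataset options read back as expected; I
verified them with something like this (ashift is exposed as a pool
property on ZoL, if I remember right):

  # confirm the pool/dataset options took effect
  zpool get ashift l2oss1
  zfs get recordsize,compression,canmount l2oss1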
Benchmarking showed poor read performance:
1) obdfilter-survey:

Obdfilter-survey for case=disk from oss1
ost 1 sz 163840000K rsz 1024K obj 1 thr 1
  write   2092.43 [  785.76, 3200.18]
  rewrite 2154.08 [  800.56, 3033.77]
  read     525.03 [   70.00, 2048.87]
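The survey was invoked roughly as follows (parameters reproduced from
memory; "l2oss1-OST0000" is a placeholder for the actual OST target
name):

  # lustre-iokit obdfilter-survey, local disk case, 1 object / 1 thread
  size=160000 rszlo=1024 rszhi=1024 nobjlo=1 nobjhi=1 thrlo=1 thrhi=1 \
  case=disk targets="l2oss1-OST0000" sh obdfilter-survey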
2) IOR, single thread (./ior -a POSIX -F -rw -e -b 128g -t 1m -i 1 -o /l1/tmp/...):
access  bw(MiB/s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------  ---------  ----------  ---------  --------  --------  --------  --------  ----
write   679.45     134217728   1024.00    0.001474  192.91    0.000845  192.91    0
read    195.23     134217728   1024.00    0.001215  671.38    0.000871  671.38    0
remove  -          -           -          -         -         -         20.52     0
3) iozone also showed ~2 GB/s writes and ~0.8 GB/s reads.
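The iozone run was along these lines (flags and file path are
approximate, from memory):

  # sequential write (-i 0) then read (-i 1), 1 MB records, 128 GB file
  iozone -i 0 -i 1 -r 1m -s 128g -e -f /l1/tmp/iozone.tmp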
So reads are 3-5 times slower than writes.
Then I tried other zpool configs - two 8+2 raidz2 vdevs, four 13+2
raidz2 vdevs, six 8+2 raidz2 vdevs, etc. - same problem: reads are ~5
times slower than writes. The only exception is a small pool (e.g. a
single 8+2 raidz2 vdev), where reads are closer to writes.
Tried both zfs-0.7.5 and zfs-0.7.7 - no difference.
Also tried a plain striped pool with just 15 drives - did not help:

Obdfilter-survey for case=disk from oss1
ost 1 sz 167936000K rsz 1024K obj 1 thr 1
  write   2982.54 [2586.11, 3079.11]
  rewrite 2875.67 [1416.86, 3203.13]
  read     159.22 [  61.99,  481.96]
Example "zpool iostat -v 5" during reads:
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
l2oss1 976G 135T 184 15 183M 60.8K
sdb 65.6G 9.00T 11 0 11.2M 2.40K
sdc 64.9G 9.00T 12 1 12.4M 6.40K
sdd 65.0G 9.00T 12 0 13.0M 3.20K
sde 64.8G 9.00T 12 0 12.2M 3.20K
sdf 64.7G 9.00T 12 1 12.4M 4.80K
sdg 64.8G 9.00T 12 0 12.8M 2.40K
sdh 66.1G 9.00T 12 0 12.6M 2.40K
sdi 64.6G 9.00T 12 1 11.8M 5.60K
sdj 64.6G 9.00T 11 0 11.2M 4.00K
sdk 64.6G 9.00T 12 0 12.8M 2.40K
sdaa 66.1G 9.00T 11 0 11.4M 4.00K
sdab 65.0G 9.00T 11 1 11.8M 5.60K
sdac 64.8G 9.00T 12 1 12.4M 6.40K
sdad 65.6G 9.00T 12 0 12.6M 4.00K
sdae 65.3G 9.00T 12 0 12.6M 4.00K
---------- ----- ----- ----- ----- ----- -----
"top" shows only 8 "z_rd_int" processes - and only one "z_rd_int"
running (while there were 32 running z_wr_iss processes during write,
wait io = 2-4% in both reads/writes)
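In case it's relevant, I have not touched the ZFS I/O scheduler, so the
per-vdev read concurrency limits should be at their ZoL 0.7 defaults;
checked with:

  # max concurrent read I/Os the scheduler issues per vdev
  grep . /sys/module/zfs/parameters/zfs_vdev_sync_read_max_active \
         /sys/module/zfs/parameters/zfs_vdev_async_read_max_active \
         /sys/module/zfs/parameters/zfs_vdev_max_active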
Tried with zfs_prefetch_disable=1 - did not help.
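That was set at runtime via the module parameter, e.g.:

  # disable ZFS file-level prefetch
  echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable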
Tried building vdevs from different drives (even on separate expanders)
- did not help.
sgpdd-survey showed all drives in the 220-240 MB/s range.
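That was run roughly as follows (from memory, and with the actual sg
device list elided):

  # raw sequential throughput straight to the disks, bypassing ZFS
  size=8192 crghi=16 thrhi=32 scsidevs="/dev/sg..." sh sgpdd-survey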
Left only one SAS channel to the JBOD, no multipath, no vdev aliases -
did not help.
So I'm completely stuck.
Could you please help?
Thank you very much in advance,
Alex