[Lustre-discuss] 1.8.1.1 write slow performance :/

Sun Nov 8 12:52:01 PST 2009

-- 
Linux aleft 2.6.27.29-0.1_lustre.1.8.1.1-default #1 SMP
drbd 8.3.5-(api:88/proto:86-91)
pacemaker 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
Lustre 1.8.1.1-20091009080716-PRISTINE-2.6.27.29-0.1_lustre.1.8.1.1-default

Well, I've set up everything using a 64-bit kernel; for now I have
~4 TB of usable space on a single Lustre OST volume.

I just ran some speed tests between a client and the filesystem server
over a dedicated Gigabit Ethernet connection. I compared writing to the
Lustre-mounted share directly with writing to the same share mounted
as a loopback Lustre client on the filesystem server and re-exported via NFS.

The results are quite sad: writes to the remote Lustre fs directly are
dreadfully slow, while writes to the same Lustre fs re-exported via NFS are
at least 10 times faster. The client machine is a Xeon 2.4 GHz with 4 GB RAM,
and the server machine is a Xeon 3.0 GHz with 8 GB RAM. I reviewed the tuning
chapter of the Lustre manual and tuned rx on the Ethernet interface with
ethtool.
The Lustre volumes (mgs, mdt, ost) are set up on UpToDate (synchronized) DRBD
resources (synchronization has already finished, over a dedicated 1 Gbit link
that is not the interface used to communicate with Lustre clients).

I'd blame DRBD for this, since some cost is expected with DRBD,
but the NFS-re-exported, locally mounted Lustre volume obviously goes through
the DRBD stack too! The DRBD resource is set up as the backend storage device
for Lustre, so it is not possible to read or write anything on Lustre
while bypassing the DRBD stack. Both machines are otherwise idle.

It seems that for a client-initiated write, the path

lustre client => lustre server => drbd resource "X"

is dramatically slower than

nfs client => nfs server => loopback lustre client => drbd resource "X".

And this is definitely not expected. Below are example transfer rates.
Any ideas? Could it be, for example, that the gigabit switch in the
middle handles NFS and Lustre traffic differently?
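To put the write figures from the dd output in this post in per-call terms,
here is a rough sketch. It assumes dd issues one write() per output block
(bs=1024, count=102400), and per_write_us is just a helper name made up for
this example. It only restates the measured times; it does not show where the
time is spent.

```shell
#!/bin/sh
# per_write_us SECONDS COUNT
# Average time per write() call in microseconds, given the total
# elapsed time reported by dd and the count= value passed to it.
per_write_us() {
    awk -v secs="$1" -v count="$2" 'BEGIN { printf "%.1f\n", secs / count * 1e6 }'
}

# Elapsed times copied from the two dd write runs in this post.
lustre_secs=22.3427
nfs_secs=1.05942
writes=102400

echo "lustre: $(per_write_us "$lustre_secs" "$writes") us per 1 KiB write"
echo "nfs:    $(per_write_us "$nfs_secs" "$writes") us per 1 KiB write"
# Ratio of the two elapsed times, i.e. how much slower the direct write was.
awk -v l="$lustre_secs" -v n="$nfs_secs" 'BEGIN { printf "ratio:  %.1fx\n", l / n }'
```

If the per-call cost rather than the link is the limit, a larger bs= should
change the Lustre number noticeably; that is an assumption to test, not a
conclusion.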

aleft:~# free -m
             total       used       free     shared    buffers     cached
Mem:          7987       3861       4126          0        102       3475
-/+ buffers/cache:        282       7705
Swap:         1906          0       1906
aleft:~# logout
Connection to master closed.
b02:~# free -m
             total       used       free     shared    buffers     cached
Mem:          4054       3908        145          0         43       1813
-/+ buffers/cache:       2051       2002
Swap:         7812          0       7812
b02:~# 

b02:~# ssh root@master
[..]
aleft:~# mount -t lustre
/dev/drbd0 on /mnt/mgs type lustre (rw,noauto)
/dev/drbd1 on /mnt/mdt type lustre (rw,noauto,_netdev)
/dev/drbd2 on /mnt/ost01 type lustre (rw,noauto,_netdev)
master@tcp0:/lfs00 on /mnt/lfs00 type lustre (rw,noauto,_netdev)
aleft:~# logout
b02:~# mount -t lustre
master@tcp0:/lfs00 on /mnt/lfs00 type lustre (rw,noauto,_netdev)
b02:~# mount -t nfs |grep master
master:/mnt/lfs00 on /mnt/nfs00 type nfs (rw,addr=192.168.0.100)
b02:~# 

Connection to master closed.
b02:~# ./100mb.sh 

lfs00-send
time dd if=/dev/zero of=/mnt/lfs00/testfile-b02 bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 22.3427 s, 4.7 MB/s

real    0m22.345s
user    0m0.100s
sys     0m3.760s

lfs00-get
time dd of=testfile-b02 if=/mnt/lfs00/testfile-b02 bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.987265 s, 106 MB/s

real    0m0.989s
user    0m0.040s
sys     0m0.880s
b02:~# ./100mb-nfs.sh 

nfs00-send
time dd if=/dev/zero of=/mnt/nfs00/testfile-b02 bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 1.05942 s, 99.0 MB/s

real    0m1.061s
user    0m0.028s
sys     0m0.252s

nfs00-get
time dd of=testfile-b02 if=/mnt/nfs00/testfile-b02 bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.576351 s, 182 MB/s

real    0m0.578s
user    0m0.016s
sys     0m0.556s
b02:~#
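
None of the dd runs above flush data before dd reports its timing, and the
182 MB/s read is more than a gigabit link can carry (about 125 MB/s), so at
least that run was partly served from cache. A possible follow-up is to repeat
the write test with several block sizes and force a flush at the end. This is
a minimal sketch, assuming GNU dd (for conv=fsync); write_test and its
arguments are names invented for this example, not part of the original
100mb.sh scripts.

```shell
#!/bin/sh
# write_test DIR BS TOTAL_BYTES
# Write TOTAL_BYTES of zeros into DIR using block size BS, and make dd
# fsync the output file before it reports the elapsed time, so the
# figure is not just the speed of filling the client page cache.
write_test() {
    wt_out="$1/testfile-bs$2"
    wt_count=$(( $3 / $2 ))
    # dd prints its summary on stderr; keep only the final throughput line.
    dd if=/dev/zero of="$wt_out" bs="$2" count="$wt_count" conv=fsync 2>&1 | tail -n 1
}

# Example invocation (same 100 MiB payload as the tests above):
#   for bs in 1024 65536 1048576; do
#       printf 'bs=%s: ' "$bs"
#       write_test /mnt/lfs00 "$bs" 104857600
#   done
```

Running the same loop against /mnt/nfs00 would give numbers that are
comparable between the two mounts.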