Yevheniy Demchenko zheka at uvt.cz
Thu Apr 5 13:40:04 PDT 2012

We are experimenting with the new lustre-2.2.0 and, surprisingly, are
not seeing any statfs performance improvement. It is generally as poor
as in previous versions. Also, statahead does not seem to make any
difference to statfs performance.
Test setup:
3 identical machines (node21 for mgsmdt, node22 for ostoss and node23 as
a client).
Every machine has two Intel L5420 CPUs at 2.5 GHz and 8 GB of RAM. All
disks are fairly fast Kingston SSDs.
Nodes are connected via QDR infiniband (ConnectX2 mellanox + voltaire
On node21 and node22:

#rpm -qa | grep lustre

On node23:
#rpm -qa | grep lustre

#cat /etc/modprobe.d/lustre.conf
 options lnet networks="o2ib0(ibbond0)"

A default test Lustre filesystem was created with the following steps:
On node21:
#mkfs.lustre --reformat --fsname=lustrewt --mgs --mdt /dev/vg1/mgsmdt
#mount -t lustre /dev/vg1/mgsmdt /mgsmdt_mount/
On node22:
#mkfs.lustre --reformat --ost --fsname=lustrewt
--mgsnode= at o2ib0 /dev/vg2/ostoss
#mount -t lustre /dev/vg2/ostoss /ostoss_mount

A simple ls -l test on a directory of 100000 files was run on node23
with statahead on and off:
On node23:
#mount -t lustre at o2ib:/lustrewt /mnt/temp -o noatime,nodiratime
#mkdir /mnt/temp/a
#cd /mnt/temp/a
#for i in $(seq 1 100000) ; do echo $i ; dd if=/dev/zero of=./$i bs=4096 count=1 ; done
#umount /mnt/temp

-----statahead off-----
On node23:
#mount -t lustre at o2ib:/lustrewt /mnt/temp -o noatime,nodiratime
#echo 0 > /proc/fs/lustre/llite/lustrewt-ffff88021e427c00/statahead_max
#cd /mnt/temp/a

#time ls -l
real    0m52.751s
user    0m1.153s
sys     0m20.101s

#time ls -l
real    0m21.280s
user    0m1.086s
sys     0m11.973s
All subsequent runs complete in virtually the same time (21-24s).

#umount /mnt/temp
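
The /proc path used above contains a suffix (ffff88021e427c00) that
appears to be specific to one mount instance, so it may not survive a
remount. A minimal sketch of a helper that sets statahead_max through a
glob instead of a hard-coded name is shown below. The function name
set_statahead_max and its optional second argument (an alternative root
directory, useful for dry runs) are hypothetical and not part of Lustre.

```shell
# Hypothetical helper: write a value to every llite statahead_max file.
# $1 = new value, $2 = root directory (assumed default: /proc/fs/lustre/llite)
set_statahead_max() {
    value=$1
    root=${2:-/proc/fs/lustre/llite}
    for f in "$root"/*/statahead_max; do
        # Skip the unexpanded glob when no Lustre client mount exists.
        [ -e "$f" ] || continue
        echo "$value" > "$f"
        # Read the value back so the change is visible in the test log.
        echo "$f = $(cat "$f")"
    done
}
```

Calling set_statahead_max 0 as root after each mount would then disable
statahead for whichever instance name the client currently has.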

-----statahead on-----
#mount -t lustre at o2ib:/lustrewt /mnt/temp -o noatime,nodiratime
#cat /proc/fs/lustre/llite/lustrewt-ffff88021e427c00/statahead_max

#time ls -l
real    0m43.846s
user    0m1.242s
sys     0m20.444s

#time ls -l
real    0m24.000s
user    0m1.104s
sys     0m14.125s
All subsequent runs complete in virtually the same time (21-24s).

#strace -r ls -l
     0.000041 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
     0.000238 lgetxattr("77193", "security.selinux", 0x129d800, 255) =
-1 ENODATA (No data available)
     0.000179 lstat("77193", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
     0.000096 lgetxattr("77193", "system.posix_acl_access", 0x0, 0) = -1
ENODATA (No data available)
     0.000045 lgetxattr("77193", "system.posix_acl_default", 0x0, 0) =
-1 ENODATA (No data available)
     0.000045 lstat("80570", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
     0.000232 lgetxattr("80570", "security.selinux", 0x129d820, 255) =
-1 ENODATA (No data available)
     0.000209 lstat("80570", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
     0.000081 lgetxattr("80570", "system.posix_acl_access", 0x0, 0) = -1
ENODATA (No data available)
     0.000041 lgetxattr("80570", "system.posix_acl_default", 0x0, 0) =
-1 ENODATA (No data available)
SELinux is disabled on all nodes. Remounting the filesystem with
-o noacl on all nodes makes no difference: ls -l still takes 21-24
seconds.
After remounting with -o noacl:
#strace -r ls -l
     0.000043 lstat("13382", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
     0.000103 lgetxattr("13382", "security.selinux", 0xe8cfc0, 255) = -1
ENODATA (No data available)
     0.000141 lstat("13382", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
     0.000093 lgetxattr("13382", "system.posix_acl_access", 0x0, 0) = -1
EOPNOTSUPP (Operation not supported)
     0.000048 lstat("3014", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
     0.000121 lgetxattr("3014", "security.selinux", 0xe8cfe0, 255) = -1
ENODATA (No data available)
     0.000151 lstat("3014", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
     0.000091 lgetxattr("3014", "system.posix_acl_access", 0x0, 0) = -1
EOPNOTSUPP (Operation not supported)
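
To see which call dominates, the strace -r output can be totalled per
syscall. The sketch below is one way to do that. It assumes that each
relative timestamp is the gap between the start of the previous call
and the start of the current one, so each delta is charged to the
previous call. The function name strace_summary is hypothetical. Lines
that do not start with a timestamp, such as the wrapped continuation
lines in the traces above, are ignored.

```shell
# Hypothetical helper: read `strace -r` output on stdin and print
# "<syscall> <count> <total seconds>" per syscall, sorted by name.
strace_summary() {
    awk '
        # Only consider lines shaped like: <delta> <syscall>(...
        $1 ~ /^[0-9]+\.[0-9]+$/ && $2 ~ /^[a-z_0-9]+\(/ {
            name = $2
            sub(/\(.*/, "", name)      # keep only the syscall name
            if (prev != "") {          # charge the delta to the previous call
                total[prev] += $1
                count[prev]++
            }
            prev = name
        }
        END {
            for (n in total)
                printf "%s %d %.6f\n", n, count[n], total[n]
        }
    ' | sort
}
```

Since strace writes to stderr, a typical invocation would be
strace -r ls -l 2>&1 >/dev/null | strace_summary. The totals include
strace's own overhead, so they are only useful for comparing calls with
each other.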

I tried to improve performance and played with many Lustre parameters,
but was never able to get ls -l below the magical 20 seconds. It
seems that Lustre statfs just doesn't want to do more than 5000
files/second for a single client.
I'd be grateful if someone could share their real-life ls -l
performance results on a Lustre filesystem. It is possible that I'm
completely missing some obvious setting, so if anybody has any idea,
please let me know.
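
The 5000 files/second figure follows from 100000 files divided by
roughly 20 seconds. A small sketch for converting the "real" times
reported above into a rate and a per-file cost is given below. The
function names files_per_sec and usec_per_file are hypothetical.

```shell
# Hypothetical helpers.
# $1 = number of files, $2 = wall-clock seconds ("real" from time)
files_per_sec() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n / t }'
}

# Average wall-clock microseconds spent per file.
usec_per_file() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", t * 1e6 / n }'
}

files_per_sec 100000 21.280   # warm run, statahead off
usec_per_file 100000 21.280
```

For the 21.280 s run this gives about 4699 files/second, or about 213
microseconds per file.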


Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o. 

