[Lustre-discuss] Slow read performance after ~700MB-2GB

Alexander Oltu Alexander.Oltu at uni.no
Wed Jun 9 07:44:57 PDT 2010


We are experiencing strange behavior on our Lustre setup. The problem
appears only while reading files. It is a Cray XT4 machine and this
problem is reproducible only on login nodes, while compute nodes are
fine.

To reproduce it we run:
cd /lustrefs
dd if=/dev/zero of=test.file bs=1024k count=5000
# drop cache
echo 1 > /proc/sys/vm/drop_caches
dd if=test.file of=/dev/null bs=1024k &
# check dd status:
kill -USR1 %1
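
The manual status check above can be automated. The following is only a
minimal sketch, assuming bash and GNU dd (which prints its transfer
statistics to stderr on SIGUSR1). The function name watch_read and the
polling interval are hypothetical and not part of our actual setup. It
starts the same read and requests a progress report every few seconds,
so the point at which the rate collapses shows up in the output.

```shell
#!/bin/bash
# watch_read FILE [INTERVAL_SECONDS]
# Hypothetical helper: run the same dd read as in the steps above and ask
# it for a progress report every INTERVAL seconds until it exits.
watch_read() {
    local file=$1 interval=${2:-5} pid

    # Same read as in the reproduction steps, started in the background.
    dd if="$file" of=/dev/null bs=1024k &
    pid=$!

    # Give dd a moment to install its USR1 handler; a signal that arrives
    # before the handler is in place would terminate dd instead.
    sleep 1

    # kill fails once dd has exited, which ends the loop.
    while kill -USR1 "$pid" 2>/dev/null; do
        sleep "$interval"
    done

    # Collect dd's exit status.
    wait "$pid"
}
```

For example, after dropping the cache: watch_read test.file 3
Each report goes to stderr, so it can be redirected to a log file and
compared with the vmstat output taken at the same time.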

After 700 MB to 2 GB has been read, the read speed drops to 1-2 MB/s and
iowait grows to 50% (one full CPU):
hexgrid:~ # vmstat 3
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 4016432 0 3349580 0 0 0 0 253 109 0 0 99 0 0
0 1 0 4016788 0 3349240 0 0 0 0 515 2322 9 6 38 47 0
0 1 0 4016836 0 3349512 0 0 0 0 505 90 0 0 50 50 0
0 1 0 4016860 0 3349716 0 0 0 0 505 82 0 0 50 50 0
0 1 0 4016860 0 3349784 0 0 0 0 505 85 0 0 50 50 0
0 1 0 4016856 0 3349920 0 0 0 0 505 85 0 0 50 50 0
0 1 0 4017048 0 3349852 0 0 0 0 505 86 0 0 50 50 0

hexgrid:~ # mpstat -P ALL 3
Linux 2.6.16.60-0.39_1.0102.4784.2.2.48B-ss (hexgrid.bccs.uib.no) 05/11/2010

03:35:12 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
03:35:15 PM all 0.00 0.00 0.00 49.92 0.00 0.00 0.00 50.08 504.67
03:35:15 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
03:35:15 PM 1 0.00 0.00 0.00 99.67 0.00 0.00 0.00 0.00 504.67

03:35:15 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
03:35:18 PM all 0.00 0.00 0.00 50.00 0.00 0.00 0.00 50.00 503.99
03:35:18 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
03:35:18 PM 1 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00 503.99

I am not sure where to look, or whether this is a Lustre problem or a
hardware problem. There are no messages in dmesg.

Thanks,
Alex.


