[Lustre-discuss] Slow read performance across OSSes

Tue Oct 13 08:16:35 PDT 2009

   I've put together a small test Lustre system which is giving
confusing (at least to me) results.  All nodes are running fully
patched 64-bit RHEL 5.3 with the prebuilt Lustre 1.8.1 x86_64 RPMs.

   The nodes are a bit cobbled together from what I had handy.

One MDS: 8 core 2.5 GHz Nehalem, 8GB RAM, single E1000 gigabit NIC
          MDT is just a partition on a 1TB SAS Seagate
Two OSS: Dual core 2.8GHz Xeon, 4GB RAM single E1000 gigabit NIC
          Dual 3ware 9550SX cards with 7+1 RAID 5 across 400GB WD SATA
          drives.
Two OSTs per OSS: 2TB each, configured as LVM.  1 and 4MB stripe size tried.
Client:  8 core 2.5 GHz Xeon, 8GB RAM, single Broadcom gigabit NIC
Network:  Dedicated Cisco 2960g Gigabit switch

   This gives 2 OSSes, 4 OSTs of 2TB each for a total of 8TB.  I've
tried 1MB and 4MB stripes.

   Using Bonnie++ 1.03b (-f -s24g) from the client I see decent numbers
when reading/writing to any single OST (94 MB/s write, 112 MB/s read).
I see slightly better numbers using 2 OSTs on the same OSS (98 MB/s
write, 115 MB/s read).

   When I use any 2 OSTs across two OSSes, or all 4 OSTs, I see a
distinct falloff in read rates.  In that case I get the full 115MB/s
on writes but only 40MB/s on reads.  This holds true for any striping
whose combination of OSTs spans both OSSes.
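
   For reference, Lustre stripes a file round-robin across its OSTs
in stripe-size chunks, so a sequential read of a file striped over
both OSSes alternates between the two servers chunk by chunk.  Below
is a minimal sketch of that mapping.  The OST and OSS names are
hypothetical labels rather than the real device names, and it only
illustrates the layout; it does not model readahead or the network.

```python
# Sketch of round-robin (RAID-0 style) striping: which server holds each
# stripe-sized chunk of a file. The OST and OSS names are hypothetical
# labels, not taken from the actual configuration.

MIB = 1024 * 1024

# Hypothetical layout: two OSTs on each of two OSSes.
OST_TO_OSS = {"ost0": "oss1", "ost1": "oss1", "ost2": "oss2", "ost3": "oss2"}


def ost_for_offset(offset, stripe_size, osts):
    """Return the OST holding the byte at `offset` for a file striped over `osts`."""
    return osts[(offset // stripe_size) % len(osts)]


def servers_for_read(start, length, stripe_size, osts):
    """List the OSS serving each stripe-sized chunk of a sequential read."""
    first = start // stripe_size
    last = (start + length - 1) // stripe_size
    return [OST_TO_OSS[osts[i % len(osts)]] for i in range(first, last + 1)]


if __name__ == "__main__":
    same_oss = ["ost0", "ost1"]    # both stripes on one server
    cross_oss = ["ost0", "ost2"]   # one stripe on each server
    print(servers_for_read(0, 4 * MIB, MIB, same_oss))
    print(servers_for_read(0, 4 * MIB, MIB, cross_oss))
```

   With 1MB stripes, the first 4MB of a file striped over two OSTs on
the same OSS comes entirely from that one server, while the cross-OSS
case alternates servers on every 1MB chunk.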

   All the data rates are about what I'd expect given the subsystems
and gigabit ethernet but those very slow reads confuse me.  I expect
slightly slower (say 80-90 MB/s) reads due to buffer issues but
not 40.
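
   To put those numbers next to the wire limit, here is a rough
back-of-the-envelope sketch of the best-case TCP payload rate on
gigabit ethernet.  It assumes a standard 1500-byte MTU (no jumbo
frames), IPv4 and TCP headers without options, and MB meaning 10^6
bytes; none of those details are confirmed for this setup, so treat
the result as an approximate ceiling only.

```python
# Rough ceiling for TCP payload over gigabit Ethernet with a 1500-byte MTU.
# Assumptions (not from the post): standard framing, no jumbo frames,
# 20-byte IPv4 and 20-byte TCP headers with no options, MB = 10**6 bytes.

LINE_RATE_BITS = 1_000_000_000   # gigabit Ethernet line rate, bits per second
MTU = 1500                       # IP packet bytes carried per frame
WIRE_OVERHEAD = 8 + 14 + 4 + 12  # preamble + Ethernet header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP headers, no options


def tcp_ceiling_mb_s():
    """Best-case TCP payload rate in MB/s (10**6 bytes per second)."""
    payload = MTU - IP_TCP_HEADERS
    on_wire = MTU + WIRE_OVERHEAD
    return LINE_RATE_BITS / 8 * payload / on_wire / 1e6


def fraction_of_ceiling(measured_mb_s):
    """Measured rate as a fraction of the best-case payload rate."""
    return measured_mb_s / tcp_ceiling_mb_s()


if __name__ == "__main__":
    print(f"ceiling: {tcp_ceiling_mb_s():.1f} MB/s")
    for label, rate in [("single-OSS read", 115), ("cross-OSS read", 40)]:
        print(f"{label}: {rate} MB/s = {fraction_of_ceiling(rate):.0%} of ceiling")
```

   Under those assumptions the ceiling is a little under 119 MB/s, so
the 115 MB/s figures are close to wire speed while the 40 MB/s reads
are roughly a third of it.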

   With iostat I see relatively sustained read rates on each OST's
volume, as opposed to a full read, wait, full read, wait pattern.  That
seems to imply the client is the one setting the pace, but I'm confused
why the client is so slow reassembling replies from two streams coming
from two OSSes and not two streams coming from one OSS.

   I've tried 1MB and 4MB stripe sizes, I've tried increasing the
RX ring on OSSes to 4096, I've tried disabling checksums.  Not
surprisingly nothing seemed to have any effect since each OSS can
easily handle the client requests on its own.

   I have *not* applied the patches that address the potential corruption
issue in 1.8.x.  I saw no evidence they really applied in this case.

   I've searched through this list but haven't seen anything that seems
equivalent.  I feel I must have missed something simple on the client
side but am at my wits' end as to what that is.

   Thanks in advance for any insight as to what I'm missing.

James Robnett
NRAO/NM