[Lustre-discuss] Slow read performance across OSSes

Andreas Dilger adilger at sun.com
Sat Oct 17 10:53:17 PDT 2009


On 14-Oct-09, at 14:15, James Robnett wrote:
>  After reading through my first post I felt some clarification was
> probably warranted.
>
>  In this test setup there are two OSS, call them OSS-1 and OSS-2,
> each has an OST, call them OSS-1-A, OSS-1-B and OSS-2-A, OSS-2-B.
>
>  The MDS, OSSes and client all have 1Gbit ethernet connections.
>
>  The following table illustrates the data rates I see in MB/s.
>
> OST(s)                                 Read    Write
> OSS-1-A                                 113      95
> OSS-1-B                                 112      93
> OSS-1-A OSS-1-B                         112      98
> OSS-2-A                                 105      93
> OSS-2-B                                 115      94
> OSS-2-A OSS-2-B                         115      98
> OSS-1-B OSS-2-A                     ---> 42     113
> OSS-1-A OSS-2-B                     ---> 42     114
> OSS-1-A OSS-1-B OSS-2-A OSS-2-B     ---> 46     114

Are you sure there isn't some other strange effect here, like you are
only measuring the speed of a single iozone thread or similar?
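
To rule that out, you could run iozone in throughput mode with
several threads, e.g. something like this (paths and sizes here are
illustrative, not from your setup):

    # 4 threads, 2GB per file, 1MB records, sequential write then read
    iozone -t 4 -s 2g -r 1m -i 0 -i 1 \
        -F /mnt/lustre/f1 /mnt/lustre/f2 /mnt/lustre/f3 /mnt/lustre/f4

The aggregate "Children see throughput" numbers are the ones to
compare against the table above.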

>  I can envision that there would be more re-assembly overhead on
> the client in the case of two OSSes (1), but I'm surprised it's
> that high.
>
>  Is this an expected result ?
>
>  If it's unexpected, is there a common misconfiguration or client
> shortcoming that causes it to be slower when reading from multiple
> OSSes?

This is definitely NOT expected, and I'm puzzled as to why this
might be.

>  Is there some command I could run or data I could provide that
> would help identify the issue?  I'm fairly new to Lustre, so I'm
> just as likely to add noise as signal if I randomly append data
> beyond the raw rates.

You could check /proc/fs/lustre/obdfilter/*/brw_stats on the
respective OSTs to see whether the client is failing to assemble the
RPCs properly for some reason.
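
For example (run on each OSS; the exact output format varies a bit
between Lustre versions):

    cat /proc/fs/lustre/obdfilter/*/brw_stats

In the "pages per bulk r/w" histogram, most read I/O should land in
the largest bucket (256 pages, i.e. 1MB RPCs with 4KB pages); a large
number of small bulk reads would point at poor RPC assembly on the
client.
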
Alternatively, it might be that you have configured the disk storage
of OSS-1 and OSS-2 to compete (e.g. different partitions sharing the
same disks).
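
Since your OSTs are on LVM, something like the following would show
whether the two logical volumes on an OSS map onto overlapping
physical volumes (standard LVM2 tools; <vg> and <lv> are
placeholders):

    # physical volume to volume group mapping
    pvs
    # segment-by-segment mapping of a logical volume onto PVs
    lvdisplay -m /dev/<vg>/<lv>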

> 1) I'm assuming that in the case of a single OSS with 2 OSTs, the
> OSS presents the client with a single stream.  If assembly of two
> data streams is required on the client in both the single- and
> dual-OSS cases (both with 2 OSTs), then I'm even more confused
> about those results.

No, the client needs to assemble the OST objects itself, regardless of
whether the OSTs are on the same OSS or not.  The file should be striped
over all of the OSTs involved in the test.
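
To confirm the layout, something like this would do it (lfs option
syntax here is from the 1.x tools, and the path is illustrative;
adjust for your version and mountpoint):

    # stripe the test file across all 4 OSTs with a 1MB stripe size
    lfs setstripe -c 4 -s 1m /mnt/lustre/testfile
    # verify which OSTs the file's objects actually landed on
    lfs getstripe /mnt/lustre/testfile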

> James Robnett wrote:
>>  The nodes are a bit cobbled together from what I had handy.
>>
>> One MDS: Dual quad-core 2.5GHz Nehalem, 8GB RAM, E1000 gigabit NIC
>>         MDT is just a partition on a 1TB SAS Seagate
>> Two OSS: Single dual core 2.8GHz Xeon, 4GB RAM single gigabit NIC
>>         Dual 3ware 9550SX cards with 7+1 RAID 5 across 400GB WD SATA
>>         drives.
>> Two OSTs per OSS: 2TB each, configured as LVM; 1MB and 4MB stripe
>>         sizes tried.
>> Client:  Dual quad-core 2.5 GHz Xeon, 8GB RAM single gigabit NIC
>> Network:  Dedicated Cisco 2960g Gigabit switch


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
