[Lustre-discuss] write RPC & congestion

Yuriy Umanets yuriy_umanets at xyratex.com
Thu Dec 23 01:52:30 PST 2010


On Dec 22, 2010, at 05:51, Oleg Drokin wrote:

> Hello!
> 
hi all,

> I guess I am a little bit late to the party, but I was just reading comments in bug 16900 and have this question I really need to ask.
> 
> On Aug 23, 2010, at 10:58 PM, Jeremy Filizetti wrote:
>> The larger RPCs from bug 16900 offered some significant performance gains when working over the WAN.  Our use case involves a few clients who need fast access rather than 100s or 1000s.  The included PDF shows iozone performance over the WAN in 10 ms RTT increments up to 200 ms for a single Lustre client and a small Lustre setup (1 MDS, 2 OSS, 6 OSTs).  This test was with an SDR InfiniBand WAN connection using Obsidian Longbows to simulate delay.  I'm not 100% sure the value used for concurrent_sends is correct.
>> 
>> So even though this isn't geared towards most Lustre users, I think the larger RPCs are pretty useful.  Plenty of people at LUG2010 mentioned using Lustre over the WAN in some way.
> 
> So are you sure you got your benefit from the larger RPC size, as opposed to just having 4x more data on the wire? There is another way to increase the amount of data on the wire without large RPCs: you can increase the number of RPCs in flight to the OSTs from the current default of 8 to, say, 32 (/proc/fs/lustre/osc/*/max_rpcs_in_flight).
> 
> I really wonder how the results would compare to the 4M RPCs results if you still have the capability to test it.
> 
I agree with Oleg that this is the better approach, also from another point of view. While Lustre tries to form full 1M or 4M (whatever the limit is) IO RPCs, this is not always possible. One such case is IO to many small files: there is simply no way to pack pages belonging to multiple files into a single IO RPC. The result is lots of small RPCs that will definitely under-utilize the network.
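
For example, to try what Oleg suggests you can raise the value on the client, either directly through /proc or with lctl set_param (just a sketch; 32 is the value Oleg mentioned, the right number depends on your RTT and bandwidth):

cat /proc/fs/lustre/osc/*/max_rpcs_in_flight       # current value for every OSC
lctl set_param osc.*.max_rpcs_in_flight=32         # raise from the default of 8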

While tuning max_rpcs_in_flight you may also want to check that the network is not becoming overloaded. This can be done by watching "threads_started" for the IO service on the server, which is the number of threads currently used to handle RPCs for that service. If it stops growing as you increase max_rpcs_in_flight, the network is becoming the bottleneck.

Example:

cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
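
If you want to watch it on the OSS while you increase max_rpcs_in_flight on the client, something as simple as this is enough (just an illustration):

watch -n 5 cat /proc/fs/lustre/ost/OSS/ost_io/threads_started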

Thanks.

> Thanks.
> 
> Bye,
>    Oleg
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


--
umka
