[Lustre-discuss] write RPC & congestion

Jeremy Filizetti jeremy.filizetti at gmail.com
Tue Dec 21 21:43:38 PST 2010


In the attachment I created that Andreas posted at
https://bugzilla.lustre.org/attachment.cgi?id=31423, graphs 1 and 2 both use
a larger-than-default max_rpcs_in_flight.  I believe the data without the
patch from bug 16900 used max_rpcs_in_flight=42, and the data with the patch
from 16900 used max_rpcs_in_flight=32.  So the short answer is that we were
already increasing max_rpcs_in_flight for all of that data (which is needed
for good performance at higher latencies).
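For reference, this is roughly how we bump that tunable on the clients (a
minimal sketch; 32 is just an example value, and writing the /proc file Oleg
mentions below has the same effect):

    # show the current setting for every OSC on this client
    lctl get_param osc.*.max_rpcs_in_flight

    # raise it to 32 (same effect as echoing 32 into
    # /proc/fs/lustre/osc/*/max_rpcs_in_flight)
    lctl set_param osc.*.max_rpcs_in_flight=32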

My understanding is that the real benefit of the larger RPC patch is that we
are not paying 12 round-trip times to read 4 MB (4 x 1 MB bulk RPCs);
instead I think we pay 3.  I've never traced through to confirm that is
actually what happens, but from what I read about the patch it sends 4
memory descriptors with a single bulk request.
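To put rough numbers on it: if each bulk read RPC really does cost 3 RTTs,
then a single client's streaming read rate is bounded by roughly
(data in flight) / (time per RPC round).  A back-of-envelope sketch (the
50 ms RTT is just a hypothetical WAN latency, not one of our test links):

    # assumes 3 RTTs per bulk read RPC as described above; 50 ms RTT is hypothetical
    rtt_s=0.050; rtts_per_rpc=3; rpc_mb=1; in_flight=8
    awk -v r=$rtt_s -v n=$rtts_per_rpc -v s=$rpc_mb -v f=$in_flight \
        'BEGIN { printf "~%.0f MB/s\n", (f * s) / (n * r) }'
    # 1 MB RPCs,  8 in flight -> ~53 MB/s
    # 1 MB RPCs, 32 in flight -> ~213 MB/s
    # 4 MB RPCs,  8 in flight -> ~213 MB/s

Either knob (bigger RPCs or more of them in flight) grows the amount of data
on the wire, which is why both help at high latency.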

What isn't quite clear to me is why Lustre takes 3 RTTs for a read and 2 for
a write.  I think I understand the write having to communicate with the
server first, because preallocating buffers for all clients would possibly
be a waste of resources.  But for a read it seems logical (from the RDMA
standpoint) that the memory buffer could be pre-registered and sent to the
server, and the server would respond with the contents for that buffer,
which would be 1 RTT.

I don't have everything set up right now in our test environment, but with a
little effort I could set up a similar test if you're wondering about
something specific.

Jeremy

> So are you sure you got your benefit from the larger RPC size as opposed to
> just having 4x more data on the wire? There is another way to increase the
> amount of data on the wire without large RPCs: you can increase the number
> of RPCs in flight to OSTs from the current default of 8 to, say, 32
> (/proc/fs/lustre/osc/*/max_rpcs_in_flight).
>
> I really wonder how the results would compare to the 4M RPCs results if you
> still have the capability to test it.
>
> Thanks.
>
> Bye,
>     Oleg