[Lustre-devel] discontiguous kiov pages

Eric Barton eeb at whamcloud.com
Thu Jun 9 07:38:11 PDT 2011


It seems to me that Jay's suggestion to put the niobufs into
separate RPCs is a good one - particularly since writing the 2nd
niobuf should only be attempted after the first has been written,
to ensure the file size is set correctly (BTW this means the 2nd
RPC cannot be posted until the first has completed - otherwise the
RPCs could get re-ordered in the network or at the server).

However, it would be nice to aggregate small, possibly unrelated
I/Os, and if/when we do that this issue will crop up again.  If
we stick with the rule that MDs cannot have internal partial pages,
we're forced to use 1 MD for each niobuf.  Putting several of these
in 1 RPC requires separate matchbits for each niobuf to ensure
correct match of source and sink buffers independent of races in
the network.  This must be more efficient than scheduling multiple 
concurrent RPCs each with 1 niobuf, but by how much isn't clear,
since the bulk transfer phases of both schemes should cause identical
network traffic.  
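
To make the "1 MD per niobuf" constraint concrete, here is a rough
sketch in Python.  It is only a model: PAGE_SIZE, the (offset, length)
niobuf tuples, the helper names and the matchbits numbering are all
assumptions made for illustration, not LNET or ptlrpc code.  It splits
each niobuf into per-page fragments, checks the "no internal partial
pages" rule, and gives each niobuf its own MD and matchbits.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def kiov_fragments(offset, length):
    """Split a byte range into per-page (page_offset, nbytes) fragments."""
    frags = []
    while length > 0:
        page_off = offset % PAGE_SIZE
        nbytes = min(PAGE_SIZE - page_off, length)
        frags.append((page_off, nbytes))
        offset += nbytes
        length -= nbytes
    return frags


def has_internal_partial_page(frags):
    """True if a fragment other than the first starts mid-page,
    or a fragment other than the last ends mid-page."""
    last = len(frags) - 1
    for i, (page_off, nbytes) in enumerate(frags):
        if i > 0 and page_off != 0:
            return True
        if i < last and page_off + nbytes != PAGE_SIZE:
            return True
    return False


def plan_mds(niobufs, base_matchbits):
    """One MD per niobuf, each with distinct (hypothetical) matchbits."""
    return [
        {"matchbits": base_matchbits + i, "frags": kiov_fragments(off, length)}
        for i, (off, length) in enumerate(niobufs)
    ]


# Two discontiguous niobufs: the first ends mid-page, the second starts mid-page.
niobufs = [(0, 6000), (1048576 + 100, 5000)]
combined = [f for off, length in niobufs for f in kiov_fragments(off, length)]
mds = plan_mds(niobufs, base_matchbits=0x1000)
```

In this example the combined fragment list breaks the rule (the short
page ending the first niobuf sits in the middle), while each niobuf
taken alone satisfies it, so each gets its own MD and matchbits.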

So aggregation will probably require LNET/LND support for MDs with
internal partial pages.  At a guess, this will have strict limits
for some LNDs and probably can't be done without reducing the total
number of fragments in such messages.  Also, the interaction with
LNET routers needs to be considered since mismatched RDMA descriptors
can potentially double the number of actual RDMA fragments on the
wire.
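
As a rough illustration of the fragment-doubling concern, here is a
simplified model in Python.  It assumes (my assumption, not a
description of any particular LND) that each RDMA transfer must lie
within one source fragment and one sink fragment, so a transfer ends
wherever either descriptor has a fragment boundary.  The function name
and the example fragment lengths are made up for the example.

```python
from itertools import accumulate


def wire_fragments(src_lens, sink_lens):
    """Count the transfers needed when every source or sink fragment
    boundary forces a split (simplified model, positive lengths only)."""
    if sum(src_lens) != sum(sink_lens):
        raise ValueError("source and sink must describe the same number of bytes")
    # Each distinct end offset in either descriptor closes one transfer.
    boundaries = set(accumulate(src_lens)) | set(accumulate(sink_lens))
    return len(boundaries)


# Matching descriptors: 4 page-sized fragments on both sides.
aligned = wire_fragments([4096] * 4, [4096] * 4)

# Sink shifted by 100 bytes within its pages: 4 source fragments
# against 5 sink fragments covering the same 16384 bytes.
mismatched = wire_fragments([4096] * 4, [3996, 4096, 4096, 4096, 100])
```

In this model the matching case needs 4 transfers and the shifted case
needs 8, i.e. (source fragments + sink fragments - 1), which is where
the "double" comes from.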

          Cheers,
                   Eric





