[Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping

Isaac Huang He.Huang at Sun.COM
Thu Feb 25 12:56:41 PST 2010


On Mon, Feb 22, 2010 at 03:22:52AM -0800, Vipul Pandya wrote:
> Hello Issac,

Hi Vipul,

> ......
> I lowered the map_on_demand value to 16 and now it works fine.
> 
> However, I had once concern, whether lowering down this map_on_demand
> value would impact the performance of Lustre or not?

For iWARP, you probably have no alternative. I remember that there's
a restriction somewhere in the iWARP stack that limits the size of SQs
(which is why the rdma_create_qp errors happened), and lowering
map_on_demand is the only way to reduce the Lustre SQ length.

For InfiniBand, lowering map_on_demand essentially reduces the number
of RDMA WQEs needed for each Lustre bulk data movement, at the cost of
at most one memory registration/deregistration per bulk transfer;
without map_on_demand, o2iblnd uses a static MR, so there is no memory
registration cost. There could be a point in the number of frags of the
bulk buffer where the cost of handling RDMA WQEs (whose count usually
equals the number of frags) exceeds the cost of MR, and that's what you
should set map_on_demand to. However, since both costs are mostly
determined by the HCA hardware/firmware implementation, there's no one
good setting for all interconnects, and you can only find it by
testing. The LNet selftest is a useful tool for running such tests:
http://manual.lustre.org/manual/LustreManual16_HTML/LustreIOKit.html#50610302_36273
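
To make the trade-off above concrete, here is a minimal sketch of the
break-even reasoning in Python. It is only a toy cost model, not anything
taken from o2iblnd: the function name, the per-WQE and per-MR costs, and
the 256-frag upper bound are all assumptions for illustration, and the
real costs have to be measured on your own HCAs.

```python
def suggest_map_on_demand(wqe_cost_us, mr_cost_us, max_frags=256):
    """Return the smallest frag count at which the WQE cost exceeds the MR cost.

    Toy model (assumed, not measured):
      - without mapping, a bulk transfer costs frags * wqe_cost_us
      - with mapping, it costs one registration/deregistration, mr_cost_us
    Returns None if the WQE cost never exceeds the MR cost up to max_frags.
    """
    for frags in range(1, max_frags + 1):
        if frags * wqe_cost_us > mr_cost_us:
            return frags
    return None


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 us per WQE, 8 us per registration.
    print(suggest_map_on_demand(wqe_cost_us=0.5, mr_cost_us=8.0))
```

If the function returns None, the model says static MR is cheaper for
every buffer size considered. Treat any number it gives only as a
starting point for LNet selftest runs.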

Hope this helps,
Isaac


