[Lustre-discuss] poor lustre wan performance
Dardo D Kleiner - CONTRACTOR
dkleiner at cmf.nrl.navy.mil
Tue Nov 10 05:02:03 PST 2009
(Cross posting here in the hopes of finding a wider audience)
SLES11 x86_64/Lustre 1.8.1.1
options ko2iblnd map_on_demand=31 peer_credits=128 credits=256 concurrent_sends=256 \
ntx=512 fmr_pool_size=512 fmr_flush_trigger=384 fmr_cache=1
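(For anyone trying to reproduce: these are loaded as ko2iblnd module options, and the credit limits actually in effect can be checked through /proc — paths from memory on 1.8, so they may differ on your build:)

```shell
cat /proc/sys/lnet/nis      # per-NI credit limits currently in effect
cat /proc/sys/lnet/peers    # per-peer credit/tx counters for the o2ib peers
```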
Local write performance is OK (300-400 MB/s); however, with any substantial latency, performance tanks to 10-30 MB/s. The only relevant thing I can see is that although the RPC sizes are good, the number of write RPCs in flight never goes above 1, e.g. with 30 ms of latency:
snapshot_time:         1257795116.522124 (secs.usecs)
read RPCs in flight:   0
write RPCs in flight:  1
pending write pages:   256
pending read pages:    0

                         read                 write
pages per rpc     rpcs   %  cum % |  rpcs    %  cum %
1:                   0   0      0 |     0    0      0
2:                   0   0      0 |     0    0      0
4:                   0   0      0 |     0    0      0
8:                   0   0      0 |     0    0      0
16:                  0   0      0 |     0    0      0
32:                  0   0      0 |     0    0      0
64:                  0   0      0 |     0    0      0
128:                 0   0      0 |     0    0      0
256:                 0   0      0 |   100  100    100

                         read                 write
rpcs in flight    rpcs   %  cum % |  rpcs    %  cum %
0:                   0   0      0 |   100  100    100

                         read                 write
offset            rpcs   %  cum % |  rpcs    %  cum %
0:                   0   0      0 |   100  100    100
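A single RPC in flight over a 30 ms round trip caps throughput at roughly what we observe; quick back-of-the-envelope arithmetic (assuming 4 KiB pages and the 256-pages-per-RPC size shown above):

```python
# Bandwidth-delay arithmetic for the stats above.
# Assumes 4 KiB pages; 256 pages per RPC as reported by rpc_stats.
PAGE_SIZE = 4096
PAGES_PER_RPC = 256
RTT = 0.030  # 30 ms of added latency

rpc_size = PAGES_PER_RPC * PAGE_SIZE  # 1 MiB per write RPC

def throughput_mib_s(rpcs_in_flight):
    """Upper bound on throughput for a given pipeline depth."""
    return rpcs_in_flight * rpc_size / RTT / (1 << 20)

print(throughput_mib_s(1))   # ~33 MiB/s: consistent with the 10-30 MB/s observed
print(throughput_mib_s(12))  # ~400 MiB/s: depth needed to match local performance
```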
At this point it clearly doesn't matter if I adjust max_rpcs_in_flight, which used to be a
way to mitigate the high bandwidth-delay product (BDP).
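(For completeness, this is the tuning I mean, via the 1.8 lctl interface — the default depth is 8:)

```shell
# Raise the per-OST write pipeline depth on the client:
lctl set_param osc.*.max_rpcs_in_flight=32
# Watch in-flight counts while writing:
lctl get_param osc.*.rpc_stats
```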
Are there new parameters and/or tunings for ko2iblnd we're supposed to be using? Did
something change in 1.8.1.1 in this regard? I'm trying to determine whether it was our
move to SLES11 or something else. Our operational deployment is not yet at this latest
version, but we are wary of upgrading given the problems I've described.
Any suggestions greatly appreciated...
- Dardo