[lustre-discuss] [EXTERNAL] Re: Fwd: RPCs in Flight are more than the max_rpcs_in_flight value
Md Hasanur Rashid
mrashid2 at uncc.edu
Sat Oct 16 15:10:58 PDT 2021
Hi Everyone,
Thanks for getting back to me, Andreas and Patrick. The reason I am
experimenting with max_rpcs_in_flight set to 1 is that I have observed a
significant improvement over the default value with the following workload:
define fileset name="testF",entries=100,filesize=16m,prealloc,path="/mnt/hasanfs/tmp1"

define process name="readerP",instances=2 {
  thread name="readerT",instances=4 {
    flowop openfile name="openOP",filesetname="testF"
    flowop writewholefile name="writeOP",iters=4,filesetname="testF"
    flowop readwholefile name="readOP",iters=1,filesetname="testF"
    flowop closefile name="closeOP"
  }
}

create files
system "sync"
system "echo 3 > /proc/sys/vm/drop_caches"
run 60
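For completeness, this is roughly how the tunable can be set and checked on
the client with lctl before and after a run (a sketch; the wildcard matches
all OSCs of the hasanfs filesystem):

# limit each OSC of this filesystem to a single RPC in flight
lctl set_param osc.hasanfs-*.max_rpcs_in_flight=1

# verify the value before and after the benchmark run
lctl get_param osc.hasanfs-*.max_rpcs_in_flight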
The performance with the default max_rpcs_in_flight value looks like the
following:
*max_rpcs_in_flight=8, max_pages_per_rpc=256, max_dirty_mb=2000*
64.164: Per-Operation Breakdown
closeOP    5612ops     94ops/s     0.0mb/s       0.581ms/op   [0.278ms - 1.966ms]
readOP     5612ops     94ops/s     5985.6mb/s   19.335ms/op   [14.957ms - 83.853ms]
writeOP   22462ops    374ops/s     5989.3mb/s   16.248ms/op   [5.683ms - 626.451ms]
openOP     5620ops     94ops/s     0.0mb/s       0.539ms/op   [0.313ms - 27.266ms]
64.164: IO Summary: 39306 ops 655.039 ops/s 94/374 rd/wr 11974.9mb/s 12.206ms/op
If we change max_rpcs_in_flight to 1, the performance is the following:
*max_rpcs_in_flight=1, max_pages_per_rpc=256, max_dirty_mb=2000*
64.048: Per-Operation Breakdown
closeOP    8631ops    144ops/s     0.0mb/s       0.593ms/op   [0.273ms - 4.810ms]
readOP     8631ops    144ops/s     9205.6mb/s   20.187ms/op   [16.292ms - 37.513ms]
writeOP   34545ops    576ops/s     9211.2mb/s    8.561ms/op   [5.860ms - 55.334ms]
openOP     8639ops    144ops/s     0.0mb/s       0.552ms/op   [0.293ms - 10.132ms]
64.048: IO Summary: 60446 ops 1007.344 ops/s 144/576 rd/wr 18416.8mb/s 7.939ms/op
As you can see above, performance improved when the value was changed to 1.
Unfortunately, I am still not sure why this improvement occurs. I ran the
workload from a single client. The cluster setup is as follows (5 server
machines: 1 for the MGS+MDS, and the other 4 for OSSs, each with 2 OSTs):
UUID                    bytes    Used   Available  Use%  Mounted on
hasanfs-MDT0000_UUID   144.8G   78.1M      134.8G    0%  /mnt/hasanfs[MDT:0]
hasanfs-OST0000_UUID   197.3G    1.1G      185.4G    1%  /mnt/hasanfs[OST:0]
hasanfs-OST0001_UUID   197.3G    1.2G      185.3G    1%  /mnt/hasanfs[OST:1]
hasanfs-OST0002_UUID   197.3G    1.2G      185.3G    1%  /mnt/hasanfs[OST:2]
hasanfs-OST0003_UUID   197.3G    1.2G      185.2G    1%  /mnt/hasanfs[OST:3]
hasanfs-OST0004_UUID   197.3G    1.2G      185.3G    1%  /mnt/hasanfs[OST:4]
hasanfs-OST0005_UUID   197.3G    1.2G      185.4G    1%  /mnt/hasanfs[OST:5]
hasanfs-OST0006_UUID   197.3G    1.2G      185.3G    1%  /mnt/hasanfs[OST:6]
hasanfs-OST0007_UUID   197.3G    1.2G      185.4G    1%  /mnt/hasanfs[OST:7]
filesystem summary:      1.5T    9.6G        1.4T    1%  /mnt/hasanfs
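For reference, the file layout and the per-OSC RPC histograms can be inspected
with something like the following (a sketch; the path is the fileset directory
from the workload above, and writing to rpc_stats should reset the counters
before a run):

# show how files under the benchmark directory are striped across the OSTs
lfs getstripe -d /mnt/hasanfs/tmp1

# clear the per-OSC RPC histograms, run the workload, then read them back
lctl set_param osc.hasanfs-*.rpc_stats=0
lctl get_param osc.hasanfs-*.rpc_stats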
It would be really helpful and highly appreciated if you could share your
intuition about why this improvement might be observed.
Thanks,
Md. Hasanur Rashid
On Thu, Oct 7, 2021 at 5:45 PM Andreas Dilger <adilger at whamcloud.com> wrote:
> On Oct 7, 2021, at 13:19, Md Hasanur Rashid via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>
>
> Hello Everyone,
>
> I am running the Filebench benchmark in my Lustre cluster. I set the
> max_rpcs_in_flight value to be 1. Before executing and after executing, I
> verified that the value of max_rpcs_in_flight is indeed 1. However, when I
> check the rpc_stats, the stat shows a much higher value for RPCs in flight.
>
>
> If by "much higher than 1" you mean "2", then yes it appears there are
> mostly (95%) 2 RPCs being processed concurrently on this OST.
>
> That might happen if you have 2 clients/mountpoints writing to the same
> OST, it might be an off-by-one logic error allowing an extra RPC in flight,
> it might be intentional for some reason (e.g. to avoid deadlock, memory
> pressure, etc.), or it might be an accounting error in the statistics (e.g.
> counting the next RPC to be sent before the first one is marked finished).
>
> Following is the value shown for one OSC just for reference:
>
> osc.hasanfs-OST0000-osc-ffff882fcf777000.rpc_stats=
> snapshot_time: 1632483604.967291 (secs.usecs)
> read RPCs in flight: 0
> write RPCs in flight: 0
> pending write pages: 0
> pending read pages: 0
>
> read write
> pages per rpc rpcs % cum % | rpcs % cum %
> 1: 1 100 100 | 0 0 0
> 2: 0 0 100 | 0 0 0
> 4: 0 0 100 | 0 0 0
> 8: 0 0 100 | 0 0 0
> 16: 0 0 100 | 0 0 0
> 32: 0 0 100 | 0 0 0
> 64: 0 0 100 | 0 0 0
> 128: 0 0 100 | 0 0 0
> 256: 0 0 100 | 9508 100 100
>
> read write
> rpcs in flight rpcs % cum % | rpcs % cum %
> 0: 0 0 0 | 0 0 0
> 1: 1 100 100 | 10 0 0
> 2: 0 0 100 | 9033 95 95
> 3: 0 0 100 | 465 4 100
>
> read write
> offset rpcs % cum % | rpcs % cum %
> 0: 1 100 100 | 725 7 7
> 1: 0 0 100 | 0 0 7
> 2: 0 0 100 | 0 0 7
> 4: 0 0 100 | 0 0 7
> 8: 0 0 100 | 0 0 7
> 16: 0 0 100 | 0 0 7
> 32: 0 0 100 | 0 0 7
> 64: 0 0 100 | 0 0 7
> 128: 0 0 100 | 0 0 7
> 256: 0 0 100 | 718 7 15
> 512: 0 0 100 | 1386 14 29
> 1024: 0 0 100 | 2205 23 52
> 2048: 0 0 100 | 1429 15 67
> 4096: 0 0 100 | 1103 11 79
> 8192: 0 0 100 | 1942 20 100
>
> Can anyone please explain to me why the RPCs in flight shown in the
> rpc_stats could be higher than the max_rpcs_in_flight?
>
>
> I do see a similar behavior with the statistics of my home system, which
> has the default osc.*.max_rpcs_in_flight=8, but shows many cases of 9 RPCs
> in flight in the statistics for both read and write, and in a few cases 10
> or 11:
>
> read write
> rpcs in flight rpcs % cum % | rpcs % cum %
> 1: 121 2 2 | 27831 93 93
> 2: 23 0 3 | 108 0 93
> 3: 22 0 3 | 19 0 93
> 4: 24 0 4 | 15 0 93
> 5: 19 0 5 | 10 0 93
> 6: 26 0 5 | 13 0 93
> 7: 176 4 9 | 39 0 93
> 8: 933 22 32 | 75 0 94
> 9: 2802 67 99 | 1207 4 98
> 10: 10 0 100 | 543 1 99
> 11: 0 0 100 | 1 0 100
>
>
> The good news is that Lustre is open source, so you can look into the
> lustre/osc code to see why this is happening. The limit is set by
> cli->cl_max_rpcs_in_flight, and the stats are accounted by
> cli->cl_write_rpc_hist.
>
> Out of curiosity, you don't say _why_ this off-by-one error is of
> interest? Definitely it seems like a bug that could be fixed, but it
> doesn't seem too critical to correct functionality.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud