[lustre-discuss] [EXTERNAL] Re: Fwd: RPCs in Flight are more than the max_rpcs_in_flight value
Md Hasanur Rashid
mrashid2 at uncc.edu
Sun Oct 17 19:28:34 PDT 2021
All my OSSs are backed by SSDs, so I don't think that would be the reason
behind such an improvement.
Thanks,
Hasan
On Sun, Oct 17, 2021 at 3:54 AM Andreas Dilger <adilger at whamcloud.com>
wrote:
> It seems likely if you are using spinning disks (HDD) that using
> max_rpcs_in_flight=1 will reduce the seeking on the OSTs, especially with
> large RPC sizes (16MB).
>
> Cheers, Andreas
>
> On Oct 16, 2021, at 16:11, Md Hasanur Rashid <mrashid2 at uncc.edu> wrote:
>
>
> Hi Everyone,
>
> Thanks for getting back to me, Andreas and Patrick. The reason I am
> experimenting with max_rpcs_in_flight=1 is that I observed a significant
> improvement over the default value with the following workload:
>
> define fileset name="testF",entries=100,filesize=16m,prealloc,path="/mnt/hasanfs/tmp1"
>
> define process name="readerP",instances=2 {
>   thread name="readerT",instances=4 {
>     flowop openfile name="openOP",filesetname="testF"
>     flowop writewholefile name="writeOP",iters=4,filesetname="testF"
>     flowop readwholefile name="readOP",iters=1,filesetname="testF"
>     flowop closefile name="closeOP"
>   }
> }
>
> create files
> system "sync"
> system "echo 3 > /proc/sys/vm/drop_caches"
>
> run 60
>
> With the default max_rpcs_in_flight value, the performance looks like the
> following:
>
> *max_rpcs_in_flight=8, max_pages_per_rpc=256, max_dirty_mb=2000*
>
>
> 64.164: Per-Operation Breakdown
>
> closeOP 5612ops 94ops/s 0.0mb/s 0.581ms/op
> [0.278ms - 1.966ms]
>
> readOP 5612ops 94ops/s 5985.6mb/s 19.335ms/op
> [14.957ms - 83.853ms]
>
> writeOP 22462ops 374ops/s 5989.3mb/s 16.248ms/op
> [5.683ms - 626.451ms]
>
> openOP 5620ops 94ops/s 0.0mb/s 0.539ms/op
> [0.313ms - 27.266ms]
>
> 64.164: IO Summary: 39306 ops 655.039 ops/s 94/374 rd/wr 11974.9mb/s
> 12.206ms/op
>
> If we change the max_rpcs_in_flight value to 1, the performance is as
> follows:
>
> *max_rpcs_in_flight=1, max_pages_per_rpc=256, max_dirty_mb=2000*
>
> 64.048: Per-Operation Breakdown
>
> closeOP 8631ops 144ops/s 0.0mb/s 0.593ms/op
> [0.273ms - 4.810ms]
>
> readOP 8631ops 144ops/s 9205.6mb/s 20.187ms/op
> [16.292ms - 37.513ms]
>
> writeOP 34545ops 576ops/s 9211.2mb/s 8.561ms/op
> [5.860ms - 55.334ms]
>
> openOP 8639ops 144ops/s 0.0mb/s 0.552ms/op
> [0.293ms - 10.132ms]
>
> 64.048: IO Summary: 60446 ops 1007.344 ops/s 144/576 rd/wr 18416.8mb/s
> 7.939ms/op
>
> As you can see above, performance improved when the value was changed to 1.
> Unfortunately, I am still not sure why this improvement occurs. I ran the
> workload from a single client. The cluster setup is as follows (5 server
> machines: 1 for MGS+MDS, and the other 4 as OSSs, each with 2 OSTs):
>
> UUID bytes Used Available Use% Mounted on
>
> hasanfs-MDT0000_UUID 144.8G 78.1M 134.8G 0%
> /mnt/hasanfs[MDT:0]
>
> hasanfs-OST0000_UUID 197.3G 1.1G 185.4G 1%
> /mnt/hasanfs[OST:0]
>
> hasanfs-OST0001_UUID 197.3G 1.2G 185.3G 1%
> /mnt/hasanfs[OST:1]
>
> hasanfs-OST0002_UUID 197.3G 1.2G 185.3G 1%
> /mnt/hasanfs[OST:2]
>
> hasanfs-OST0003_UUID 197.3G 1.2G 185.2G 1%
> /mnt/hasanfs[OST:3]
>
> hasanfs-OST0004_UUID 197.3G 1.2G 185.3G 1%
> /mnt/hasanfs[OST:4]
>
> hasanfs-OST0005_UUID 197.3G 1.2G 185.4G 1%
> /mnt/hasanfs[OST:5]
>
> hasanfs-OST0006_UUID 197.3G 1.2G 185.3G 1%
> /mnt/hasanfs[OST:6]
>
> hasanfs-OST0007_UUID 197.3G 1.2G 185.4G 1%
> /mnt/hasanfs[OST:7]
>
>
> filesystem summary: 1.5T 9.6G 1.4T 1% /mnt/hasanfs
>
> Any intuitions or explanations you can share for why this improvement may
> have been observed would be greatly appreciated.
>
> Thanks,
> Md. Hasanur Rashid
>
> On Thu, Oct 7, 2021 at 5:45 PM Andreas Dilger <adilger at whamcloud.com>
> wrote:
>
>> [*Caution*: Email from External Sender. Do not click or open links or
>> attachments unless you know this sender.]
>> On Oct 7, 2021, at 13:19, Md Hasanur Rashid via lustre-discuss <
>> lustre-discuss at lists.lustre.org> wrote:
>>
>>
>> Hello Everyone,
>>
>> I am running the Filebench benchmark on my Lustre cluster. I set
>> max_rpcs_in_flight to 1 and verified, both before and after the run, that
>> the value was indeed 1. However, rpc_stats shows a much higher number of
>> RPCs in flight.
>>
>>
>> If by "much higher than 1" you mean "2", then yes it appears there are
>> mostly (95%) 2 RPCs being processed concurrently on this OST.
>>
>> That might happen if you have 2 clients/mountpoints writing to the same
>> OST; it might be an off-by-one logic error allowing an extra RPC in flight;
>> it might be intentional for some reason (e.g. to avoid deadlock, memory
>> pressure, etc.); or it might be an accounting error in the statistics (e.g.
>> counting the next RPC to be sent before the first one is marked finished).
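
[Editorial aside: the last possibility above (sampling the next RPC's
in-flight count before the previous one is marked finished) can be
illustrated with a small Python sketch. This is a toy model, not Lustre
code; the function name and ordering flag are hypothetical.]

```python
from collections import Counter

def simulate(n_rpcs, max_rpcs, sample_before_completion):
    """Toy model of rpc_stats accounting: each dispatched RPC records
    the current in-flight count into a histogram at dispatch time."""
    in_flight = 0
    hist = Counter()
    for _ in range(n_rpcs):
        if not sample_before_completion and in_flight >= max_rpcs:
            in_flight -= 1   # oldest RPC retires before the next send
        in_flight += 1       # dispatch the next RPC
        hist[in_flight] += 1 # stats sampled here
        if sample_before_completion and in_flight > max_rpcs:
            in_flight -= 1   # completion processed only after sampling
    return hist
```

With max_rpcs=1, the correct ordering samples every RPC in the "1" bucket,
while the sample-first ordering puts nearly every sample in the "2" bucket,
much like the 95% at bucket 2 reported above.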
>>
>> The following is the output for one OSC, for reference:
>>
>> osc.hasanfs-OST0000-osc-ffff882fcf777000.rpc_stats=
>> snapshot_time: 1632483604.967291 (secs.usecs)
>> read RPCs in flight: 0
>> write RPCs in flight: 0
>> pending write pages: 0
>> pending read pages: 0
>>
>> read write
>> pages per rpc rpcs % cum % | rpcs % cum %
>> 1: 1 100 100 | 0 0 0
>> 2: 0 0 100 | 0 0 0
>> 4: 0 0 100 | 0 0 0
>> 8: 0 0 100 | 0 0 0
>> 16: 0 0 100 | 0 0 0
>> 32: 0 0 100 | 0 0 0
>> 64: 0 0 100 | 0 0 0
>> 128: 0 0 100 | 0 0 0
>> 256: 0 0 100 | 9508 100 100
>>
>> read write
>> rpcs in flight rpcs % cum % | rpcs % cum %
>> 0: 0 0 0 | 0 0 0
>> 1: 1 100 100 | 10 0 0
>> 2: 0 0 100 | 9033 95 95
>> 3: 0 0 100 | 465 4 100
>>
>> read write
>> offset rpcs % cum % | rpcs % cum %
>> 0: 1 100 100 | 725 7 7
>> 1: 0 0 100 | 0 0 7
>> 2: 0 0 100 | 0 0 7
>> 4: 0 0 100 | 0 0 7
>> 8: 0 0 100 | 0 0 7
>> 16: 0 0 100 | 0 0 7
>> 32: 0 0 100 | 0 0 7
>> 64: 0 0 100 | 0 0 7
>> 128: 0 0 100 | 0 0 7
>> 256: 0 0 100 | 718 7 15
>> 512: 0 0 100 | 1386 14 29
>> 1024: 0 0 100 | 2205 23 52
>> 2048: 0 0 100 | 1429 15 67
>> 4096: 0 0 100 | 1103 11 79
>> 8192: 0 0 100 | 1942 20 100
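
[Editorial aside: when eyeballing histograms like the listing above, the
largest nonzero "rpcs in flight" bucket is the number of interest. A short
Python sketch, assuming output shaped exactly like the listing above (not a
Lustre tool; the function name is hypothetical):]

```python
import re

def max_rpcs_seen(rpc_stats_text):
    """Largest nonzero 'rpcs in flight' bucket for (read, write),
    parsed from osc.*.rpc_stats-style output."""
    row = re.compile(r"\s*(\d+):\s+(\d+)\s+\d+\s+\d+\s+\|\s+(\d+)")
    in_section = False
    max_read = max_write = 0
    for line in rpc_stats_text.splitlines():
        if "rpcs in flight" in line and ":" not in line:
            in_section = True    # histogram header reached
            continue
        if not in_section:
            continue
        m = row.match(line)
        if not m:
            in_section = False   # blank line or next table ends the section
            continue
        bucket, reads, writes = (int(g) for g in m.groups())
        if reads:
            max_read = max(max_read, bucket)
        if writes:
            max_write = max(max_write, bucket)
    return max_read, max_write
```

Applied to the listing above, this would report 1 read and 3 write RPCs in
flight at peak, despite max_rpcs_in_flight=1.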
>>
>> Can anyone please explain to me why the RPCs in flight shown in the
>> rpc_stats could be higher than the max_rpcs_in_flight?
>>
>>
>> I do see a similar behavior with the statistics of my home system, which
>> has the default osc.*.max_rpcs_in_flight=8, but shows many cases of 9 RPCs
>> in flight in the statistics for both read and write, and in a few cases 10
>> or 11:
>>
>> read write
>> rpcs in flight rpcs % cum % | rpcs % cum %
>> 1: 121 2 2 | 27831 93 93
>> 2: 23 0 3 | 108 0 93
>> 3: 22 0 3 | 19 0 93
>> 4: 24 0 4 | 15 0 93
>> 5: 19 0 5 | 10 0 93
>> 6: 26 0 5 | 13 0 93
>> 7: 176 4 9 | 39 0 93
>> 8: 933 22 32 | 75 0 94
>> 9: 2802 67 99 | 1207 4 98
>> 10: 10 0 100 | 543 1 99
>> 11: 0 0 100 | 1 0 100
>>
>>
>> The good news is that Lustre is open source, so you can look into the
>> lustre/osc code to see why this is happening. The limit is set by
>> cli->cl_max_rpcs_in_flight, and the stats are accounted by
>> cli->cl_write_rpc_hist.
>>
>> Out of curiosity, why is this off-by-one error of interest to you? It
>> definitely seems like a bug that could be fixed, but it doesn't seem
>> critical to correct functionality.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Principal Architect
>> Whamcloud
>>
>>