[lustre-discuss] [EXTERNAL] Re: Fwd: RPCs in Flight are more than the max_rpcs_in_flight value
Md Hasanur Rashid
mrashid2 at uncc.edu
Sun Oct 17 19:28:34 PDT 2021
All my OSSs are backed by SSDs, so I don't think that would be the reason
behind such an improvement.
Thanks,
Hasan
On Sun, Oct 17, 2021 at 3:54 AM Andreas Dilger <adilger at whamcloud.com>
wrote:
> It seems likely if you are using spinning disks (HDD) that using
> max_rpcs_in_flight=1 will reduce the seeking on the OSTs, especially with
> large RPC sizes (16MB).
>
> Cheers, Andreas
>
> On Oct 16, 2021, at 16:11, Md Hasanur Rashid <mrashid2 at uncc.edu> wrote:
>
>
> Hi Everyone,
>
> Thanks for getting back to me, Andreas and Patrick. The reason I am
> experimenting with max_rpcs_in_flight=1 is that I observed a significant
> improvement over the default value with the following workload:
>
> define fileset name="testF",entries=100,filesize=16m,prealloc,path="/mnt/hasanfs/tmp1"
>
> define process name="readerP",instances=2 {
>   thread name="readerT",instances=4 {
>     flowop openfile name="openOP",filesetname="testF"
>     flowop writewholefile name="writeOP",iters=4,filesetname="testF"
>     flowop readwholefile name="readOP",iters=1,filesetname="testF"
>     flowop closefile name="closeOP"
>   }
> }
>
> create files
> system "sync"
> system "echo 3 > /proc/sys/vm/drop_caches"
>
> run 60
>
> With the default max_rpcs_in_flight value, the performance looks like the
> following:
>
> *max_rpcs_in_flight=8, max_pages_per_rpc=256, max_dirty_mb=2000*
>
>
> 64.164: Per-Operation Breakdown
>
> closeOP 5612ops 94ops/s 0.0mb/s 0.581ms/op
> [0.278ms - 1.966ms]
>
> readOP 5612ops 94ops/s 5985.6mb/s 19.335ms/op
> [14.957ms - 83.853ms]
>
> writeOP 22462ops 374ops/s 5989.3mb/s 16.248ms/op
> [5.683ms - 626.451ms]
>
> openOP 5620ops 94ops/s 0.0mb/s 0.539ms/op
> [0.313ms - 27.266ms]
>
> 64.164: IO Summary: 39306 ops 655.039 ops/s 94/374 rd/wr 11974.9mb/s
> 12.206ms/op
>
> If we change the max_rpcs_in_flight value to 1, the performance is as
> follows:
>
> *max_rpcs_in_flight=1, max_pages_per_rpc=256, max_dirty_mb=2000*
>
> 64.048: Per-Operation Breakdown
>
> closeOP 8631ops 144ops/s 0.0mb/s 0.593ms/op
> [0.273ms - 4.810ms]
>
> readOP 8631ops 144ops/s 9205.6mb/s 20.187ms/op
> [16.292ms - 37.513ms]
>
> writeOP 34545ops 576ops/s 9211.2mb/s 8.561ms/op
> [5.860ms - 55.334ms]
>
> openOP 8639ops 144ops/s 0.0mb/s 0.552ms/op
> [0.293ms - 10.132ms]
>
> 64.048: IO Summary: 60446 ops 1007.344 ops/s 144/576 rd/wr 18416.8mb/s
> 7.939ms/op
>
> As you can see above, performance improved when the value was changed to 1.
> Unfortunately, I am still not sure why this improvement occurs. I ran the
> workload from a single client. The cluster setup is as follows (5 server
> machines: 1 for MGS+MDS, and the other 4 as OSSs, each with 2 OSTs):
>
> UUID bytes Used Available Use% Mounted on
>
> hasanfs-MDT0000_UUID 144.8G 78.1M 134.8G 0%
> /mnt/hasanfs[MDT:0]
>
> hasanfs-OST0000_UUID 197.3G 1.1G 185.4G 1%
> /mnt/hasanfs[OST:0]
>
> hasanfs-OST0001_UUID 197.3G 1.2G 185.3G 1%
> /mnt/hasanfs[OST:1]
>
> hasanfs-OST0002_UUID 197.3G 1.2G 185.3G 1%
> /mnt/hasanfs[OST:2]
>
> hasanfs-OST0003_UUID 197.3G 1.2G 185.2G 1%
> /mnt/hasanfs[OST:3]
>
> hasanfs-OST0004_UUID 197.3G 1.2G 185.3G 1%
> /mnt/hasanfs[OST:4]
>
> hasanfs-OST0005_UUID 197.3G 1.2G 185.4G 1%
> /mnt/hasanfs[OST:5]
>
> hasanfs-OST0006_UUID 197.3G 1.2G 185.3G 1%
> /mnt/hasanfs[OST:6]
>
> hasanfs-OST0007_UUID 197.3G 1.2G 185.4G 1%
> /mnt/hasanfs[OST:7]
>
>
> filesystem summary: 1.5T 9.6G 1.4T 1% /mnt/hasanfs
>
> Any intuitions or explanations you can share for why this improvement may
> have been observed would be greatly appreciated.
>
> Thanks,
> Md. Hasanur Rashid
>
> On Thu, Oct 7, 2021 at 5:45 PM Andreas Dilger <adilger at whamcloud.com>
> wrote:
>
>> [*Caution*: Email from External Sender. Do not click or open links or
>> attachments unless you know this sender.]
>> On Oct 7, 2021, at 13:19, Md Hasanur Rashid via lustre-discuss <
>> lustre-discuss at lists.lustre.org> wrote:
>>
>>
>> Hello Everyone,
>>
>> I am running the Filebench benchmark on my Lustre cluster. I set
>> max_rpcs_in_flight to 1 and verified, both before and after the run, that
>> the value was indeed 1. However, rpc_stats shows a much higher number of
>> RPCs in flight.
>>
>>
>> If by "much higher than 1" you mean "2", then yes it appears there are
>> mostly (95%) 2 RPCs being processed concurrently on this OST.
>>
>> That might happen if you have 2 clients/mountpoints writing to the same
>> OST; it might be an off-by-one logic error allowing an extra RPC in flight;
>> it might be intentional for some reason (e.g. to avoid deadlock, memory
>> pressure, etc.); or it might be an accounting error in the statistics (e.g.
>> counting the next RPC to be sent before the first one is marked finished).
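
[Editorial aside: the last possibility above (sampling the next RPC's
in-flight count before the previous one is marked finished) can be
illustrated with a small Python sketch. This is a toy model, not Lustre
code; the function name and ordering flag are hypothetical.]

```python
from collections import Counter

def simulate(n_rpcs, max_rpcs, sample_before_completion):
    """Toy model of rpc_stats accounting: each dispatched RPC records
    the current in-flight count into a histogram at dispatch time."""
    in_flight = 0
    hist = Counter()
    for _ in range(n_rpcs):
        if not sample_before_completion and in_flight >= max_rpcs:
            in_flight -= 1   # oldest RPC retires before the next send
        in_flight += 1       # dispatch the next RPC
        hist[in_flight] += 1 # stats sampled here
        if sample_before_completion and in_flight > max_rpcs:
            in_flight -= 1   # completion processed only after sampling
    return hist
```

With max_rpcs=1, the correct ordering samples every RPC in the "1" bucket,
while the sample-first ordering puts nearly every sample in the "2" bucket,
much like the 95% at bucket 2 reported above.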
>>
>> The following is the output for one OSC, for reference:
>>
>> osc.hasanfs-OST0000-osc-ffff882fcf777000.rpc_stats=
>> snapshot_time: 1632483604.967291 (secs.usecs)
>> read RPCs in flight: 0
>> write RPCs in flight: 0
>> pending write pages: 0
>> pending read pages: 0
>>
>> read write
>> pages per rpc rpcs % cum % | rpcs % cum %
>> 1: 1 100 100 | 0 0 0
>> 2: 0 0 100 | 0 0 0
>> 4: 0 0 100 | 0 0 0
>> 8: 0 0 100 | 0 0 0
>> 16: 0 0 100 | 0 0 0
>> 32: 0 0 100 | 0 0 0
>> 64: 0 0 100 | 0 0 0
>> 128: 0 0 100 | 0 0 0
>> 256: 0 0 100 | 9508 100 100
>>
>> read write
>> rpcs in flight rpcs % cum % | rpcs % cum %
>> 0: 0 0 0 | 0 0 0
>> 1: 1 100 100 | 10 0 0
>> 2: 0 0 100 | 9033 95 95
>> 3: 0 0 100 | 465 4 100
>>
>> read write
>> offset rpcs % cum % | rpcs % cum %
>> 0: 1 100 100 | 725 7 7
>> 1: 0 0 100 | 0 0 7
>> 2: 0 0 100 | 0 0 7
>> 4: 0 0 100 | 0 0 7
>> 8: 0 0 100 | 0 0 7
>> 16: 0 0 100 | 0 0 7
>> 32: 0 0 100 | 0 0 7
>> 64: 0 0 100 | 0 0 7
>> 128: 0 0 100 | 0 0 7
>> 256: 0 0 100 | 718 7 15
>> 512: 0 0 100 | 1386 14 29
>> 1024: 0 0 100 | 2205 23 52
>> 2048: 0 0 100 | 1429 15 67
>> 4096: 0 0 100 | 1103 11 79
>> 8192: 0 0 100 | 1942 20 100
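
[Editorial aside: when eyeballing histograms like the listing above, the
largest nonzero "rpcs in flight" bucket is the number of interest. A short
Python sketch, assuming output shaped exactly like the listing above (not a
Lustre tool; the function name is hypothetical):]

```python
import re

def max_rpcs_seen(rpc_stats_text):
    """Largest nonzero 'rpcs in flight' bucket for (read, write),
    parsed from osc.*.rpc_stats-style output."""
    row = re.compile(r"\s*(\d+):\s+(\d+)\s+\d+\s+\d+\s+\|\s+(\d+)")
    in_section = False
    max_read = max_write = 0
    for line in rpc_stats_text.splitlines():
        if "rpcs in flight" in line and ":" not in line:
            in_section = True    # histogram header reached
            continue
        if not in_section:
            continue
        m = row.match(line)
        if not m:
            in_section = False   # blank line or next table ends the section
            continue
        bucket, reads, writes = (int(g) for g in m.groups())
        if reads:
            max_read = max(max_read, bucket)
        if writes:
            max_write = max(max_write, bucket)
    return max_read, max_write
```

Applied to the listing above, this would report 1 read and 3 write RPCs in
flight at peak, despite max_rpcs_in_flight=1.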
>>
>> Can anyone please explain to me why the RPCs in flight shown in the
>> rpc_stats could be higher than the max_rpcs_in_flight?
>>
>>
>> I do see a similar behavior with the statistics of my home system, which
>> has the default osc.*.max_rpcs_in_flight=8, but shows many cases of 9 RPCs
>> in flight in the statistics for both read and write, and in a few cases 10
>> or 11:
>>
>> read write
>> rpcs in flight rpcs % cum % | rpcs % cum %
>> 1: 121 2 2 | 27831 93 93
>> 2: 23 0 3 | 108 0 93
>> 3: 22 0 3 | 19 0 93
>> 4: 24 0 4 | 15 0 93
>> 5: 19 0 5 | 10 0 93
>> 6: 26 0 5 | 13 0 93
>> 7: 176 4 9 | 39 0 93
>> 8: 933 22 32 | 75 0 94
>> 9: 2802 67 99 | 1207 4 98
>> 10: 10 0 100 | 543 1 99
>> 11: 0 0 100 | 1 0 100
>>
>>
>> The good news is that Lustre is open source, so you can look into the
>> lustre/osc code to see why this is happening. The limit is set by
>> cli->cl_max_rpcs_in_flight, and the stats are accounted by
>> cli->cl_write_rpc_hist.
>>
>> Out of curiosity, why is this off-by-one error of interest to you? It
>> definitely seems like a bug that could be fixed, but it doesn't seem
>> critical to correct functionality.
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Principal Architect
>> Whamcloud
>>
>>