<div dir="ltr"><span id="gmail-docs-internal-guid-038f5ee2-7fff-e721-4d7b-ba3ff2cdfa4a"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Hi,</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">We seem to be hitting a performance issue with Lustre clients 2.12.2 and 2.12.3. Over time, the grant size of the OSC shrinks below 1MB and does not grow back. This lowers the performance of this client to a few MB/s, and even to kB/s for some OSTs. This does not seem to happen on 2.10.8 clients since they don’t have the “grant_shrink” flag. The servers are running 2.12.3 with ZFS 0.7.9.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Here is the per-OST performance we see with a simple dd test; the worst OST is #5 with 222 kB/s. 
A client with 2.10 on the same OST is reaching > 800MB/s.</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">for i in {0..37}; do lfs setstripe --ost $i --stripe-count 1 ost$i ; done</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">for i in {0..37}; do dd if=/dev/zero of=ost$i bs=1M count=100; done</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">104857600 bytes (105 MB) copied, 0.142473 s, 736 MB/s</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">104857600 bytes (105 MB) copied, 9.22021 s, 11.4 MB/s</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">104857600 bytes (105 MB) copied, 0.0905684 s, 1.2 GB/s</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier 
New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">104857600 bytes (105 MB) copied, 6.36873 s, 16.5 MB/s</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">104857600 bytes (105 MB) copied, 0.0929602 s, 1.1 GB/s</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">104857600 bytes (105 MB) copied, 471.699 s, 222 kB/s</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">104857600 bytes (105 MB) copied, 0.177067 s, 592 MB/s</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[...]</span></p><br><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">As an example, this slow client has a grant size of 0.8MB after being up for a while:</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span 
style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">lctl get_param osc.lustre04-OST0005*.cur_grant_bytes</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">osc.lustre04-OST0005-osc-ffff98128d818000.cur_grant_bytes=883028</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">In the debug logs, I can see a request being sent as sync I/O because the grant is now too small to cover the 1.7MB request:</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">00000008:00000020:10.0:1585145743.107840:0:116122:0:(osc_cache.c:1590:osc_enter_cache()) lustre04-OST0005-osc-ffff98128d818000: grant { dirty: 0/512000 dirty_pages: 448/24562964 dropped: 0 avail: 883028, dirty_grant: 0, reserved: 0, flight: 0 } lru {in list: 146368, left: 64, waiters: 0 }need:1703936</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">00000008:00000020:10.0:1585145743.107842:0:116122:0:(osc_cache.c:1539:osc_enter_cache_try()) 
lustre04-OST0005-osc-ffff98128d818000: grant { dirty: 0/512000 dirty_pages: 448/24562964 dropped: 0 avail: 883028, dirty_grant: 0, reserved: 0, flight: 0 } lru {in list: 146368, left: 64, waiters: 0 }need:1703936</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">00000008:00000020:10.0:1585145743.107843:0:116122:0:(osc_cache.c:1666:osc_enter_cache()) lustre04-OST0005-osc-ffff98128d818000: grant { dirty: 0/512000 dirty_pages: 448/24562964 dropped: 0 avail: 883028, dirty_grant: 0, reserved: 0, flight: 0 } lru {in list: 146368, left: 64, waiters: 0 }no grant space, fall back to sync i/o</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">There is currently 30GB granted on this OST, which has about 22TB free.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[root@lustre04-oss1 ~]# lctl get_param obdfilter/lustre04-OST0005/tot_granted</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">obdfilter.lustre04-OST0005.tot_granted=30257446912</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span 
style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Somehow, the client does not receive a bigger grant, so it seems to stay under 1MB forever. </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">00000008:00000020:4.0:1585145743.107950:0:22701:0:(osc_request.c:705:osc_announce_cached()) dirty: 0 undirty: 2080374783 dropped 0 grant: 883028</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:"Courier New";color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">00000008:00000020:14.0:1585145743.236923:0:22702:0:(osc_request.c:727:osc_update_grant()) got 0 extra grant</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Is this a known issue? I could not find a similar ticket in JIRA, but I do see some references to disabling grant_shrink in LU-12651 and LU-12759. </span></p></span></div>
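<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">For anyone wanting to check their own clients, here is a minimal sketch of how the affected OSCs could be listed. The function name low_grant_oscs is hypothetical (it is not a Lustre tool); it only filters the "name=value" output of the lctl get_param command shown above against a byte threshold, here assumed to be the 1703936 bytes requested in the debug log.</span></p>

```shell
#!/bin/bash
# Hypothetical helper, not part of Lustre: reads "name=value" lines on stdin,
# in the format printed by
#   lctl get_param osc.*.cur_grant_bytes
# and prints the entries whose value is below the given byte threshold.
low_grant_oscs() {
    local threshold=$1 name bytes
    while IFS='=' read -r name bytes; do
        # Skip lines that do not carry a plain numeric value.
        case $bytes in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$bytes" -lt "$threshold" ]; then
            printf '%s %s\n' "$name" "$bytes"
        fi
    done
}

# Example on a client, using the request size from the debug log as threshold:
#   lctl get_param osc.*.cur_grant_bytes | low_grant_oscs 1703936
```

<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">The threshold is only an assumption taken from this one trace; the size a request needs will likely differ with the workload and configuration.</span></p>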