[lustre-discuss] [External] Re: obdfilter/mdt stats meaning ?

Degremont, Aurelien degremoa at amazon.com
Tue Jul 16 09:04:39 PDT 2019


>                           read      |     write
>disk I/Os in flight    ios   % cum % |  ios         % cum %
>1:               211177215  61  61   | 29305564  97  97
>2:                41332944  11  72   | 498260   1  99
>[..]
>Do these lines mean:
>Since the last snapshot there were 211177215 reads with 1 I/O in flight and 41332944 reads with 2 I/Os in flight?

It means (since the last time the statistics were cleared):

  *   11% of the time, 2 READ I/O requests were "in flight" to disk, meaning 2 I/Os had been sent to the disks but not yet committed/acknowledged
  *   61% of the time, only 1 READ I/O request was in flight.

Same principle for write.

What this means here is that your workload is not feeding the disks with many writes (97% of the time with only 1 I/O in flight), but with a few more reads.
Disks, and especially disk arrays, reorder I/Os and distribute them across the drives they are composed of to optimize bandwidth. To really benefit from all the bandwidth/throughput your hardware can offer, you often need lots of big I/Os and possibly multiple I/Os in flight.
Few I/Os in flight could mean:

  *   your workload is not really big
  *   your hardware is fast compared to the throughput coming to this server (disk bandwidth vs. network bandwidth ratio, for example)
This can also help you identify bad performance numbers and find where the bottleneck comes from.
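
If you want to reduce that histogram to a single number, here is a minimal Python sketch (not an official Lustre tool; it assumes the output was saved to a file, e.g. lctl get_param obdfilter.*.brw_stats > brw_stats.txt) that computes the time-weighted average number of READ I/Os in flight:

    # Minimal sketch: weighted-average READ queue depth from the
    # "disk I/Os in flight" section of a saved brw_stats dump.
    import re

    def avg_read_in_flight(path):
        in_section = False
        total = weighted = 0
        with open(path) as f:
            for line in f:
                if "disk I/Os in flight" in line:
                    in_section = True
                    continue
                if in_section:
                    # data lines look like: "2:  41332944  11  72 | ..."
                    m = re.match(r"\s*(\d+):\s+(\d+)", line)
                    if not m:
                        break  # blank line or next header ends the section
                    depth, ios = int(m.group(1)), int(m.group(2))
                    total += ios
                    weighted += depth * ios
        return weighted / total if total else 0.0

    print(avg_read_in_flight("brw_stats.txt"))

With your numbers this gives roughly 3 for reads and about 1 for writes, which matches the picture above: the disks are rarely kept busy with deep queues.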


From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Louis Bailleul <Louis.Bailleul at pgs.com>
Date: Tuesday, 16 July 2019 at 17:49
To: lustre-discuss <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] [External] Re: obdfilter/mdt stats meaning ?

Hi Aurélien,

Thanks for the prompt reply.
For the OST stats, any idea what preprw and commitrw mean?
And why are there two entries with different values for statfs?

For brw_stats, even with the doc I still struggle to read this.
For example, how do you make sense of disk I/Os in flight?
                           read      |     write
disk I/Os in flight    ios   % cum % |  ios         % cum %
1:               211177215  61  61   | 29305564  97  97
2:                41332944  11  72   | 498260   1  99
[..]
Do these lines mean:
Since the last snapshot there were 211177215 reads with 1 I/O in flight and 41332944 reads with 2 I/Os in flight?

Best regards,
Louis
On 16/07/2019 15:50, Degremont, Aurelien wrote:
Hi Louis,

About brw_stats, there is a bit of explanation in the Lustre Doc (not that detailed, but still):
http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.50438271_55057

> Last thing: is there any way to get the name of the filesystem an OST is part of using lctl?

I don't know what you want exactly, but the OST names are self-explanatory; they always look like fsname-OSTXXXX,
where fsname is the Lustre filesystem they are part of.
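
So if you ever need the fsname programmatically, splitting the OST name on its last dash is enough. A trivial Python sketch ("lustre1" is just a made-up fsname for illustration):

    # Recover the filesystem name from an OST name of the form fsname-OSTxxxx
    ost_name = "lustre1-OST0014"   # hypothetical example
    fsname = ost_name.rsplit("-", 1)[0]
    print(fsname)                  # -> lustre1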

For the obdfilter stats, these are mostly actions on OST objects or client connection management RPCs.

    setattr: change an OST object's attributes (owner, group, ...)
    punch: mostly used for truncate (in theory it can punch holes in files, like a truncate with a start and a length)
    sync: straightforward, sync the OST to disk
    destroy: delete an OST object (mostly when a file is deleted)
    create: create an OST object
    statfs: like 'df' for this specific OST (used by 'lfs df', for example)
    (re)connect: when a client connects/reconnects to this OST
    ping: when a client pings this OST.
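
All these counters are cumulative since the stats were last cleared, so to turn them into rates you need the delta between two snapshots. A rough Python sketch, assuming two dumps taken with lctl get_param -n obdfilter.<ost>.stats:

    # Parse a stats dump into {name: count} and compute per-second
    # RPC rates from two snapshots taken some time apart.
    # Caveat: your output lists 'statfs' twice; a plain dict keeps
    # only the last occurrence.
    def parse_stats(text):
        vals = {}
        for line in text.splitlines():
            parts = line.split()
            if parts and parts[0] == "snapshot_time":
                vals["snapshot_time"] = float(parts[1])
            elif len(parts) >= 4 and parts[2] == "samples":
                vals[parts[0]] = int(parts[1])
        return vals

    def rates(before, after):
        dt = after["snapshot_time"] - before["snapshot_time"]
        return {k: (after[k] - before.get(k, 0)) / dt
                for k in after if k != "snapshot_time"}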


Aurélien

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Louis Bailleul <Louis.Bailleul at pgs.com>
Date: Tuesday, 16 July 2019 at 16:38
To: lustre-discuss <lustre-discuss at lists.lustre.org>
Objet : [lustre-discuss] obdfilter/mdt stats meaning ?

Hi all,

I am trying to make sense of some of the OST/MDT stats for 2.12.
Can anybody point me to the doc that explains what the metrics are?
The wiki only mentions read/write/get_info: http://wiki.lustre.org/Lustre_Monitoring_and_Statistics_Guide
But the list I get is quite different:
    obdfilter.OST001.stats=
    snapshot_time             1563285450.647120173 secs.nsecs
    read_bytes                340177708 samples [bytes] 4096 4194304 396712660910080
    write_bytes               30008856 samples [bytes] 24 4194304 78618271501667
    setattr                   1755 samples [reqs]
    punch                     73463 samples [reqs]
    sync                      50606 samples [reqs]
    destroy                   31990 samples [reqs]
    create                    956 samples [reqs]
    statfs                    75378743 samples [reqs]
    connect                   5798 samples [reqs]
    reconnect                 3242 samples [reqs]
    disconnect                5820 samples [reqs]
    statfs                    3737980 samples [reqs]
    preprw                    370186566 samples [reqs]
    commitrw                  370186557 samples [reqs]
    ping                      882096292 samples [reqs]
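
Side note: the [bytes] lines appear to follow "name count samples [unit] min max sum". Assuming that interpretation, a quick Python sanity check on the average I/O size:

    # Assuming "name count samples [bytes] min max sum":
    line = "read_bytes 340177708 samples [bytes] 4096 4194304 396712660910080"
    name, count, _, unit, vmin, vmax, vsum = line.split()
    print(int(vsum) / int(count) / 2**20)   # ~1.1 MiB average read size

which gives about 1.1 MiB per read on average.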
For the MDT, most are pretty much self-explanatory, but I'd still be happy to be pointed to some doc.
mdt.MDT0000.md_stats=
snapshot_time             1563287416.006001068 secs.nsecs
open                      3174644054 samples [reqs]
close                     3174494603 samples [reqs]
mknod                     107564 samples [reqs]
unlink                    99625 samples [reqs]
mkdir                     199643 samples [reqs]
rmdir                     45021 samples [reqs]
rename                    12728 samples [reqs]
getattr                   50227431 samples [reqs]
setattr                   103435 samples [reqs]
getxattr                  9051470 samples [reqs]
setxattr                  14 samples [reqs]
statfs                    7525513 samples [reqs]
sync                      20597 samples [reqs]
samedir_rename            207 samples [reqs]
crossdir_rename           12521 samples [reqs]
And does anyone know how to read the OST brw_stats?
obdfilter.OST0014.brw_stats=
snapshot_time:         1563287631.511085465 (secs.nsecs)

                           read      |     write
pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
1:               231699298  66  66   | 180944   0   0
2:                  855611   0  67   | 322359   1   1
4:                  541749   0  67   | 5539716  18  20
8:                 1281219   0  67   | 67837   0  20
16:                 637808   0  67   | 114546   0  20
32:                1342813   0  68   | 3099780  10  31
64:                1559834   0  68   | 173166   0  31
128:               1583127   0  69   | 211512   0  32
256:              10627583   3  72   | 499978   1  34
512:               3909601   1  73   | 1029686   3  37
1K:               92141161  26 100   | 18788597  62 100

                           read      |     write
discontiguous pages    rpcs  % cum % |  rpcs        % cum %
0:               346179839 100 100   | 180946   0   0
1:                       0   0 100   | 322363   1   1
2:                       0   0 100   | 5521062  18  20
3:                       0   0 100   | 18650   0  20
4:                       0   0 100   | 18159   0  20
5:                       0   0 100   | 26664   0  20
6:                       0   0 100   | 10830   0  20
7:                       0   0 100   | 12189   0  20
8:                       0   0 100   | 11365   0  20
9:                       0   0 100   | 10253   0  20
10:                      0   0 100   | 8810   0  20
11:                      0   0 100   | 9825   0  20
12:                      0   0 100   | 16740   0  20
13:                      0   0 100   | 14421   0  20
14:                      0   0 100   | 10513   0  20
15:                      0   0 100   | 32655   0  20
16:                      0   0 100   | 1418677   4  25
17:                      0   0 100   | 1477077   4  30
18:                      0   0 100   | 6227   0  30
19:                      0   0 100   | 7071   0  30
20:                      0   0 100   | 7297   0  30
21:                      0   0 100   | 8478   0  30
22:                      0   0 100   | 34591   0  30
23:                      0   0 100   | 35591   0  30
24:                      0   0 100   | 8378   0  30
25:                      0   0 100   | 8724   0  30
26:                      0   0 100   | 52300   0  30
27:                      0   0 100   | 14038   0  30
28:                      0   0 100   | 4734   0  30
29:                      0   0 100   | 4878   0  31
30:                      0   0 100   | 6232   0  31
31:                      0   0 100   | 20708383  68 100
                           read      |     write
disk I/Os in flight    ios   % cum % |  ios         % cum %
1:               211177215  61  61   | 29305564  97  97
2:                41332944  11  72   | 498260   1  99
3:                22250410   6  79   | 86831   0  99
4:                15524737   4  83   | 34513   0  99
5:                12049717   3  87   | 19442   0  99
6:                 8904108   2  89   | 13107   0  99
7:                 5955503   1  91   | 8748   0  99
8:                 3943444   1  92   | 6869   0  99
9:                 3115034   0  93   | 5447   0  99
10:                2553941   0  94   | 4593   0  99
11:                2121217   0  95   | 3828   0  99
12:                1709040   0  95   | 3264   0  99
13:                1418541   0  95   | 2800   0  99
14:                1184247   0  96   | 2454   0  99
15:                1047397   0  96   | 2153   0  99
16:                 875229   0  96   | 1871   0  99
17:                 752555   0  97   | 1643   0  99
18:                 656424   0  97   | 1531   0  99
19:                 584066   0  97   | 1375   0  99
20:                 529630   0  97   | 1267   0  99
21:                 477143   0  97   | 1144   0  99
22:                 426303   0  97   | 1067   0  99
23:                 385707   0  97   |  984   0  99
24:                 354584   0  98   |  959   0  99
25:                 328332   0  98   |  899   0  99
26:                 305886   0  98   |  828   0  99
27:                 281444   0  98   |  786   0  99
28:                 261958   0  98   |  734   0  99
29:                 242335   0  98   |  711   0  99
30:                 227010   0  98   |  692   0  99
31:                5203738   1 100   | 13757   0 100

                           read      |     write
I/O time (1/1000s)     ios   % cum % |  ios         % cum %
1:                34363647  26  26   |    0   0   0
2:                 9013233   7  33   |    0   0   0
4:                 3381561   2  36   |    0   0   0
8:                 2194196   1  38   |    0   0   0
16:                8767687   6  45   |    0   0   0
32:               25062401  19  64   |    0   0   0
64:               27196704  21  85   |    0   0   0
128:              10760610   8  94   |    0   0   0
256:               4203334   3  97   |    0   0   0
512:               2002196   1  99   |    0   0   0
1K:                 785539   0  99   |    0   0   0
2K:                 340525   0  99   |    0   0   0
4K:                 140336   0  99   |    0   0   0
8K:                   6875   0  99   |    0   0   0
16K:                   161   0 100   |    0   0   0

                           read      |     write
disk I/O size          ios   % cum % |  ios         % cum %
8:                       4   0   0   |    0   0   0
16:                      0   0   0   |    0   0   0
32:                      1   0   0   |    4   0   0
64:                      1   0   0   | 5703   0   0
128:                  3061   0   0   | 2853   0   0
256:                     1   0   0   | 3340   0   0
512:                     1   0   0   |  309   0   0
1K:                      0   0   0   | 3697   0   0
2K:                      2   0   0   | 38311   0   0
4K:              231696225  66  66   | 126727   0   0
8K:                 855613   0  67   | 322359   1   1
16K:                541749   0  67   | 5539716  18  20
32K:               1281219   0  67   | 67837   0  20
64K:                637808   0  67   | 114546   0  20
128K:              1342813   0  68   | 3099780  10  31
256K:              1559834   0  68   | 173166   0  31
512K:              1583127   0  69   | 211512   0  32
1M:               10627583   3  72   | 499978   1  34
2M:                3909601   1  73   | 1029686   3  37
4M:               92141161  26 100   | 18788597  62 100
Last thing: is there any way to get the name of the filesystem an OST is part of using lctl?

Best regards,
Louis





