[Lustre-discuss] How to determine which lustre clients are loading filesystem.

Andreas Dilger andreas.dilger at oracle.com
Thu Jul 8 14:21:36 PDT 2010


On 2010-07-08, at 14:01, Guy Coates wrote:
> Try this script; (It is from Bernd Schubert). It will parse the
> per-client  proc stats on the mds/oss into something nice and
> humanly-readable. It is very useful.

I'm not sure I'd quite call it "human readable", but it does show that there is a need for something to print out stats for all of the clients.

===================== /proc/fs/lustre/obdfilter/myth-OST0004/exports ============================
0@lo read_bytes 123343 samples [bytes] 1 1048576 64498717397 write_bytes 18457 samples [bytes] 1 1048576 3200834973 get_info 2 samples [reqs] set_info_async 1 samples [reqs] disconnect 3 samples [reqs] create 420 samples [reqs] destroy 883 samples [reqs] setattr 13276 samples [reqs] punch 15 samples [reqs] preprw 141800 samples [reqs] commitrw 141800 samples [reqs]
192.168.20.147@tcp read_bytes 146 samples [bytes] 4096 1048576 114471161 write_bytes 7 samples [bytes] 163840 1048576 5244376 disconnect 6 samples [reqs] preprw 153 samples [reqs] commitrw 153 samples [reqs]
192.168.20.154@tcp read_bytes 550 samples [bytes] 4096 1048576 270017490 write_bytes 1126 samples [bytes] 32 1048576 614266996 disconnect 2 samples [reqs] preprw 1676 samples [reqs] commitrw 1676 samples [reqs]
192.168.20.159@tcp read_bytes 88745 samples [bytes] 0 1048576 61982699353 write_bytes 75428 samples [bytes] 16 1048576 27989934969 get_info 4 samples [reqs] disconnect 22 samples [reqs] destroy 113 samples [reqs] setattr 1 samples [reqs] punch 154 samples [reqs] sync 81914 samples [reqs] preprw 164173 samples [reqs] commitrw 164173 samples [reqs]
=============================================================================================

An equivalent one-liner that produces more readable output would probably be something like:

egrep -v "snapshot|ping" /proc/fs/lustre/{mds,obdfilter}/*/exports/*/stats | cut -d/ -f 6,8,9

which will print something like:

myth-MDT0000/0@lo/stats:open                      10 samples [reqs]
myth-MDT0000/0@lo/stats:close                      2 samples [reqs]
myth-MDT0000/0@lo/stats:getxattr                   1 samples [reqs]
myth-MDT0000/192.168.20.159@tcp/stats:open      3654 samples [reqs]
myth-MDT0000/192.168.20.159@tcp/stats:close     1827 samples [reqs]
myth-MDT0000/192.168.20.159@tcp/stats:unlink       1 samples [reqs]
myth-MDT0000/192.168.20.159@tcp/stats:getxattr 15674 samples [reqs]
myth-OST0000/0@lo/stats:read_bytes              2137 samples [bytes]
myth-OST0000/0@lo/stats:preprw                  2137 samples [reqs]
:
:
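
If the goal is to see which clients are moving the most data through an OSS, the same per-export stats files can be reduced to one line per client. What follows is only a rough sketch: the function name client_io_totals is made up for illustration (it is not part of Lustre), and it assumes that the last (seventh) field of the read_bytes and write_bytes lines is the cumulative byte count, as the sample output above suggests.

```shell
# Hypothetical helper, not shipped with Lustre.
# Prints "<target> <client NID> <read bytes> <write bytes>", busiest reader first.
# Assumption: stats lines look like
#   read_bytes <count> samples [bytes] <min> <max> <sum>
# so the cumulative total is field 7.
client_io_totals() {
    root="${1:-/proc/fs/lustre}"
    for f in "$root"/obdfilter/*/exports/*/stats; do
        [ -f "$f" ] || continue                     # glob matched nothing
        nid=$(basename "$(dirname "$f")")           # e.g. 192.168.20.159@tcp
        target=$(basename "$(dirname "$(dirname "$(dirname "$f")")")")
        awk -v t="$target" -v n="$nid" '
            $1 == "read_bytes"  { r = $7 }
            $1 == "write_bytes" { w = $7 }
            END { printf "%s %s %.0f %.0f\n", t, n, r + 0, w + 0 }
        ' "$f"
    done | sort -k3,3nr -k4,4nr
}
```

Running "client_io_totals" with no argument reads from /proc/fs/lustre; the optional argument only exists so the function can be pointed at a copy of the stats tree.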

I would also recommend the "llstat" tool, which has been part of Lustre for ages. It does mostly the same thing, but can print "vmstat"-style output with the current operation rates.  The main difference is that the "lustre_client_stats.sh" script prints the output for all of the clients at once.

While we are on the topic, people may also be interested in "llobdstat", which prints an IO-oriented status for any "stats" file containing the read_bytes and write_bytes entries:

llobdstat myth-OST0000 2
/usr/bin/llobdstat on obdfilter/myth-OST0000
Processor counters run at 2800.419 MHz
Read: 4.08846e+11, Write: 9.0329e+10, create/destroy: 1133/1996, stat: 12128, punch: 241
[NOTE: cx: create, dx: destroy, st: statfs, pu: punch ]

Timestamp   Read-delta  ReadRate  Write-delta  WriteRate
--------------------------------------------------------
1278622955   21.00MB   10.48MB/s     0.00MB    0.00MB/s
1278622957   23.00MB   11.48MB/s     0.00MB    0.00MB/s
1278622959   22.33MB   11.14MB/s     0.00MB    0.00MB/s
1278622961   11.68MB    5.83MB/s     0.00MB    0.00MB/s
1278622963   18.45MB    9.20MB/s     0.00MB    0.00MB/s st:1
1278622965   20.72MB   10.34MB/s     0.00MB    0.00MB/s st:1

It can also be used on a client stats file, like
/proc/fs/lustre/osc/myth-OST0000-osc-ffff81001f5d54d0/stats
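
For anyone curious how such a rate column can be derived from a stats file, here is a minimal sketch of the general idea: read the cumulative read_bytes total twice and divide the difference by the interval. It is not how llobdstat itself is implemented. The helper names (stat_sum, rate_mbs, watch_read_rate) are hypothetical, the cumulative total is assumed to be the seventh field of the counter line, and 1 MB is assumed to mean 1048576 bytes.

```shell
# Hypothetical helpers, not part of Lustre.

# Print the cumulative total (assumed to be field 7) of one counter,
# or 0 if the counter is absent from the file.
stat_sum() {            # usage: stat_sum <stats_file> <counter>
    awk -v c="$2" '$1 == c { s = $7 } END { printf "%.0f\n", s + 0 }' "$1"
}

# Convert a byte delta over an interval into MB/s (1 MB = 1048576 bytes).
rate_mbs() {            # usage: rate_mbs <prev_bytes> <cur_bytes> <seconds>
    awk -v p="$1" -v c="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (c - p) / s / 1048576 }'
}

# Sample one stats file every <interval> seconds and print the read rate.
# Runs until interrupted. The nominal interval is used as the divisor, so
# the result is slightly optimistic if the loop itself takes noticeable time.
watch_read_rate() {     # usage: watch_read_rate <stats_file> <interval>
    prev=$(stat_sum "$1" read_bytes)
    while sleep "$2"; do
        cur=$(stat_sum "$1" read_bytes)
        echo "$(date +%s) $(rate_mbs "$prev" "$cur" "$2") MB/s"
        prev=$cur
    done
}
```

For example, "watch_read_rate /proc/fs/lustre/obdfilter/myth-OST0000/stats 2" would print one timestamped read rate every two seconds.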

Bernd, would you (or anyone) be interested in enhancing those tools so that they can show stats data from multiple files at once (each prefixed by the device name and/or client NID)?  I don't think it makes sense to create separate tools for this.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.



