[Lustre-discuss] review: Lustre client procfs stats

Andreas Dilger adilger at whamcloud.com
Wed May 11 23:48:58 PDT 2011


On 2011-05-10, at 3:14 PM, Richard Henwood <rhenwood at whamcloud.com> wrote:
> John and I have written the following review, focused on the Lustre
> client procfs stats file. John's talk at LUG [1] provides some of the
> background to this work. We welcome your thoughts.
> 
> John Hammond (TACC)
> Richard Henwood (Whamcloud)
> 
> 
> Introduction
> ------------
> 
> The Lustre proc filesystem (procfs) is a convenient way to modify and
> review a Lustre filesystem. The Lustre procfs is a solid interface for
> tool writers to build upon. Tools that deliver precise and accurate
> metrics are valuable in trouble-shooting a production Lustre
> filesystem.
> 
> Documentation is available [2,3] for Lustre procfs but it is not
> complete. In particular, it does not describe the contents of the
> /proc/fs/lustre/llite/<mount_id>/stats file. This file is a natural
> place to store filesystem metrics for each Lustre filesystem the
> client has mounted.
> 
> This document is concerned with describing deficiencies and
> enhancements to Lustre procfs. The scope of this document is limited
> to the stats file in procfs of a mounted Lustre filesystem as seen by
> a client. Our intention is to air our suggestions, make modifications
> to our ideas based on feedback and create a patch for review.
> 
> Lustre client metrics
> ---------------------
> 
> One of the primary objectives for the support staff at TACC is to
> maintain computing capability. A simple measure of capability is that
> clients of the Lustre filesystem are consuming data. An accurate
> measure of the quantity of data consumed by a given client is useful
> in sharing resources and scheduling jobs. This measure (the number of
> bytes read) should be simple to collect.
> 
> Declarative stats file
> ----------------------
> 
> The existing proc filesystem on a client provides a 'stats' file. This
> is located: /proc/fs/lustre/llite/<mount_id>/stats. The contents of
> this file initially include:
> 
> snapshot_time             1304977515.150559 secs.usecs
> ioctl                     1 samples [regs]
> alloc_inode               1 samples [regs]
> inode_permission          1 samples [regs]
> 
> *ENHANCEMENT* stats should declare all metrics that are recorded, even
> if they are zero. Currently tool developers must maintain their own
> lookups of all possible values and test for their absence. Declaring
> all the metrics removes the need to consult source code to identify
> all possible metrics.

In fact, this is how the "stats" file used to operate. However, to avoid printing a lot of counters that are always zero for a given device, the kernel now filters out any counter that has never been hit in the code.

In order to keep parts of the stats setup generic between the different layers of Lustre, the stats structure may contain counters that are never used. 
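Until that changes, a tool can simply treat a missing counter as zero. Here is a minimal Python sketch of that idea. It assumes the line layout seen in the example above (name, count, the word "samples", a bracketed unit, then optional numeric fields), and the list of expected counter names is hypothetical: a real tool would still have to take it from the source.

```python
def parse_stats(text, expected=()):
    """Parse llite stats text; counters in `expected` that are absent read as zero."""
    counters = {name: {"samples": 0, "unit": None, "values": []}
                for name in expected}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not a counter line (snapshot_time, blanks, "...").
        if len(fields) < 4 or fields[2] != "samples":
            continue
        name, count, _, unit = fields[:4]
        counters[name] = {
            "samples": int(count),
            "unit": unit.strip("[]"),
            "values": [int(v) for v in fields[4:]],  # trailing numeric fields, if any
        }
    return counters


# Sample text copied from the examples in this thread.
SAMPLE = """\
snapshot_time             1304977515.150559 secs.usecs
ioctl                     1 samples [regs]
read_bytes                1 samples [bytes] 2097152 2097152 2097152
"""

if __name__ == "__main__":
    # "write_bytes" is not in SAMPLE, so it is reported with zero samples.
    stats = parse_stats(SAMPLE, expected=("ioctl", "read_bytes", "write_bytes"))
    for name, entry in sorted(stats.items()):
        print(name, entry["samples"], entry["values"])
```

On a client the text would come from /proc/fs/lustre/llite/<mount_id>/stats rather than from a string.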

> Read bytes
> ----------
> 
> A client of a Lustre filesystem will be interested in the total bytes
> transferred over the fabric. The stats file appears to provide a
> valuable snapshot of high-level data transfer metrics. However, after
> investigation the values recorded are of limited value. read_bytes
> returns the number of bytes that have been requested. This is not the
> same as the number of bytes that have been read. The example below
> illustrates this confusion:
> 
> [root@rhel6_21 ~]# echo "hello lustre" > /mnt/lustre/test.txt
> [root@rhel6_21 ~]# cat /mnt/lustre/test.txt > /dev/null
> [root@rhel6_21 ~]# cat /proc/fs/lustre/llite/lustre-ffff88001aa95c00/stats
> ...
> read_bytes                1 samples [bytes] 2097152 2097152 2097152
> write_bytes               1 samples [bytes] 13 13 13
> ...
> 
> In this example, the read on the file was performed by cat. This
> requests the number of bytes it needs to fill its internal buffer. It
> continues to do this until the read returns zero. So, in our example,
> the internal buffer size of cat is 1KB and it performs two reads. As
> it stands, this metric may be misleading to the uninformed.
> 
> *ENHANCEMENT* read_bytes should return the number of bytes that have
> been read, consistent with the behavior of write_bytes. This will
> avoid confusion for users and give a more accurate measure of the
> traffic over the filesystem.

I agree this is misleading and could probably be fixed fairly easily. 
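To make the difference concrete, here is a minimal Python sketch that reads a file the way cat does and tallies both quantities: the bytes asked for on each read() and the bytes actually returned. It runs against any file, not only one on Lustre. The function name and the 1 KB buffer in the usage part are illustrative assumptions, and nothing here touches the Lustre accounting code itself.

```python
import os
import tempfile


def count_read_bytes(path, bufsize):
    """Read `path` to EOF and return (bytes requested, bytes returned)."""
    requested = returned = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            requested += bufsize          # what the caller asked for
            chunk = os.read(fd, bufsize)
            returned += len(chunk)        # what read() actually delivered
            if not chunk:                 # zero-length read means EOF
                break
    finally:
        os.close(fd)
    return requested, returned


if __name__ == "__main__":
    # A 13-byte file, the same content as in the quoted example.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello lustre\n")
    try:
        print(count_read_bytes(tmp.name, 1024))
    finally:
        os.unlink(tmp.name)
```

The first read returns the 13 bytes and the second returns zero, so the requested total is two full buffers while the returned total is 13, which is the figure write_bytes reports for the same file.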

> Cache misses
> ------------
> 
> The Lustre client has a cache. File reads may be serviced by this
> cache, or they may need to be completed by the backend filesystem (a
> cache miss). It is possible to discover if a cache miss has taken
> place on the client, but it is time consuming and subject to race
> conditions.
> 
> *ENHANCEMENT* Bytes sent over the wire should be explicitly recorded
> in the stats file. This will enable a detailed view of the client and
> network interaction with the filesystem.


> Conclusions
> -----------
> This document outlines changes to the procfs client stats file based
> on experience gained using Lustre in production at TACC. The authors
> welcome feedback on these changes.
> 
> 1. http://www.olcf.ornl.gov/wp-content/events/lug2011/4-13-2011/330-400_John_Hammond_hammond-lug.pdf
> 2. http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
> 3. http://wiki.lustre.org/manual/LustreManual20_HTML/SystemConfigurationUtilities_HTML.html#50438219_pgfId-1294840
> 
> -- 
> Richard.Henwood at whamcloud.com
> Whamcloud Inc.
> tel: +1 512 410 9612
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


