[lustre-discuss] [EXTERNAL] good ways to identify clients causing problems?
Bill Anderson
andersnb at ucar.edu
Sat May 29 10:49:45 PDT 2021
Thank you!
Bill
On Fri, May 28, 2021 at 3:20 PM Mohr, Rick <mohrrf at ornl.gov> wrote:
> Bill,
>
> One option I have used in the past is to look at the RPC request history.
> For example, on an OSS, you can run:
>
> lctl get_param ost.OSS.ost_io.req_history
>
> and then extract the client NID for each request. From that, you can
> count the requests arriving at the server from each client and look for
> any clients whose counts are significantly higher than the others. Maybe
> something like:
>
> lctl get_param ost.OSS.ost_io.req_history | cut -d: -f3 | sort | uniq -c | sort -n
>
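For anyone who wants to reuse the one-liner above, here is a minimal sketch of the same counting step as a shell function. It assumes, as the `cut -d: -f3` does, that each req_history record is colon-separated with the client NID in the third field. The function name `count_by_client` is made up for illustration, and lines with fewer than three fields (such as a parameter-name header) are skipped.

```shell
#!/bin/sh
# Count requests per client NID from req_history text read on stdin.
# Assumption: records are colon-separated and field 3 is the client NID,
# the same layout that `cut -d: -f3` relies on in the message above.
# count_by_client is a hypothetical helper name, not a Lustre tool.
count_by_client() {
    awk -F: '
        NF >= 3 { count[$3]++ }      # ignore header or malformed lines
        END { for (nid in count) print count[nid], nid }
    ' | sort -n                      # busiest clients end up last
}

# Typical use on an OSS (not executed here):
#   lctl get_param -n ost.OSS.ost_io.req_history | count_by_client
```

The output is one "count NID" pair per line in ascending order, so the clients sending the most requests appear at the bottom.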
> I have used that approach in the past to identify misbehaving clients (the
> number of requests from such clients was usually one or two orders of
> magnitude higher than the others). If multiple clients are unusually high,
> you may be able to correlate the nodes with currently running jobs to
> identify a particular job (assuming you don't already have Lustre job stats
> enabled).
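The "one or two orders of magnitude higher" check described above could be automated roughly as follows. This is only a sketch under stated assumptions: it reads "count NID" lines on stdin (for example the output of the `uniq -c` pipeline), compares each count against the median, and prints clients at or above a chosen multiple of it. The function name `flag_heavy_clients` and the default factor of 10 are illustrative choices, not anything provided by Lustre.

```shell
#!/bin/sh
# Print clients whose request count is at least FACTOR times the median.
# Input on stdin: lines of "count nid" (leading spaces from uniq -c are fine).
# flag_heavy_clients and the default factor of 10 are hypothetical choices.
flag_heavy_clients() {
    factor=${1:-10}
    sort -n | awk -v factor="$factor" '
        { n++; cnt[n] = $1 + 0; nid[n] = $2 }
        END {
            if (n == 0) exit
            # Input is sorted, so the median is the middle value(s).
            if (n % 2) median = cnt[(n + 1) / 2]
            else       median = (cnt[n / 2] + cnt[n / 2 + 1]) / 2
            for (i = 1; i <= n; i++)
                if (cnt[i] >= factor * median) print cnt[i], nid[i]
        }'
}

# Typical use on an OSS (not executed here):
#   lctl get_param -n ost.OSS.ost_io.req_history | cut -d: -f3 |
#       sort | uniq -c | flag_heavy_clients 10
```

The median is used rather than the mean so that a single very busy client does not raise the baseline it is compared against. If most clients are busy at once, the median rises with them and nothing may be flagged, so the raw sorted counts are still worth a look.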
>
> -Rick
>
>
> On 5/4/21, 2:41 PM, "lustre-discuss on behalf of Bill Anderson via
> lustre-discuss" <lustre-discuss-bounces at lists.lustre.org on behalf of
> lustre-discuss at lists.lustre.org> wrote:
>
>
> Hi All,
>
> Can you recommend good ways to identify Lustre client hosts that
> might be causing stability or performance problems for the entire
> filesystem?
>
> For example, if a user is inadvertently doing something that's
> creating an RPC storm, what are good ways to identify the client host that
> has triggered the storm?
>
> Thank you!
>
> Bill
>
>