[lustre-discuss] [EXTERNAL] good ways to identify clients causing problems?

Bill Anderson andersnb at ucar.edu
Sat May 29 10:50:24 PDT 2021


    Thanks!

    Bill


On Sat, May 29, 2021 at 11:44 AM Raj <rajgautam at gmail.com> wrote:

> Another option is to install xltop (https://github.com/jhammond/xltop)
> and use the xltop client (an ncurses-based tool similar to Linux top) to
> watch for the clients issuing the most requests per second (xltop -k q h).
> You can also use it to track jobs, but you may have to write your own
> node-to-job mapping script (xltop-clusterd).
>
> On Fri, May 28, 2021 at 4:21 PM Mohr, Rick via lustre-discuss
> <lustre-discuss at lists.lustre.org> wrote:
> >
> > Bill,
> >
> > One option I have used in the past is to look at the RPC request
> > history.  For example, on an OSS, you can run:
> >
> > lctl get_param ost.OSS.ost_io.req_history
> >
> > and then extract the client NID for each request.  Based on that, you
> > can count the requests coming into the server from each client and look
> > for any clients whose counts are significantly higher than the others.
> > Maybe something like:
> >
> > lctl get_param ost.OSS.ost_io.req_history | cut -d: -f3 | sort | uniq -c | sort -n
> >
> > I have used that approach in the past to identify misbehaving clients
> > (the number of requests from such clients was usually one or two orders
> > of magnitude higher than the others).  If multiple clients are unusually
> > high, you may be able to correlate the nodes with currently running jobs
> > to identify a particular job (assuming you don't already have Lustre job
> > stats enabled).
> >
> > -Rick
> >
> >
> > On 5/4/21, 2:41 PM, "lustre-discuss on behalf of Bill Anderson via
> > lustre-discuss" <lustre-discuss-bounces at lists.lustre.org on behalf of
> > lustre-discuss at lists.lustre.org> wrote:
> >
> >
> >        Hi All,
> >
> >        Can you recommend good ways to identify Lustre client hosts that
> >        might be causing stability or performance problems for the entire
> >        filesystem?
> >
> >        For example, if a user is inadvertently doing something that's
> >        creating an RPC storm, what are good ways to identify the client
> >        host that has triggered the storm?
> >
> >        Thank you!
> >
> >        Bill
>
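
Rick's req_history pipeline quoted above can be wrapped in a small shell
function so it is easy to rerun during an incident. This is only a minimal
sketch, not a tested tool: the function name count_by_client is made up here,
and it assumes, as the "cut -d: -f3" in the original command implies, that the
third colon-separated field of each req_history line identifies the client.
The sample lines in the usage comment are invented for illustration and are
not real server output.

```shell
#!/bin/sh
# count_by_client (hypothetical helper name): read req_history-style lines
# on stdin and print "<request count> <client field>", busiest client last.
# Assumption: field 3 (colon-separated) is the client, matching the
# "cut -d: -f3" in the quoted command.
count_by_client() {
    # NF >= 3 skips lines with fewer than three fields, e.g. the
    # "ost.OSS.ost_io.req_history=" line that lctl get_param prints first.
    awk -F: 'NF >= 3 { n[$3]++ } END { for (c in n) print n[c], c }' | sort -n
}

# Usage on an OSS (needs lctl; not executed here):
#   lctl get_param ost.OSS.ost_io.req_history | count_by_client | tail -n 10
#
# Usage with invented sample lines:
#   printf '%s\n' \
#     '1:10.0.0.1@o2ib:12345-10.0.0.5@o2ib:x1:328:Complete' \
#     '2:10.0.0.1@o2ib:12345-10.0.0.5@o2ib:x2:328:Complete' \
#     '3:10.0.0.1@o2ib:12345-10.0.0.6@o2ib:x3:328:Complete' | count_by_client
```

Counting in awk rather than with sort and uniq gives the same ranking but
drops the parameter-name line from the totals. The same function should work
for other services' req_history files, as long as the field layout assumption
holds there too.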