[lustre-discuss] Traffic compression?

E.S. Rosenberg esr+lustre at mail.hebrew.edu
Tue Feb 7 09:37:38 PST 2017


Hi Ben,

On Mon, Feb 6, 2017 at 10:51 PM, Ben Evans <bevans at cray.com> wrote:

> My initial question is what are you measuring and where are you measuring
> it?
>
The tool I'm using is collectl, which in turn calls perfquery once a
minute and, at the end of each interval, reports the difference between the
previous and current readings divided by 256*secondInterval to give kB/s.
(perfquery reports its data counters in units of 4 bytes, a legacy left over
from the 32-bit counter days.)
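
Spelled out, the divisor follows from the units: if each counter unit is 4 bytes, then kB/s = delta * 4 / 1024 / interval = delta / (256 * interval). A minimal Python sketch of that arithmetic; the function names and readings are hypothetical, and it assumes 64-bit (extended) counters so that wraparound between samples can be ignored:

```python
# Sketch: convert two perfquery data-counter readings into kB/s.
# Assumptions (hypothetical helper, not collectl code): the counter counts
# 4-byte units, and it is 64 bits wide so wraparound can be ignored.

BYTES_PER_COUNTER_UNIT = 4
BYTES_PER_KB = 1024


def ib_kbps(prev_count: int, curr_count: int, interval_s: float) -> float:
    """Average rate in kB/s between two perfquery counter readings."""
    delta_bytes = (curr_count - prev_count) * BYTES_PER_COUNTER_UNIT
    return delta_bytes / BYTES_PER_KB / interval_s


def ib_kbps_shortcut(prev_count: int, curr_count: int, interval_s: float) -> float:
    """Same result using the single divisor 256 * interval (1024 / 4 = 256)."""
    return (curr_count - prev_count) / (256 * interval_s)


if __name__ == "__main__":
    # Hypothetical readings taken 60 s apart.
    prev, curr = 1_000_000, 1_000_000 + 256 * 60 * 500
    print(ib_kbps(prev, curr, 60))           # 500.0
    print(ib_kbps_shortcut(prev, curr, 60))  # 500.0
```

Both functions give the same number; the shortcut just folds the factor of 4 into the 1024.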

The Lustre stats seem to be gathered in more or less the same way: the
Lustre plugin takes the delta of bytes read/written and divides it by
1024 * secondInterval to get kB/s.
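
The Lustre side is the same idea without the factor of 4, since those counters are already in bytes: kB/s = delta / (1024 * interval). A minimal sketch with a hypothetical function name and made-up counter values; how the plugin actually obtains the counters is not shown:

```python
# Sketch of the Lustre-side conversion: the delta of a cumulative byte
# counter divided by 1024 * interval gives kB/s.
# Function name and counter values are hypothetical.


def lustre_kbps(prev_bytes: int, curr_bytes: int, interval_s: float) -> float:
    """Average rate in kB/s from two cumulative byte-counter readings."""
    return (curr_bytes - prev_bytes) / (1024 * interval_s)


if __name__ == "__main__":
    interval = 60  # one sample per minute, as in this thread
    # Hypothetical cumulative read and write byte counters, 60 s apart.
    read_prev, read_curr = 5_000_000, 5_000_000 + 1024 * 60 * 4000
    write_prev, write_curr = 2_000_000, 2_000_000 + 1024 * 60 * 250
    print("read kB/s:", lustre_kbps(read_prev, read_curr, interval))     # 4000.0
    print("write kB/s:", lustre_kbps(write_prev, write_curr, interval))  # 250.0
```

Because both the InfiniBand and Lustre figures end up in kB/s over the same interval, they can be compared directly.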

>
> There are many different layers of caching happening, possibly all at the
> same time.  If you're benchmarking, it's much better to figure out your max
> sustained read/write speeds than to rely on peaks.
>
I'm not benchmarking; I was mainly trying to understand why my InfiniBand
graphs weren't showing at least as much traffic as Lustre...

Most of the time, though, the graphs do more or less coincide, so I guess
there was either a measurement glitch or we are seeing some limited effects
of caching.

Thanks,
Eli

-Ben

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of
"E.S. Rosenberg" <esr+lustre at mail.hebrew.edu>
Date: Monday, February 6, 2017 at 3:25 PM
To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: [lustre-discuss] Traffic compression?

We started monitoring resources on our cluster more closely and I noticed
that there is sometimes a big discrepancy between the read traffic reported
by Lustre and the incoming traffic reported by InfiniBand (which is the
interface carrying the Lustre traffic).

Currently I have a 4.4 GB/s peak on Lustre while InfiniBand at the same
time is showing just 1.4 GB/s of traffic (there is also a 2-minute
difference between the two peaks).
This is the sum over all the nodes in the cluster (excluding the servers).
The stats are gathered using collectl at a 1 minute interval.

Thanks,
Eli

(There are also lots of stats that match 1:1, which makes me less sure what
to make of this.)
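
The comparison described in the original message (per-node rates summed over the cluster at a 1-minute interval, then comparing the Lustre and InfiniBand peaks) can be sketched as below. This is only an illustration: the data layout (node -> {timestamp: kB/s}), node names, and values are hypothetical, not actual collectl output.

```python
# Sketch: sum per-node rates into cluster-wide totals per 60 s sample,
# then report each series' peak and the time offset between the peaks.
# Data layout and values are hypothetical.
from collections import defaultdict
from typing import Dict, Tuple

Series = Dict[int, float]  # timestamp (s) -> rate (kB/s)


def cluster_total(per_node: Dict[str, Series]) -> Series:
    """Sum the rates of all nodes that reported at each timestamp."""
    totals: Dict[int, float] = defaultdict(float)
    for samples in per_node.values():
        for ts, rate in samples.items():
            totals[ts] += rate
    return dict(totals)


def peak(series: Series) -> Tuple[int, float]:
    """Return (timestamp, rate) of the largest sample."""
    ts = max(series, key=series.get)
    return ts, series[ts]


if __name__ == "__main__":
    lustre = {"node1": {0: 100.0, 60: 900.0, 120: 200.0, 180: 100.0},
              "node2": {0: 50.0, 60: 700.0, 120: 100.0, 180: 50.0}}
    ib = {"node1": {0: 100.0, 60: 200.0, 120: 300.0, 180: 400.0},
          "node2": {0: 60.0, 60: 100.0, 120: 150.0, 180: 250.0}}
    lt, lv = peak(cluster_total(lustre))
    it, iv = peak(cluster_total(ib))
    # Prints: Lustre peak 1600.0 kB/s at t=60s, IB peak 650.0 kB/s at t=180s, offset 120s
    print(f"Lustre peak {lv} kB/s at t={lt}s, IB peak {iv} kB/s at t={it}s, "
          f"offset {it - lt}s")
```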
