[lustre-discuss] lnet peer credits

Christopher J. Morrone morrone2 at llnl.gov
Mon Aug 1 11:16:06 PDT 2016


On 08/01/2016 06:33 AM, Thomas Roth wrote:
> Hi all,
> 
> is there a kind of a rule of thumb for the "min" number in
> /proc/sys/lnet/peers?

No, there is no rule of thumb.  It depends on too many factors in the
system.  In my experience, numbers like the ones you are showing here
are completely normal.  The "min" field can be useful in the context of
problems that are occurring, but even then you need some way to know
when the min occurred so that you can correlate it with whatever other
issue is happening.  There is work under way on master to allow zeroing
out those fields for just that purpose.  If you are watching the min
actively, see the numbers suddenly spike, and can correlate that with
some higher-level issue, that can be useful.

If you are not seeing any issues in the system, there is no need to be
concerned about the numbers you posted.
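
If you did want to know when a min occurred, one rough way is to poll
the file and timestamp every new low.  The following is only a minimal
sketch, not a tested tool.  It assumes the column order is
"nid refs state last max rtr min tx min queue" (so the second "min",
the tx credits low-water mark, is field 8 counting from 0), and the
function names parse_peers, new_lows and watch are made up for
illustration.

```python
import time

# Assumed column order of /proc/sys/lnet/peers:
#   nid refs state last max rtr min tx min queue
def parse_peers(text):
    """Return {nid: min_tx_credits} from the text of the peers file."""
    mins = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a 10-column row.
        if len(fields) != 10 or fields[0] == "nid":
            continue
        mins[fields[0]] = int(fields[8])
    return mins

def new_lows(previous, current):
    """Return {nid: (old_min, new_min)} for peers whose min dropped."""
    return {nid: (previous[nid], low)
            for nid, low in current.items()
            if nid in previous and low < previous[nid]}

def watch(path="/proc/sys/lnet/peers", interval=10):
    """Poll the peers file and print a timestamp whenever a min drops."""
    with open(path) as f:
        previous = parse_peers(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_peers(f.read())
        for nid, (old, new) in sorted(new_lows(previous, current).items()):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), nid, old, "->", new)
        previous = current
```

Calling watch() on the OSS would then print one line per peer each time
its min reaches a new low, which gives you a time to compare against
whatever else was going on.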

> Our ko2iblnd peer_credits parameter is at the default value, obviously 8.
> 
> When I look up /proc/sys/lnet/peers (on an OSS), I typically get
> something like
> 
> 
> 10.20.1.76@o2ib1            1    NA    -1     8     8     8     8   -18 0
> 10.20.0.188@o2ib1           1    NA    -1     8     8     8     8   -29 0
> 10.20.0.44@o2ib1            2    NA    -1     8     8     8     7   -15 72
> 10.20.1.165@o2ib1           1    NA    -1     8     8     8     8   -18 0
> 10.20.1.21@o2ib1            1    NA    -1     8     8     8     8 -2113 0
> 10.20.0.133@o2ib1           1    NA    -1     8     8     8     8   -28 0
> 10.20.1.110@o2ib1           1    NA    -1     8     8     8     8   -10 0
> 10.20.0.222@o2ib1           1    NA    -1     8     8     8     8   -20 0
> 10.20.0.78@o2ib1            1    NA    -1     8     8     8     8   -17 0
> 10.20.1.55@o2ib1            1    NA    -1     8     8     8     8    -7 0
> 10.20.0.167@o2ib1           1    NA    -1     8     8     8     8   -12 0
> 10.20.1.144@o2ib1           1    NA    -1     8     8     8     8    -8 0
> 10.20.1.89@o2ib1            1    NA    -1     8     8     8     8   -21 0
> 10.20.1.34@o2ib1            1    NA    -1     8     8     8     8   -11 0
> 10.20.0.146@o2ib1           1    NA    -1     8     8     8     8   -21 0
> 10.20.0.2@o2ib1             1    NA    -1     8     8     8     8  -584 0
> 10.20.1.123@o2ib1           1    NA    -1     8     8     8     8   -16 0
> 10.20.0.91@o2ib1            1    NA    -1     8     8     8     8   -25 0
> 10.20.1.68@o2ib1            1    NA    -1     8     8     8     8     1 0
> 10.20.0.180@o2ib1           1    NA    -1     8     8     8     8   -22 0
> 10.20.0.185@o2ib1           1    NA    -1     8     8     8     8   -17 0
> 10.20.0.41@o2ib1            1    NA    -1     8     8     8     8   -25 0
> 10.20.1.162@o2ib1           1    NA    -1     8     8     8     8   -14 0
> 10.20.1.18@o2ib1            1    NA    -1     8     8     8     8  -919 0
> 10.20.0.130@o2ib1           1    NA    -1     8     8     8     8   -13 0
> 10.20.1.107@o2ib1           1    NA    -1     8     8     8     8    -7 0
> 10.20.0.219@o2ib1           1    NA    -1     8     8     8     8   -12 0
> 10.20.0.75@o2ib1            1    NA    -1     8     8     8     8   -21 0
> 10.20.1.196@o2ib1           4    up    -1     8     8     8     8  -419 0
> 
> 
> (The last line, the only peer that is "up", is an LNET-router)
> 
> 
> Something to worry about?

That is normal.  up/down state information is only given for peers that
are routers.  "NA" means "Not Applicable".  That was an improvement over
the past when, if I remember correctly, all non-router peers were listed
as "down" regardless of their actual state.
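
Since only routers carry a real state, the state column is enough to
tell routers apart from everything else when scripting against this
file.  A small sketch of that idea follows.  It assumes the same
10-column layout as the output quoted above (state in the third
column), and split_by_router is a hypothetical helper name, not
anything shipped with Lustre.

```python
def split_by_router(text):
    """Split peers-file text into router peers and non-router peers.

    Returns (routers, others): routers is a list of (nid, state) tuples,
    others is a list of NIDs whose state column reads "NA".
    """
    routers, others = [], []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a 10-column row.
        if len(fields) != 10 or fields[0] == "nid":
            continue
        nid, state = fields[0], fields[2]
        if state == "NA":
            others.append(nid)
        else:
            routers.append((nid, state))
    return routers, others
```

Any router that shows up with a state of "down" in the first list is
then the thing worth a closer look.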

Chris


