[lustre-devel] Channel Bonding Debug Information

Olaf Weber olaf at sgi.com
Thu Oct 1 07:31:56 PDT 2015


On 28-09-15 21:30, Amir Shehata wrote:
> Hello,
>
> As a followup on the discussion in the LAD developer summit, regarding
> ensuring that there is enough debug information provided as part of the
> Channel Bonding solution, I'm sending this email to ask for ideas on what
> type of debug information you would like to see.
>
> thanks
> amir

Hi Amir,

My random and disorganized thoughts.

Significant state changes and anything unexpected should of course be logged.

In addition I'd like interfaces that allow me to efficiently get the 
status/stats of a specific network interface or a specific peer, as opposed 
to only being able to get the information for all interfaces or peers and 
then having to filter. That may imply an ioctl-type interface instead of, or 
in addition to, debugfs or sysfs (or procfs).

For the local interfaces, stats include TX/RX counters, credits, interface 
state, and some measure of how busy the interface is. The latter can be 
derived by watching the TX/RX counters over time, but it would be nice to 
have it calculated. A variant on the "File Heat" idea presented at LAD might 
work for this. (Think decaying sum over recent activity.) When interfaces 
are associated with CPTs, the CPT number should be included too; this is 
especially important if the kernel automatically associates an interface 
with a CPT.
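
To make the above a bit more concrete, here is a rough sketch of what a 
per-interface record and a "give me just this one interface" query could 
look like. Everything in it is hypothetical: the names (ni_stats, 
ni_stats_get), the field layout, and the use of a plain 64-bit value as the 
interface key are assumptions for illustration, not an existing LNet API. 
The point is only that the caller names one interface and gets one record 
back, rather than dumping everything and filtering.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* All names are hypothetical; this is not an existing LNet interface. */
enum ni_state { NI_STATE_DOWN, NI_STATE_UP };

struct ni_stats {
    uint64_t      nid;       /* key identifying the local interface */
    uint64_t      tx_count;  /* TX counter */
    uint64_t      rx_count;  /* RX counter */
    int32_t       credits;   /* credits currently available */
    enum ni_state state;     /* interface state */
    uint64_t      heat;      /* precomputed measure of how busy it is */
    int32_t       cpt;       /* associated CPT number, or -1 if none */
};

/*
 * Query a single interface. The caller sets out->nid; on success the
 * matching record is copied into *out and 0 is returned. If no interface
 * has that key, *out is left untouched and -ENOENT is returned.
 */
int ni_stats_get(const struct ni_stats *table, size_t n, struct ni_stats *out)
{
    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (table[i].nid == out->nid) {
            *out = table[i];
            return 0;
        }
    }
    return -ENOENT;
}
```

A linear scan is used here because the list of local interfaces is expected 
to be short; an ioctl handler built along these lines would copy out only 
the one matching record.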

For the peers, I'd like a way to obtain the list of peers, and then to 
obtain the interfaces for each peer. Stats per peer interface include TX/RX 
counters and credits, perceived health, and maybe "heat". For a peer itself, 
possibly totals, and peer health as perceived by the current node.
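
As an illustration of the peer side, the sketch below rolls per-peer-interface 
stats up into per-peer totals. The type and function names are made up for 
this example. It also makes one assumption that is not stated above: that a 
peer is considered as healthy as its healthiest interface, on the theory that 
one working interface is enough to reach it. Other policies are equally 
plausible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types; ordered so that a larger value means worse health. */
enum health { HEALTH_GOOD = 0, HEALTH_DEGRADED = 1, HEALTH_DOWN = 2 };

struct peer_ni_stats {
    uint64_t    tx_count;
    uint64_t    rx_count;
    int32_t     credits;
    enum health health;   /* health as perceived by the current node */
    uint64_t    heat;
};

struct peer_summary {
    uint64_t    tx_total;
    uint64_t    rx_total;
    enum health health;
};

/*
 * Sum the counters over all interfaces of one peer. Assumed policy: the
 * peer's health is the best health among its interfaces; a peer with no
 * interfaces is reported as down.
 */
struct peer_summary peer_summarize(const struct peer_ni_stats *nis, size_t n)
{
    struct peer_summary s = { 0, 0, HEALTH_DOWN };

    assert(nis != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        s.tx_total += nis[i].tx_count;
        s.rx_total += nis[i].rx_count;
        if (nis[i].health < s.health)
            s.health = nis[i].health;
    }
    return s;
}
```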

A note on calculating heat: the full list of peer interfaces becomes large 
(on the servers of a large cluster) and you don't want to walk it without 
needing to. If you store a timestamp for the last use, then heat can be 
calculated when the TX/RX counters are updated or read, which is when the 
relevant data structure is being accessed anyway.
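
A minimal sketch of that lazy calculation follows. The names and the 
8-second half-life are assumptions chosen for the example, and the decay is 
done by halving once per elapsed half-life so that only integer arithmetic 
is needed. The decay owed since the stored timestamp is applied whenever the 
entry is updated or read, so nothing has to walk the full peer interface 
list periodically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these are existing LNet symbols. */
#define HEAT_HALF_LIFE_SEC 8u   /* assumed half-life of the decaying sum */

struct heat {
    uint64_t value;       /* decayed activity sum */
    uint64_t last_decay;  /* time in seconds up to which decay was applied */
};

/* Apply the decay owed since last_decay: halve once per elapsed half-life. */
void heat_decay(struct heat *h, uint64_t now)
{
    assert(now >= h->last_decay);
    uint64_t periods = (now - h->last_decay) / HEAT_HALF_LIFE_SEC;

    if (periods == 0)
        return;
    /* Shifting a 64-bit value by 64 or more is undefined, so clamp to 0. */
    h->value = periods >= 64 ? 0 : h->value >> periods;
    /* Advance by whole periods only, so a partial period is not lost. */
    h->last_decay += periods * HEAT_HALF_LIFE_SEC;
}

/* Called at the point where the TX/RX counters are updated anyway. */
void heat_add(struct heat *h, uint64_t now, uint64_t amount)
{
    heat_decay(h, now);
    h->value += amount;
}

/* Called when the stats are read out. */
uint64_t heat_read(struct heat *h, uint64_t now)
{
    heat_decay(h, now);
    return h->value;
}
```

With this, an entry that has been idle for a long time simply reads back as 
zero the next time someone looks at it.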

For local interfaces the list is likely small enough that this kind of 
approach isn't worth it. Moreover the list of local interfaces might be 
regularly walked to check on health etc.

Olaf

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696796
                            Veldzigt 2b       Fax:    +31(0)30-6696799
Sr Software Engineer       3454 PW de Meern  Vnet:   955-6796
Storage Software           The Netherlands   Email:  olaf at sgi.com

