[Lustre-discuss] Client evictions and RDMA failures

Brian J. Murrell Brian.Murrell at Sun.COM
Tue Mar 31 10:32:15 PDT 2009


On Tue, 2009-03-31 at 11:38 -0400, syed haider wrote:
> Hi Brian,

Hi.

> Thanks for the response. I've run a few IB tests and here is an
> interesting response on the port for a failed node:
> 
> [root@tiger-node-0-1 ~]# ibqueryerrors.pl -c -a -r
> Suppressing: RcvSwRelayErrors
> Errors for 0x0008f104003f0e21 "ISR9288/ISR9096 Voltaire sLB-24"
>    GUID 0x0008f104003f0e21 port 23: [XmtDiscards == 4]
>          Actions:
>           XmtDiscards: This is a symptom of congestion and may require
> tweaking either HOQ or switch lifetime values
> 
>          Link info:      5   23[20]  ==( 4X 2.5 Gbps)==>
> 0x0008f10403970e20    1[  ] "tiger-node-0-11 HCA-1"

FWIW, I have absolutely no idea what any of this means.

> This is interesting because other sources state that my problem is
> possibly related to an over-subscribed network even though there is no
> traffic on the network when these nodes hang. Are you familiar with
> what settings need to be tweaked on a Voltaire IB switch (9550) to
> possibly resolve this problem?

Not at all.  Probably there are other lists out there with specific I/B
experts to help.  You might also try going right back to your I/B
vendor.

b.
