[Lustre-discuss] Client evictions and RDMA failures
Brian J. Murrell
Brian.Murrell at Sun.COM
Tue Mar 31 10:32:15 PDT 2009
On Tue, 2009-03-31 at 11:38 -0400, syed haider wrote:
> Hi Brian,
Hi.
> Thanks for the response. I've run a few ib tests and here is an
> interesting response on the port for a failed node:
>
> [root@tiger-node-0-1 ~]# ibqueryerrors.pl -c -a -r
> Suppressing: RcvSwRelayErrors
> Errors for 0x0008f104003f0e21 "ISR9288/ISR9096 Voltaire sLB-24"
> GUID 0x0008f104003f0e21 port 23: [XmtDiscards == 4]
> Actions:
> XmtDiscards: This is a symptom of congestion and may require
> tweaking either HOQ or switch lifetime values
>
> Link info: 5 23[20] ==( 4X 2.5 Gbps)==>
> 0x0008f10403970e20 1[ ] "tiger-node-0-11 HCA-1"
FWIW, I have absolutely no idea what any of this means.
> This is interesting because other sources state that my problem is
> possibly related to an over-subscribed network even though there is no
> traffic on the network when these nodes hang. Are you familiar with
> what settings need to be tweaked on a Voltaire ib switch (9550) to
> possibly resolve this problem?
Not at all. Probably there are other lists out there with specific I/B
experts to help. You might also try going right back to your I/B
vendor.
b.
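
For anyone who wants to watch counters such as the XmtDiscards value quoted above over time, here is a minimal Python sketch. It assumes only the line format visible in the quoted output ("GUID <hex> port <n>: [Name == value]"); the function name parse_port_errors and the sample string are hypothetical, and other versions of the tool may print a different layout.

```python
import re

# Assumed line format, taken from the quoted output:
#   GUID 0x0008f104003f0e21 port 23: [XmtDiscards == 4]
PORT_LINE = re.compile(r"GUID\s+(0x[0-9a-fA-F]+)\s+port\s+(\d+):\s*(.*)")
COUNTER = re.compile(r"\[(\w+)\s*==\s*(\d+)\]")


def parse_port_errors(text):
    """Return one dict per 'GUID ... port N: [...]' line found in text."""
    results = []
    for line in text.splitlines():
        # search() rather than match() so a leading "> " quote marker is ignored
        m = PORT_LINE.search(line)
        if not m:
            continue
        counters = {name: int(value) for name, value in COUNTER.findall(m.group(3))}
        results.append(
            {"guid": m.group(1), "port": int(m.group(2)), "counters": counters}
        )
    return results


# Sample text copied from the quoted message (hypothetical variable name).
sample = """\
Errors for 0x0008f104003f0e21 "ISR9288/ISR9096 Voltaire sLB-24"
GUID 0x0008f104003f0e21 port 23: [XmtDiscards == 4]
"""

for entry in parse_port_errors(sample):
    print(entry["guid"], entry["port"], entry["counters"])
```

Saving the parsed counters from two runs and comparing them would show whether XmtDiscards is still climbing while the network is idle, which is the situation described in the quoted question.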