[Lustre-discuss] Client evictions and RDMA failures
syed haider
syed.haider at gmail.com
Tue Mar 31 13:02:34 PDT 2009
Thanks, Brian. On one of the hung nodes I unmounted Lustre, removed the
lustre module (rmmod lustre), reloaded it, and mounted Lustre again. The
mount hangs again, but the device list (lctl dl) now shows 16 OSCs in the
"ST" state. The same OSTs are also listed in the "UP" state:
0 UP mgc MGC192.255.255.254@o2ib bf0dec15-659a-5817-6c78-0d43ca25e7c9 5
1 UP lov lustre-clilov-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 4
2 UP mdc lustre-MDT0000-mdc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
3 UP osc lustre-OST0000-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
4 UP osc lustre-OST0001-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
5 UP osc lustre-OST0002-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
6 UP osc lustre-OST0003-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
7 UP osc lustre-OST0004-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
8 UP osc lustre-OST0005-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
9 UP osc lustre-OST0006-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
10 UP osc lustre-OST0007-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
11 UP osc lustre-OST0008-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
12 UP osc lustre-OST0009-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
13 UP osc lustre-OST000a-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
14 UP osc lustre-OST000b-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
15 UP osc lustre-OST000c-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
16 UP osc lustre-OST000d-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
17 UP osc lustre-OST000e-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
18 UP osc lustre-OST000f-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
19 ST osc lustre-OST0010-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
20 ST osc lustre-OST0011-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
21 ST osc lustre-OST0012-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
22 ST osc lustre-OST0013-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
23 ST osc lustre-OST0014-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
24 ST osc lustre-OST0015-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
25 ST osc lustre-OST0016-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
26 ST osc lustre-OST0017-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
27 ST osc lustre-OST0018-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
28 ST osc lustre-OST0019-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
29 ST osc lustre-OST001a-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
30 ST osc lustre-OST001b-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
31 ST osc lustre-OST001c-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
32 ST osc lustre-OST001d-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
33 ST osc lustre-OST001e-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
34 ST osc lustre-OST001f-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
35 UP osc lustre-OST0010-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
36 UP osc lustre-OST0011-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
37 UP osc lustre-OST0012-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
38 UP osc lustre-OST0013-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
39 UP osc lustre-OST0014-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
40 UP osc lustre-OST0015-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
41 UP osc lustre-OST0016-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
42 UP osc lustre-OST0017-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
43 UP osc lustre-OST0018-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
44 UP osc lustre-OST0019-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
45 UP osc lustre-OST001a-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
46 UP osc lustre-OST001b-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
47 UP osc lustre-OST001c-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
48 UP osc lustre-OST001d-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
49 UP osc lustre-OST001e-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
50 UP osc lustre-OST001f-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
What would cause this? Could the fabric also be a factor here?
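One observation: all of the "ST" entries belong to a different client
instance (ffff8100b9519400) than the healthy ones (ffff8100af56d000), which
looks like leftover state from the previous mount that never finished
cleaning up when the module was removed. A rough, unverified sketch of how
such stale devices might be forced out with lctl (device numbers 19-34
taken from the lctl dl output above; DRY_RUN=1 only prints the commands
for review, since this needs root and a live Lustre client to actually run):

```shell
#!/bin/sh
# Sketch only: build the list of lctl commands that would clean up and
# detach the stale "ST" OSC devices (numbers 19..34 in `lctl dl` above).
plan_file=$(mktemp)
for dev in $(seq 19 34); do
    # "cleanup force" tears the device down; "detach" removes it.
    echo "lctl --device $dev cleanup force" >> "$plan_file"
    echo "lctl --device $dev detach"        >> "$plan_file"
done

cat "$plan_file"    # show what would be run

# Set DRY_RUN=0 to actually execute (root + lustre modules required).
if [ "${DRY_RUN:-1}" = 0 ]; then
    while read -r line; do
        $line
    done < "$plan_file"
fi
```

Whether cleanup/detach is safe on a client in this state is exactly the
kind of thing I'd want confirmed before running it outside a test box.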
On Tue, Mar 31, 2009 at 1:32 PM, Brian J. Murrell <Brian.Murrell at sun.com> wrote:
> On Tue, 2009-03-31 at 11:38 -0400, syed haider wrote:
>> Hi Brian,
>
> Hi.
>
>> Thanks for the response. I've run a few ib tests and here is an
>> interesting response on the port for a failed node:
>>
>> [root at tiger-node-0-1 ~]# ibqueryerrors.pl -c -a -r
>> Suppressing: RcvSwRelayErrors
>> Errors for 0x0008f104003f0e21 "ISR9288/ISR9096 Voltaire sLB-24"
>> GUID 0x0008f104003f0e21 port 23: [XmtDiscards == 4]
>> Actions:
>> XmtDiscards: This is a symptom of congestion and may require
>> tweaking either HOQ or switch lifetime values
>>
>> Link info: 5 23[20] ==( 4X 2.5 Gbps)==>
>> 0x0008f10403970e20 1[ ] "tiger-node-0-11 HCA-1"
>
> FWIW, I have absolutely no idea what any of this means.
>
>> This is interesting because other sources state that my problem is
>> possibly related to an over-subscribed network even though there is no
>> traffic on the network when these nodes hang. Are you familiar with
>> what settings need to be tweaked on a voltaire ib switch (9550) to
>> possibly resolve this problem?
>
> Not at all. Probably there are other lists out there with specific I/B
> experts to help. You might also try going right back to your I/B
> vendor.
>
> b.
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>