[Lustre-discuss] Client evictions and RDMA failures

syed haider syed.haider at gmail.com
Tue Mar 31 13:02:34 PDT 2009


Thanks Brian. On one of the hung nodes I unmounted Lustre, removed the
lustre module with rmmod, reloaded it, and mounted Lustre again. The
mount hangs again, but I now see 16 OSTs in "ST" state. The same 16
OSTs are also listed in "UP" state:

  0 UP mgc MGC192.255.255.254@o2ib bf0dec15-659a-5817-6c78-0d43ca25e7c9 5
  1 UP lov lustre-clilov-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 4
  2 UP mdc lustre-MDT0000-mdc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
  3 UP osc lustre-OST0000-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
  4 UP osc lustre-OST0001-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
  5 UP osc lustre-OST0002-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
  6 UP osc lustre-OST0003-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
  7 UP osc lustre-OST0004-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
  8 UP osc lustre-OST0005-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
  9 UP osc lustre-OST0006-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 10 UP osc lustre-OST0007-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 11 UP osc lustre-OST0008-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 12 UP osc lustre-OST0009-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 13 UP osc lustre-OST000a-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 14 UP osc lustre-OST000b-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 15 UP osc lustre-OST000c-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 16 UP osc lustre-OST000d-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 17 UP osc lustre-OST000e-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 18 UP osc lustre-OST000f-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 19 ST osc lustre-OST0010-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 20 ST osc lustre-OST0011-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 21 ST osc lustre-OST0012-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 22 ST osc lustre-OST0013-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 23 ST osc lustre-OST0014-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 24 ST osc lustre-OST0015-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 25 ST osc lustre-OST0016-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 26 ST osc lustre-OST0017-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 27 ST osc lustre-OST0018-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 28 ST osc lustre-OST0019-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 29 ST osc lustre-OST001a-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 30 ST osc lustre-OST001b-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 31 ST osc lustre-OST001c-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 32 ST osc lustre-OST001d-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 33 ST osc lustre-OST001e-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 34 ST osc lustre-OST001f-osc-ffff8100b9519400 3d28a9b3-d7c7-2846-e634-43bdce74f96a 2
 35 UP osc lustre-OST0010-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 36 UP osc lustre-OST0011-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 37 UP osc lustre-OST0012-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 38 UP osc lustre-OST0013-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 39 UP osc lustre-OST0014-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 40 UP osc lustre-OST0015-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 41 UP osc lustre-OST0016-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 42 UP osc lustre-OST0017-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 43 UP osc lustre-OST0018-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 44 UP osc lustre-OST0019-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 45 UP osc lustre-OST001a-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 46 UP osc lustre-OST001b-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 47 UP osc lustre-OST001c-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 48 UP osc lustre-OST001d-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 49 UP osc lustre-OST001e-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5
 50 UP osc lustre-OST001f-osc-ffff8100af56d000 bfab1519-8ef0-1b82-a0dd-ee82577481cc 5

What would cause this? Could the fabric be responsible for this as well?
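
To make the overlap in the device listing easier to see, here is a
minimal Python sketch that parses such a listing and reports targets
that show up in more than one state. It assumes the text is a device
list in the six-column form shown above (index, state, type, name,
UUID, refcount) and that mail wrapping may have pushed the UUID and
refcount onto a continuation line. The function names parse_devices
and mixed_state_targets are made up for illustration and are not part
of any Lustre tool.

```python
import re
from collections import defaultdict

# A record starts with an index and a two-letter state such as UP or ST.
RECORD_START = re.compile(r"^\s*\d+\s+[A-Z]{2}\s")
# index, state, type, name (may contain spaces), UUID, refcount
RECORD = re.compile(
    r"^\s*(\d+)\s+([A-Z]{2})\s+(\S+)\s+(.+?)\s+(\S+)\s+(\d+)\s*$"
)


def parse_devices(text):
    """Parse a device listing, rejoining lines wrapped by the mailer."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if RECORD_START.match(line) or not records:
            records.append(line.strip())
        else:
            # Continuation line: append it to the previous record.
            records[-1] += " " + line.strip()

    devices = []
    for rec in records:
        m = RECORD.match(rec)
        if m is None:
            continue  # skip anything that is not a device record
        index, state, dtype, name, uuid, refs = m.groups()
        devices.append({
            "index": int(index), "state": state, "type": dtype,
            "name": name, "uuid": uuid, "refs": int(refs),
        })
    return devices


def mixed_state_targets(devices):
    """Return osc targets listed under more than one state."""
    by_target = defaultdict(list)
    for dev in devices:
        if dev["type"] != "osc":
            continue
        # "lustre-OST0010-osc-ffff8100b9519400" -> "lustre-OST0010"
        target = dev["name"].split("-osc-")[0]
        by_target[target].append(dev)
    return {
        target: devs for target, devs in by_target.items()
        if len({d["state"] for d in devs}) > 1
    }
```

Feeding the listing above to parse_devices and then to
mixed_state_targets should single out the targets that appear once
under the ffff8100b9519400 instance in "ST" state and once under the
ffff8100af56d000 instance in "UP" state.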

On Tue, Mar 31, 2009 at 1:32 PM, Brian J. Murrell <Brian.Murrell at sun.com> wrote:
> On Tue, 2009-03-31 at 11:38 -0400, syed haider wrote:
>> Hi Brian,
>
> Hi.
>
>> Thanks for the response. I've run a few ib tests and here is an
>> interesting response on the port for a failed node:
>>
>> [root@tiger-node-0-1 ~]# ibqueryerrors.pl -c -a -r
>> Suppressing: RcvSwRelayErrors
>> Errors for 0x0008f104003f0e21 "ISR9288/ISR9096 Voltaire sLB-24"
>>    GUID 0x0008f104003f0e21 port 23: [XmtDiscards == 4]
>>          Actions:
>>           XmtDiscards: This is a symptom of congestion and may require tweaking either HOQ or switch lifetime values
>>
>>          Link info:      5   23[20]  ==( 4X 2.5 Gbps)==>  0x0008f10403970e20    1[  ] "tiger-node-0-11 HCA-1"
>
> FWIW, I have absolutely no idea what any of this means.
>
>> This is interesting because other sources state that my problem is
>> possibly related to an over-subscribed network even though there is no
>> traffic on the network when these nodes hang. Are you familiar with
>> what settings need to be tweaked on a Voltaire IB switch (9550) to
>> possibly resolve this problem?
>
> Not at all.  Probably there are other lists out there with specific I/B
> experts to help.  You might also try going right back to your I/B
> vendor.
>
> b.