[Lustre-discuss] help

Brian O'Connor briano at sgi.com
Fri Sep 30 01:47:19 PDT 2011


Hi Ashok

If you have a valid support contract, log a call with your local SGI 
office; you have a couple of bad IB ports, possibly a cable or some 
other such thing. Include the information you provided below
and ask them to help out.
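
In the meantime, if you want to confirm that those two ports (lid 1 port 1
on r1lead and lid 9 port 1 on service0) are still accumulating errors, a
minimal sketch, assuming the infiniband-diags tools you already ran above
and the Lustre lctl utility are available, is to clear the counters, let
the fabric run under normal load for a while, and then re-check:

# reset the fabric-wide error counters
ibclearerrors

# ...wait a while under normal load, then re-run the check
ibcheckerrors

# read the counters on the suspect service0 port directly (lid 9, port 1)
perfquery 9 1

# confirm the client can still reach the OSS over o2ib
lctl ping 10.148.0.106@o2ib

If SymbolErrors and RcvErrors on lid 9 keep climbing after the reset, it
is most likely a marginal cable or HCA port rather than a stale counter.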


On 30-September-2011 6:37 PM, Ashok nulguda wrote:
> Dear Sir,
>
>
> Thanks for your help.
>
> My system is an ICE 8400 cluster with 64 nodes and 30 TB of Lustre storage.
> oss1:~ # df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda3             100G  5.8G   95G   6% /
> tmpfs                  12G  1.1M   12G   1% /dev
> tmpfs                  12G   88K   12G   1% /dev/shm
> /dev/sda1            1020M  181M  840M  18% /boot
> /dev/sda4             170G  6.6M  170G   1% /data1
> /dev/mapper/3600a0b8000755ee0000010964dc231bc_part1
>                       2.1T   74G  1.9T   4% /OST1
> /dev/mapper/3600a0b8000755ed1000010614dc23425_part1
>                       1.7T   67G  1.5T   5% /OST4
> /dev/mapper/3600a0b8000755ee0000010a04dc23323_part1
>                       2.1T   67G  1.9T   4% /OST5
> /dev/mapper/3600a0b8000755f1f000011224dc239d7_part1
>                       1.7T   67G  1.5T   5% /OST8
> /dev/mapper/3600a0b8000755dbe000010de4dc23997_part1
>                       2.1T   66G  1.9T   4% /OST9
> /dev/mapper/3600a0b8000755f1f000011284dc23b5a_part1
>                       1.7T   66G  1.5T   5% /OST12
> /dev/mapper/3600a0b8000755eb3000011304dc23db1_part1
>                       2.1T   66G  1.9T   4% /OST13
> /dev/mapper/3600a0b8000755f22000011104dc23ec7_part1
>                       1.7T   66G  1.5T   5% /OST16
>
>
> oss1:~ # rpm -qa | grep -i lustre
> kernel-default-2.6.27.39-0.3_lustre.1.8.4
> kernel-ib-1.5.1-2.6.27.39_0.3_lustre.1.8.4_default
> lustre-modules-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default
> kernel-default-base-2.6.27.39-0.3_lustre.1.8.4
> lustre-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default
> lustre-ldiskfs-3.1.3-2.6.27_39_0.3_lustre.1.8.4_default
>
>
> oss2:~ # Filesystem            Size  Used Avail Use% Mounted on
> /dev/sdcw3            100G  8.3G   92G   9% /
> tmpfs                  12G  1.1M   12G   1% /dev
> tmpfs                  12G   88K   12G   1% /dev/shm
> /dev/sdcw1           1020M  144M  876M  15% /boot
> /dev/sdcw4            170G   13M  170G   1% /data1
> /dev/mapper/3600a0b8000755ed10000105e4dc23397_part1
>                       1.7T   69G  1.5T   5% /OST2
> /dev/mapper/3600a0b8000755ee00000109b4dc232a0_part1
>                       2.1T   68G  1.9T   4% /OST3
> /dev/mapper/3600a0b8000755ed1000010644dc2349f_part1
>                       1.7T   67G  1.5T   5% /OST6
> /dev/mapper/3600a0b8000755dbe000010d94dc23873_part1
>                       2.1T   67G  1.9T   4% /OST7
> /dev/mapper/3600a0b8000755f1f000011254dc23add_part1
>                       1.7T   66G  1.5T   5% /OST10
> /dev/mapper/3600a0b8000755dbe000010e34dc23a09_part1
>                       2.1T   66G  1.9T   4% /OST11
> /dev/mapper/3600a0b8000755f220000110d4dc23e36_part1
>                       1.7T   66G  1.5T   5% /OST14
> /dev/mapper/3600a0b8000755eb3000011354dc23e39_part1
>                       2.1T   66G  1.9T   4% /OST15
> /dev/mapper/3600a0b8000755eb30000113a4dc23ec4_part1
>                       1.4T   66G  1.3T   6% /OST17
>
> [1]+  Done                    df -h
>
> oss2:~ # rpm -qa | grep -i lustre
> lustre-modules-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default
> kernel-default-base-2.6.27.39-0.3_lustre.1.8.4
> kernel-default-2.6.27.39-0.3_lustre.1.8.4
> kernel-ib-1.5.1-2.6.27.39_0.3_lustre.1.8.4_default
> lustre-ldiskfs-3.1.3-2.6.27_39_0.3_lustre.1.8.4_default
> lustre-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default
>
> mdc1:~ # Filesystem            Size  Used Avail Use% Mounted on
> /dev/sde2             100G  5.2G   95G   6% /
> tmpfs                  12G  184K   12G   1% /dev
> tmpfs                  12G   88K   12G   1% /dev/shm
> /dev/sde1            1020M  181M  840M  18% /boot
> /dev/sde4             167G  196M  159G   1% /data1
> /dev/mapper/3600a0b8000755f22000011134dc23f7e_part1
>                       489G  2.3G  458G   1% /MDC
>
> [1]+  Done                    df -h
> mdc1:~ #
>
>
> mdc1:~ # rpm -qa | grep -i lustre
> lustre-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default
> kernel-default-2.6.27.39-0.3_lustre.1.8.4
> lustre-ldiskfs-3.1.3-2.6.27_39_0.3_lustre.1.8.4_default
> kernel-ib-1.5.1-2.6.27.39_0.3_lustre.1.8.4_default
> lustre-modules-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default
> kernel-default-base-2.6.27.39-0.3_lustre.1.8.4
> mdc1:~ #
>
> mdc2:~ # Filesystem            Size  Used Avail Use% Mounted on
> /dev/sde3             100G  5.0G   95G   5% /
> tmpfs                  18G  184K   18G   1% /dev
> tmpfs                 7.8G   88K  7.8G   1% /dev/shm
> /dev/sde1            1020M  144M  876M  15% /boot
> /dev/sde4             170G  6.6M  170G   1% /data1
>
> [1]+  Done                    df -h
> mdc2:~ # rpm -qqa | grep -i lustre
> lustre-modules-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default
> kernel-default-base-2.6.27.39-0.3_lustre.1.8.4
> kernel-default-2.6.27.39-0.3_lustre.1.8.4
> lustre-ldiskfs-3.1.3-2.6.27_39_0.3_lustre.1.8.4_default
> kernel-ib-1.5.1-2.6.27.39_0.3_lustre.1.8.4_default
> lustre-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default
> mdc2:~ #
>
>
> service0:~ # ibstat
> CA 'mlx4_0'
>     CA type: MT26428
>     Number of ports: 2
>     Firmware version: 2.7.0
>     Hardware version: a0
>     Node GUID: 0x0002c903000a6028
>     System image GUID: 0x0002c903000a602b
>     Port 1:
>         State: Active
>         Physical state: LinkUp
>         Rate: 40
>         Base lid: 9
>         LMC: 0
>         SM lid: 1
>         Capability mask: 0x02510868
>         Port GUID: 0x0002c903000a6029
>     Port 2:
>         State: Active
>         Physical state: LinkUp
>         Rate: 40
>         Base lid: 10
>         LMC: 0
>         SM lid: 1
>         Capability mask: 0x02510868
>         Port GUID: 0x0002c903000a602a
> service0:~ #
>
>
>
> service0:~ # ibstatus
> Infiniband device 'mlx4_0' port 1 status:
>     default gid:     fec0:0000:0000:0000:0002:c903:000a:6029
>     base lid:     0x9
>     sm lid:         0x1
>     state:         4: ACTIVE
>     phys state:     5: LinkUp
>     rate:         40 Gb/sec (4X QDR)
>
> Infiniband device 'mlx4_0' port 2 status:
>     default gid:     fec0:0000:0000:0000:0002:c903:000a:602a
>     base lid:     0xa
>     sm lid:         0x1
>     state:         4: ACTIVE
>     phys state:     5: LinkUp
>     rate:         40 Gb/sec (4X QDR)
>
> service0:~ #
>
>
>
> service0:~ # ibdiagnet
> Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
> -W- Topology file is not specified.
>     Reports regarding cluster links will use direct routes.
> Loading IBDM from: /usr/lib64/ibdm1.2
> -W- A few ports of local device are up.
>     Since port-num was not specified (-p option), port 1 of device 1
>     will be used as the local port.
> -I- Discovering ... 88 nodes (9 Switches & 79 CA-s) discovered.
>
>
> -I---------------------------------------------------
> -I- Bad Guids/LIDs Info
> -I---------------------------------------------------
> -I- No bad Guids were found
>
> -I---------------------------------------------------
> -I- Links With Logical State = INIT
> -I---------------------------------------------------
> -I- No bad Links (with logical state = INIT) were found
>
> -I---------------------------------------------------
> -I- PM Counters Info
> -I---------------------------------------------------
> -I- No illegal PM counters values were found
>
> -I---------------------------------------------------
> -I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
> -I---------------------------------------------------
> -I-    PKey:0x7fff Hosts:81 full:81 partial:0
>
> -I---------------------------------------------------
> -I- IPoIB Subnets Check
> -I---------------------------------------------------
> -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
> -W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps
>
> -I---------------------------------------------------
> -I- Bad Links Info
> -I- No bad link were found
> -I---------------------------------------------------
> ----------------------------------------------------------------
> -I- Stages Status Report:
>     STAGE                                    Errors Warnings
>     Bad GUIDs/LIDs Check                     0      0
>     Link State Active Check                  0      0
>     Performance Counters Report              0      0
>     Partitions Check                         0      0
>     IPoIB Subnets Check                      0      1
>
> Please see /tmp/ibdiagnet.log for complete log
> ----------------------------------------------------------------
>
> -I- Done. Run time was 9 seconds.
> service0:~ #
>
>
> service0:~ # ibcheckerrors
> #warn: counter VL15Dropped = 18584     (threshold 100) lid 1 port 1
> Error check on lid 1 (r1lead HCA-1) port 1:  FAILED
> #warn: counter SymbolErrors = 42829     (threshold 10) lid 9 port 1
> #warn: counter RcvErrors = 9279     (threshold 10) lid 9 port 1
> Error check on lid 9 (service0 HCA-1) port 1:  FAILED
>
> ## Summary: 88 nodes checked, 0 bad nodes found
> ##          292 ports checked, 2 ports have errors beyond threshold
> service0:~ #
>
>
> service0:~ # ibchecknet
>
> # Checking Ca: nodeguid 0x0002c903000abfc2
>
> # Checking Ca: nodeguid 0x0002c903000ac00e
>
> # Checking Ca: nodeguid 0x0002c903000a69dc
>
> # Checking Ca: nodeguid 0x0002c9030009cd46
>
> # Checking Ca: nodeguid 0x003048fffff4d878
>
> # Checking Ca: nodeguid 0x003048fffff4d880
>
> # Checking Ca: nodeguid 0x003048fffff4d87c
>
> # Checking Ca: nodeguid 0x003048fffff4d884
>
> # Checking Ca: nodeguid 0x003048fffff4d888
>
> # Checking Ca: nodeguid 0x003048fffff4d88c
>
> # Checking Ca: nodeguid 0x003048fffff4d890
>
> # Checking Ca: nodeguid 0x003048fffff4d894
>
> # Checking Ca: nodeguid 0x0002c9020029fa50
> #warn: counter VL15Dropped = 18617     (threshold 100) lid 1 port 1
> Error check on lid 1 (r1lead HCA-1) port 1:  FAILED
>
> # Checking Ca: nodeguid 0x0002c90300054eac
>
> # Checking Ca: nodeguid 0x0002c9030009cebe
>
> # Checking Ca: nodeguid 0x003048fffff4c9f8
>
> # Checking Ca: nodeguid 0x003048fffff4db08
>
> # Checking Ca: nodeguid 0x003048fffff4db40
>
> # Checking Ca: nodeguid 0x003048fffff4db44
>
> # Checking Ca: nodeguid 0x003048fffff4db48
>
> # Checking Ca: nodeguid 0x003048fffff4db4c
>
> # Checking Ca: nodeguid 0x003048fffff4db0c
>
> # Checking Ca: nodeguid 0x003048fffff4dca0
>
> # Checking Ca: nodeguid 0x0002c903000abfe2
>
> # Checking Ca: nodeguid 0x0002c903000abfe6
>
> # Checking Ca: nodeguid 0x0002c9030009dd28
>
> # Checking Ca: nodeguid 0x003048fffff4db54
>
> # Checking Ca: nodeguid 0x003048fffff4db58
>
> # Checking Ca: nodeguid 0x003048fffff4c9f4
>
> # Checking Ca: nodeguid 0x003048fffff4db50
>
> # Checking Ca: nodeguid 0x003048fffff4db3c
>
> # Checking Ca: nodeguid 0x003048fffff4db38
>
> # Checking Ca: nodeguid 0x003048fffff4db14
>
> # Checking Ca: nodeguid 0x003048fffff4db10
>
> # Checking Ca: nodeguid 0x003048fffff4d8a8
>
> # Checking Ca: nodeguid 0x003048fffff4d8ac
>
> # Checking Ca: nodeguid 0x003048fffff4d8b4
>
> # Checking Ca: nodeguid 0x003048fffff4d8b0
>
> # Checking Ca: nodeguid 0x003048fffff4db70
>
> # Checking Ca: nodeguid 0x003048fffff4db68
>
> # Checking Ca: nodeguid 0x003048fffff4db64
>
> # Checking Ca: nodeguid 0x003048fffff4db78
>
> # Checking Ca: nodeguid 0x0002c903000a69f0
>
> # Checking Ca: nodeguid 0x0002c9030006004a
>
> # Checking Ca: nodeguid 0x0002c9030009dd2c
>
> # Checking Ca: nodeguid 0x003048fffff4d8b8
>
> # Checking Ca: nodeguid 0x003048fffff4d8bc
>
> # Checking Ca: nodeguid 0x003048fffff4d8a4
>
> # Checking Ca: nodeguid 0x003048fffff4d8a0
>
> # Checking Ca: nodeguid 0x003048fffff4db7c
>
> # Checking Ca: nodeguid 0x003048fffff4db80
>
> # Checking Ca: nodeguid 0x003048fffff4db6c
>
> # Checking Ca: nodeguid 0x003048fffff4db74
>
> # Checking Ca: nodeguid 0x003048fffff4dcb8
>
> # Checking Ca: nodeguid 0x003048fffff4dcd0
>
> # Checking Ca: nodeguid 0x003048fffff4dc5c
>
> # Checking Ca: nodeguid 0x003048fffff4dc60
>
> # Checking Ca: nodeguid 0x003048fffff4dc54
>
> # Checking Ca: nodeguid 0x003048fffff4dc50
>
> # Checking Ca: nodeguid 0x003048fffff4dc4c
>
> # Checking Ca: nodeguid 0x003048fffff4dcd4
>
> # Checking Ca: nodeguid 0x0002c903000a6164
>
> # Checking Ca: nodeguid 0x003048fffff4dcf0
>
> # Checking Ca: nodeguid 0x003048fffff4db5c
>
> # Checking Ca: nodeguid 0x003048fffff4dc90
>
> # Checking Ca: nodeguid 0x003048fffff4dc8c
>
> # Checking Ca: nodeguid 0x003048fffff4dc58
>
> # Checking Ca: nodeguid 0x003048fffff4dc94
>
> # Checking Ca: nodeguid 0x003048fffff4dc9c
>
> # Checking Ca: nodeguid 0x003048fffff4db60
>
> # Checking Ca: nodeguid 0x003048fffff4d89c
>
> # Checking Ca: nodeguid 0x003048fffff4d898
>
> # Checking Ca: nodeguid 0x003048fffff4dad8
>
> # Checking Ca: nodeguid 0x003048fffff4dadc
>
> # Checking Ca: nodeguid 0x003048fffff4db30
>
> # Checking Ca: nodeguid 0x003048fffff4db34
>
> # Checking Ca: nodeguid 0x003048fffff4d874
>
> # Checking Ca: nodeguid 0x003048fffff4d870
>
> # Checking Ca: nodeguid 0x0002c903000a6028
> #warn: counter SymbolErrors = 44150     (threshold 10) lid 9 port 1
> #warn: counter RcvErrors = 9283     (threshold 10) lid 9 port 1
> Error check on lid 9 (service0 HCA-1) port 1:  FAILED
>
> ## Summary: 88 nodes checked, 0 bad nodes found
> ##          292 ports checked, 0 bad ports found
> ##          2 ports have errors beyond threshold
>
>
>
> service0:~ # ibcheckstate
>
> ## Summary: 88 nodes checked, 0 bad nodes found
> ##          292 ports checked, 0 ports with bad state found
> service0:~ # ibcheckwidth
>
> ## Summary: 88 nodes checked, 0 bad nodes found
> ##          292 ports checked, 0 ports with 1x width in error found
> service0:~ #
>
>
> Thanks and Regards
> Ashok
>
>
>
> On 30 September 2011 12:39, Brian O'Connor <briano at sgi.com> wrote:
>
>     Hello Ashok
>
>     Is the cluster hanging or otherwise behaving badly? The logs below
>     show that the client lost connection to 10.148.0.106 for 10 seconds
>     or so. It should have recovered OK.
>
>     If you want further help from the list you need to add more detail
>     about the cluster, i.e. a general description of the number of
>     OSS/OSTs and clients, the version of Lustre, etc., and a description
>     of what is actually going wrong, i.e. hanging, offline, etc.
>
>     The first thing is to check the infrastructure; in this case you
>     should check your IB network for errors.
>
>
>
>
>     On 30-September-2011 2:39 PM, Ashok nulguda wrote:
>>     Dear All,
>>
>>     I am getting Lustre errors on my HPC cluster, as shown below. Can
>>     anyone help me to resolve this problem?
>>     Thanks in advance.
>>     Sep 30 08:40:23 service0 kernel: [343138.837222] Lustre:
>>     8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1
>>     previous similar message
>>     Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre:
>>     lustre-OST0008-osc-ffff880b272cf800: Connection to service
>>     lustre-OST0008 via nid 10.148.0.106 at o2ib was lost; in progress
>>     operations using this service will wait for recovery to complete.
>>     Sep 30 08:40:24 service0 kernel: [343139.837260] Lustre:
>>     8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
>>     x1380984193067288 sent from lustre-OST0006-osc-ffff880b272cf800
>>     to NID 10.148.0.106 at o2ib 7s ago has timed out (7s prior to deadline).
>>     Sep 30 08:40:24 service0 kernel: [343139.837263]  
>>     req at ffff880a5f800c00 x1380984193067288/t0
>>     o3->lustre-OST0006_UUID at 10.148.0.106@o2ib:6/4
>>     lens 448/592 e
>>     0 to 1 dl 1317352224 ref 2 fl Rpc:/0/0 rc 0/0
>>     Sep 30 08:40:24 service0 kernel: [343139.837269] Lustre:
>>     8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 38
>>     previous similar messages
>>     Sep 30 08:40:24 service0 kernel: [343140.129284] LustreError:
>>     9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11
>>     from cancel RPC: canceling anyway
>>     Sep 30 08:40:24 service0 kernel: [343140.129290] LustreError:
>>     9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1
>>     previous similar message
>>     Sep 30 08:40:24 service0 kernel: [343140.129295] LustreError:
>>     9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
>>     ldlm_cli_cancel_list: -11
>>     Sep 30 08:40:24 service0 kernel: [343140.129299] LustreError:
>>     9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1
>>     previous similar message
>>     Sep 30 08:40:25 service0 kernel: [343140.837308] Lustre:
>>     8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
>>     x1380984193067299 sent from lustre-OST0010-osc-ffff880b272cf800
>>     to NID 10.148.0.106 at o2ib 7s ago has timed out (7s prior to deadline).
>>     Sep 30 08:40:25 service0 kernel: [343140.837311]  
>>     req at ffff880a557c4400 x1380984193067299/t0
>>     o3->lustre-OST0010_UUID at 10.148.0.106@o2ib:6/4
>>     lens 448/592 e
>>     0 to 1 dl 1317352225 ref 2 fl Rpc:/0/0 rc 0/0
>>     Sep 30 08:40:25 service0 kernel: [343140.837316] Lustre:
>>     8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 4
>>     previous similar messages
>>     Sep 30 08:40:26 service0 kernel: [343141.245365] LustreError:
>>     30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11
>>     from cancel RPC: canceling anyway
>>     Sep 30 08:40:26 service0 kernel: [343141.245371] LustreError:
>>     22729:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
>>     ldlm_cli_cancel_list: -11
>>     Sep 30 08:40:26 service0 kernel: [343141.245378] LustreError:
>>     30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1
>>     previous similar message
>>     Sep 30 08:40:33 service0 kernel: [343148.245683] Lustre:
>>     22725:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
>>     x1380984193067302 sent from lustre-OST0004-osc-ffff880b272cf800
>>     to NID 10.148.0.106 at o2ib 14s ago has timed out (14s prior to
>>     deadline).
>>     Sep 30 08:40:33 service0 kernel: [343148.245686]  
>>     req at ffff8805c879e800 x1380984193067302/t0
>>     o103->lustre-OST0004_UUID at 10.148.0.106@o2ib:17/18
>>     lens 296/384
>>     e 0 to 1 dl 1317352233 ref 1 fl Rpc:N/0/0 rc 0/0
>>     Sep 30 08:40:33 service0 kernel: [343148.245692] Lustre:
>>     22725:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 2
>>     previous similar messages
>>     Sep 30 08:40:33 service0 kernel: [343148.245708] LustreError:
>>     22725:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11
>>     from cancel RPC: canceling anyway
>>     Sep 30 08:40:33 service0 kernel: [343148.245714] LustreError:
>>     22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
>>     ldlm_cli_cancel_list: -11
>>     Sep 30 08:40:33 service0 kernel: [343148.245717] LustreError:
>>     22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1
>>     previous similar message
>>     Sep 30 08:40:36 service0 kernel: [343151.548005] LustreError:
>>     11-0: an error occurred while communicating with
>>     10.148.0.106 at o2ib. The ost_connect operation failed with -16
>>     Sep 30 08:40:36 service0 kernel: [343151.548008] LustreError:
>>     Skipped 1 previous similar message
>>     Sep 30 08:40:36 service0 kernel: [343151.548024] LustreError:
>>     167-0: This client was evicted by lustre-OST000b; in progress
>>     operations using this service will fail.
>>     Sep 30 08:40:36 service0 kernel: [343151.548250] LustreError:
>>     30452:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5
>>     Sep 30 08:40:36 service0 kernel: [343151.550210] LustreError:
>>     8300:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
>>     req at ffff88049528c400 x1380984193067406/t0
>>     o3->lustre-OST000b_UUID at 10.148.0.106@o2ib:6/4
>>     lens 448/592 e
>>     0 to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
>>     Sep 30 08:40:36 service0 kernel: [343151.594742] Lustre:
>>     lustre-OST0000-osc-ffff880b272cf800: Connection restored to
>>     service lustre-OST0000 using nid 10.148.0.106 at o2ib.
>>     Sep 30 08:40:36 service0 kernel: [343151.837203] Lustre:
>>     lustre-OST0006-osc-ffff880b272cf800: Connection restored to
>>     service lustre-OST0006 using nid 10.148.0.106 at o2ib.
>>     Sep 30 08:40:37 service0 kernel: [343152.842631] Lustre:
>>     lustre-OST0003-osc-ffff880b272cf800: Connection restored to
>>     service lustre-OST0003 using nid 10.148.0.106 at o2ib.
>>     Sep 30 08:40:37 service0 kernel: [343152.842636] Lustre: Skipped
>>     3 previous similar messages
>>
>>
>>     Thanks and Regards
>>     Ashok
>>
>>     -- 
>>     Ashok Nulguda
>>     TATA ELXSI LTD
>>     Mb: +91 9689945767
>>     Email: ashokn at tataelxsi.co.in
>>
>>
>>
>>     _______________________________________________
>>     Lustre-discuss mailing list
>>     Lustre-discuss at lists.lustre.org
>>     http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
>     -- 
>     Brian O'Connor
>     -------------------------------------------------
>     SGI Consulting
>     Email: briano at sgi.com, Mobile +61 417 746 452
>     Phone: +61 3 9963 1900, Fax: +61 3 9963 1902
>     357 Camberwell Road, Camberwell, Victoria, 3124
>     AUSTRALIA http://www.sgi.com/support/services
>     -------------------------------------------------
>
>
>
>
>
>
> -- 
> Ashok Nulguda
> TATA ELXSI LTD
> Mb: +91 9689945767
> Email: ashokn at tataelxsi.co.in
>


-- 
Brian O'Connor
-------------------------------------------------
SGI Consulting
Email: briano at sgi.com, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax: +61 3 9963 1902
357 Camberwell Road, Camberwell, Victoria, 3124
AUSTRALIA http://www.sgi.com/support/services
-------------------------------------------------


