Dear Sir,<br><br>Thanks for your help.<br><br>My system is an ICE 8400 cluster of 64 nodes with 30 TB of Lustre storage.<br>
<pre>
oss1:~ # df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             100G  5.8G   95G   6% /
tmpfs                  12G  1.1M   12G   1% /dev
tmpfs                  12G   88K   12G   1% /dev/shm
/dev/sda1            1020M  181M  840M  18% /boot
/dev/sda4             170G  6.6M  170G   1% /data1
/dev/mapper/3600a0b8000755ee0000010964dc231bc_part1
                      2.1T   74G  1.9T   4% /OST1
/dev/mapper/3600a0b8000755ed1000010614dc23425_part1
                      1.7T   67G  1.5T   5% /OST4
/dev/mapper/3600a0b8000755ee0000010a04dc23323_part1
                      2.1T   67G  1.9T   4% /OST5
/dev/mapper/3600a0b8000755f1f000011224dc239d7_part1
                      1.7T   67G  1.5T   5% /OST8
/dev/mapper/3600a0b8000755dbe000010de4dc23997_part1
                      2.1T   66G  1.9T   4% /OST9
/dev/mapper/3600a0b8000755f1f000011284dc23b5a_part1
                      1.7T   66G  1.5T   5% /OST12
/dev/mapper/3600a0b8000755eb3000011304dc23db1_part1
                      2.1T   66G  1.9T   4% /OST13
/dev/mapper/3600a0b8000755f22000011104dc23ec7_part1
                      1.7T   66G  1.5T   5% /OST16
</pre>
<br><br>oss1:~ # rpm -qa | grep -i lustre<br>kernel-default-2.6.27.39-0.3_lustre.1.8.4<br>kernel-ib-1.5.1-2.6.27.39_0.3_lustre.1.8.4_default<br>lustre-modules-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default<br>kernel-default-base-2.6.27.39-0.3_lustre.1.8.4<br>
lustre-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default<br>lustre-ldiskfs-3.1.3-2.6.27_39_0.3_lustre.1.8.4_default<br>
<pre>
oss2:~ # Filesystem   Size  Used Avail Use% Mounted on
/dev/sdcw3            100G  8.3G   92G   9% /
tmpfs                  12G  1.1M   12G   1% /dev
tmpfs                  12G   88K   12G   1% /dev/shm
/dev/sdcw1           1020M  144M  876M  15% /boot
/dev/sdcw4            170G   13M  170G   1% /data1
/dev/mapper/3600a0b8000755ed10000105e4dc23397_part1
                      1.7T   69G  1.5T   5% /OST2
/dev/mapper/3600a0b8000755ee00000109b4dc232a0_part1
                      2.1T   68G  1.9T   4% /OST3
/dev/mapper/3600a0b8000755ed1000010644dc2349f_part1
                      1.7T   67G  1.5T   5% /OST6
/dev/mapper/3600a0b8000755dbe000010d94dc23873_part1
                      2.1T   67G  1.9T   4% /OST7
/dev/mapper/3600a0b8000755f1f000011254dc23add_part1
                      1.7T   66G  1.5T   5% /OST10
/dev/mapper/3600a0b8000755dbe000010e34dc23a09_part1
                      2.1T   66G  1.9T   4% /OST11
/dev/mapper/3600a0b8000755f220000110d4dc23e36_part1
                      1.7T   66G  1.5T   5% /OST14
/dev/mapper/3600a0b8000755eb3000011354dc23e39_part1
                      2.1T   66G  1.9T   4% /OST15
/dev/mapper/3600a0b8000755eb30000113a4dc23ec4_part1
                      1.4T   66G  1.3T   6% /OST17

[1]+  Done                    df -h
</pre>
oss2:~ # rpm -qa | grep -i lustre<br>lustre-modules-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default<br>
kernel-default-base-2.6.27.39-0.3_lustre.1.8.4<br>kernel-default-2.6.27.39-0.3_lustre.1.8.4<br>kernel-ib-1.5.1-2.6.27.39_0.3_lustre.1.8.4_default<br>lustre-ldiskfs-3.1.3-2.6.27_39_0.3_lustre.1.8.4_default<br>lustre-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default<br>
<pre>
mdc1:~ # Filesystem   Size  Used Avail Use% Mounted on
/dev/sde2             100G  5.2G   95G   6% /
tmpfs                  12G  184K   12G   1% /dev
tmpfs                  12G   88K   12G   1% /dev/shm
/dev/sde1            1020M  181M  840M  18% /boot
/dev/sde4             167G  196M  159G   1% /data1
/dev/mapper/3600a0b8000755f22000011134dc23f7e_part1
                      489G  2.3G  458G   1% /MDC

[1]+  Done                    df -h
</pre>
mdc1:~ # <br><br><br>mdc1:~ # rpm -qa | grep -i lustre<br>lustre-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default<br>kernel-default-2.6.27.39-0.3_lustre.1.8.4<br>lustre-ldiskfs-3.1.3-2.6.27_39_0.3_lustre.1.8.4_default<br>kernel-ib-1.5.1-2.6.27.39_0.3_lustre.1.8.4_default<br>
lustre-modules-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default<br>kernel-default-base-2.6.27.39-0.3_lustre.1.8.4<br>mdc1:~ # <br>
<pre>
mdc2:~ # Filesystem   Size  Used Avail Use% Mounted on
/dev/sde3             100G  5.0G   95G   5% /
tmpfs                  18G  184K   18G   1% /dev
tmpfs                 7.8G   88K  7.8G   1% /dev/shm
/dev/sde1            1020M  144M  876M  15% /boot
/dev/sde4             170G  6.6M  170G   1% /data1

[1]+  Done                    df -h
</pre>
mdc2:~ # rpm -qqa | grep -i lustre<br>lustre-modules-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default<br>kernel-default-base-2.6.27.39-0.3_lustre.1.8.4<br>kernel-default-2.6.27.39-0.3_lustre.1.8.4<br>lustre-ldiskfs-3.1.3-2.6.27_39_0.3_lustre.1.8.4_default<br>
kernel-ib-1.5.1-2.6.27.39_0.3_lustre.1.8.4_default<br>lustre-1.8.4-2.6.27_39_0.3_lustre.1.8.4_default<br>mdc2:~ # <br>
<pre>
service0:~ # ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.7.0
        Hardware version: a0
        Node GUID: 0x0002c903000a6028
        System image GUID: 0x0002c903000a602b
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 9
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c903000a6029
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 10
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c903000a602a
service0:~ #

service0:~ # ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fec0:0000:0000:0000:0002:c903:000a:6029
        base lid:        0x9
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fec0:0000:0000:0000:0002:c903:000a:602a
        base lid:        0xa
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)

service0:~ #

service0:~ # ibdiagnet
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib64/ibdm1.2
-W- A few ports of local device are up.
    Since port-num was not specified (-p option), port 1 of device 1 will be
    used as the local port.
-I- Discovering ... 88 nodes (9 Switches & 79 CA-s) discovered.

-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I- PKey:0x7fff Hosts:81 full:81 partial:0

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:20Gbps > group-rate:10Gbps

-I---------------------------------------------------
-I- Bad Links Info
-I- No bad link were found
-I---------------------------------------------------
----------------------------------------------------------------
-I- Stages Status Report:
    STAGE                           Errors Warnings
    Bad GUIDs/LIDs Check            0      0
    Link State Active Check         0      0
    Performance Counters Report     0      0
    Partitions Check                0      0
    IPoIB Subnets Check             0      1

Please see /tmp/ibdiagnet.log for complete log
----------------------------------------------------------------

-I- Done. Run time was 9 seconds.
</pre>
service0:~ # <br>
<pre>
service0:~ # ibcheckerrors
#warn: counter VL15Dropped = 18584 (threshold 100) lid 1 port 1
Error check on lid 1 (r1lead HCA-1) port 1: FAILED
#warn: counter SymbolErrors = 42829 (threshold 10) lid 9 port 1
#warn: counter RcvErrors = 9279 (threshold 10) lid 9 port 1
Error check on lid 9 (service0 HCA-1) port 1: FAILED

## Summary: 88 nodes checked, 0 bad nodes found
## 292 ports checked, 2 ports have errors beyond threshold
</pre>
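For quick triage, the flagged links can be pulled out of the ibcheckerrors output and grouped per lid/port. This is only a sketch: the here-doc sample is copied verbatim from the warnings above, and the /tmp/ibcheck.out path is just an example (normally you would redirect the real ibcheckerrors output there).<br>

```shell
# Sample #warn lines, copied verbatim from the ibcheckerrors run above
cat > /tmp/ibcheck.out <<'EOF'
#warn: counter VL15Dropped = 18584 (threshold 100) lid 1 port 1
#warn: counter SymbolErrors = 42829 (threshold 10) lid 9 port 1
#warn: counter RcvErrors = 9279 (threshold 10) lid 9 port 1
EOF

# One line per offending counter: "lid <lid> port <port>: <counter>=<value>"
awk '/^#warn:/ { printf "lid %s port %s: %s=%s\n", $(NF-2), $NF, $3, $5 }' /tmp/ibcheck.out

# Distinct bad links (each physical link listed once)
awk '/^#warn:/ { print "lid " $(NF-2) " port " $NF }' /tmp/ibcheck.out | sort -u
```

SymbolErrors and RcvErrors on lid 9 (service0) usually point at a marginal cable, connector, or port; after reseating or replacing it, the counters can be cleared with ibclearerrors (from infiniband-diags) and rechecked once traffic has run for a while.<br>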
service0:~ # <br><br><br>service0:~ # ibchecknet <br><br># Checking Ca: nodeguid 0x0002c903000abfc2<br><br># Checking Ca: nodeguid 0x0002c903000ac00e<br><br># Checking Ca: nodeguid 0x0002c903000a69dc<br><br># Checking Ca: nodeguid 0x0002c9030009cd46<br>
<br># Checking Ca: nodeguid 0x003048fffff4d878<br><br># Checking Ca: nodeguid 0x003048fffff4d880<br><br># Checking Ca: nodeguid 0x003048fffff4d87c<br><br># Checking Ca: nodeguid 0x003048fffff4d884<br><br># Checking Ca: nodeguid 0x003048fffff4d888<br>
<br># Checking Ca: nodeguid 0x003048fffff4d88c<br><br># Checking Ca: nodeguid 0x003048fffff4d890<br><br># Checking Ca: nodeguid 0x003048fffff4d894<br><br># Checking Ca: nodeguid 0x0002c9020029fa50<br>#warn: counter VL15Dropped = 18617 (threshold 100) lid 1 port 1<br>
Error check on lid 1 (r1lead HCA-1) port 1: FAILED <br><br># Checking Ca: nodeguid 0x0002c90300054eac<br><br># Checking Ca: nodeguid 0x0002c9030009cebe<br><br># Checking Ca: nodeguid 0x003048fffff4c9f8<br><br># Checking Ca: nodeguid 0x003048fffff4db08<br>
<br># Checking Ca: nodeguid 0x003048fffff4db40<br><br># Checking Ca: nodeguid 0x003048fffff4db44<br><br># Checking Ca: nodeguid 0x003048fffff4db48<br><br># Checking Ca: nodeguid 0x003048fffff4db4c<br><br># Checking Ca: nodeguid 0x003048fffff4db0c<br>
<br># Checking Ca: nodeguid 0x003048fffff4dca0<br><br># Checking Ca: nodeguid 0x0002c903000abfe2<br><br># Checking Ca: nodeguid 0x0002c903000abfe6<br><br># Checking Ca: nodeguid 0x0002c9030009dd28<br><br># Checking Ca: nodeguid 0x003048fffff4db54<br>
<br># Checking Ca: nodeguid 0x003048fffff4db58<br><br># Checking Ca: nodeguid 0x003048fffff4c9f4<br><br># Checking Ca: nodeguid 0x003048fffff4db50<br><br># Checking Ca: nodeguid 0x003048fffff4db3c<br><br># Checking Ca: nodeguid 0x003048fffff4db38<br>
<br># Checking Ca: nodeguid 0x003048fffff4db14<br><br># Checking Ca: nodeguid 0x003048fffff4db10<br><br># Checking Ca: nodeguid 0x003048fffff4d8a8<br><br># Checking Ca: nodeguid 0x003048fffff4d8ac<br><br># Checking Ca: nodeguid 0x003048fffff4d8b4<br>
<br># Checking Ca: nodeguid 0x003048fffff4d8b0<br><br># Checking Ca: nodeguid 0x003048fffff4db70<br><br># Checking Ca: nodeguid 0x003048fffff4db68<br><br># Checking Ca: nodeguid 0x003048fffff4db64<br><br># Checking Ca: nodeguid 0x003048fffff4db78<br>
<br># Checking Ca: nodeguid 0x0002c903000a69f0<br><br># Checking Ca: nodeguid 0x0002c9030006004a<br><br># Checking Ca: nodeguid 0x0002c9030009dd2c<br><br># Checking Ca: nodeguid 0x003048fffff4d8b8<br><br># Checking Ca: nodeguid 0x003048fffff4d8bc<br>
<br># Checking Ca: nodeguid 0x003048fffff4d8a4<br><br># Checking Ca: nodeguid 0x003048fffff4d8a0<br><br># Checking Ca: nodeguid 0x003048fffff4db7c<br><br># Checking Ca: nodeguid 0x003048fffff4db80<br><br># Checking Ca: nodeguid 0x003048fffff4db6c<br>
<br># Checking Ca: nodeguid 0x003048fffff4db74<br><br># Checking Ca: nodeguid 0x003048fffff4dcb8<br><br># Checking Ca: nodeguid 0x003048fffff4dcd0<br><br># Checking Ca: nodeguid 0x003048fffff4dc5c<br><br># Checking Ca: nodeguid 0x003048fffff4dc60<br>
<br># Checking Ca: nodeguid 0x003048fffff4dc54<br><br># Checking Ca: nodeguid 0x003048fffff4dc50<br><br># Checking Ca: nodeguid 0x003048fffff4dc4c<br><br># Checking Ca: nodeguid 0x003048fffff4dcd4<br><br># Checking Ca: nodeguid 0x0002c903000a6164<br>
<br># Checking Ca: nodeguid 0x003048fffff4dcf0<br><br># Checking Ca: nodeguid 0x003048fffff4db5c<br><br># Checking Ca: nodeguid 0x003048fffff4dc90<br><br># Checking Ca: nodeguid 0x003048fffff4dc8c<br><br># Checking Ca: nodeguid 0x003048fffff4dc58<br>
<br># Checking Ca: nodeguid 0x003048fffff4dc94<br><br># Checking Ca: nodeguid 0x003048fffff4dc9c<br><br># Checking Ca: nodeguid 0x003048fffff4db60<br><br># Checking Ca: nodeguid 0x003048fffff4d89c<br><br># Checking Ca: nodeguid 0x003048fffff4d898<br>
<br># Checking Ca: nodeguid 0x003048fffff4dad8<br><br># Checking Ca: nodeguid 0x003048fffff4dadc<br><br># Checking Ca: nodeguid 0x003048fffff4db30<br><br># Checking Ca: nodeguid 0x003048fffff4db34<br><br># Checking Ca: nodeguid 0x003048fffff4d874<br>
<br># Checking Ca: nodeguid 0x003048fffff4d870<br><br># Checking Ca: nodeguid 0x0002c903000a6028<br>#warn: counter SymbolErrors = 44150 (threshold 10) lid 9 port 1<br>#warn: counter RcvErrors = 9283 (threshold 10) lid 9 port 1<br>
Error check on lid 9 (service0 HCA-1) port 1: FAILED <br><br>## Summary: 88 nodes checked, 0 bad nodes found<br>## 292 ports checked, 0 bad ports found<br>## 2 ports have errors beyond threshold<br><br>
<br><br>service0:~ # ibcheckstate<br><br>## Summary: 88 nodes checked, 0 bad nodes found<br>## 292 ports checked, 0 ports with bad state found<br>service0:~ # ibcheckwidth<br><br>## Summary: 88 nodes checked, 0 bad nodes found<br>
## 292 ports checked, 0 ports with 1x width in error found<br>service0:~ # <br><br><br>Thanks and Regards<br>Ashok<br><br><br><br><div class="gmail_quote">On 30 September 2011 12:39, Brian O'Connor <span dir="ltr"><<a href="mailto:briano@sgi.com">briano@sgi.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div bgcolor="#FFFFFF" text="#000000">
Hello Ashok<br>
<br>
Is the cluster hanging or otherwise behaving badly? The logs below
show that the client<br>
lost connection to 10.148.0.106 for 10 seconds or so. It should have
recovered OK.<br>
<br>
If you want further help from the list, you need to add more detail
about the cluster, i.e.<br>
a general description of the number of OSSs/OSTs, clients, the version
of Lustre, etc., and a description<br>
of what is actually going wrong, i.e. hanging, offline, etc.<br>
<br>
The first thing is to check the infrastructure; in this case
you should check your IB network for errors<div><div></div><div class="h5"><br>
<br>
<br>
<br>
On 30-September-2011 2:39 PM, Ashok nulguda wrote:
</div></div><blockquote type="cite"><div><div></div><div class="h5">
Dear All,<br>
<br>
I am having Lustre errors on my HPC cluster, as given below. Can
anyone help me resolve this problem? <br>
Thanks in Advance.<br>
Sep 30 08:40:23 service0 kernel: [343138.837222] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1
previous similar message<br>
Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre:
lustre-OST0008-osc-ffff880b272cf800: Connection to service
lustre-OST0008 via nid 10.148.0.106@o2ib was lost; in progress
operations using this service will wait for recovery to complete.<br>
Sep 30 08:40:24 service0 kernel: [343139.837260] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1380984193067288 sent from lustre-OST0006-osc-ffff880b272cf800 to
NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).<br>
Sep 30 08:40:24 service0 kernel: [343139.837263]
req@ffff880a5f800c00 x1380984193067288/t0
o3-><a href="mailto:lustre-OST0006_UUID@10.148.0.106@o2ib:6/4" target="_blank">lustre-OST0006_UUID@10.148.0.106@o2ib:6/4</a> lens 448/592 e 0
to 1 dl 1317352224 ref 2 fl Rpc:/0/0 rc 0/0<br>
Sep 30 08:40:24 service0 kernel: [343139.837269] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 38
previous similar messages<br>
Sep 30 08:40:24 service0 kernel: [343140.129284] LustreError:
9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from
cancel RPC: canceling anyway<br>
Sep 30 08:40:24 service0 kernel: [343140.129290] LustreError:
9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1
previous similar message<br>
Sep 30 08:40:24 service0 kernel: [343140.129295] LustreError:
9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11<br>
Sep 30 08:40:24 service0 kernel: [343140.129299] LustreError:
9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1
previous similar message<br>
Sep 30 08:40:25 service0 kernel: [343140.837308] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1380984193067299 sent from lustre-OST0010-osc-ffff880b272cf800 to
NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).<br>
Sep 30 08:40:25 service0 kernel: [343140.837311]
req@ffff880a557c4400 x1380984193067299/t0
o3-><a href="mailto:lustre-OST0010_UUID@10.148.0.106@o2ib:6/4" target="_blank">lustre-OST0010_UUID@10.148.0.106@o2ib:6/4</a> lens 448/592 e 0
to 1 dl 1317352225 ref 2 fl Rpc:/0/0 rc 0/0<br>
Sep 30 08:40:25 service0 kernel: [343140.837316] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 4
previous similar messages<br>
Sep 30 08:40:26 service0 kernel: [343141.245365] LustreError:
30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11
from cancel RPC: canceling anyway<br>
Sep 30 08:40:26 service0 kernel: [343141.245371] LustreError:
22729:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11<br>
Sep 30 08:40:26 service0 kernel: [343141.245378] LustreError:
30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1
previous similar message<br>
Sep 30 08:40:33 service0 kernel: [343148.245683] Lustre:
22725:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1380984193067302 sent from lustre-OST0004-osc-ffff880b272cf800 to
NID 10.148.0.106@o2ib 14s ago has timed out (14s prior to
deadline).<br>
Sep 30 08:40:33 service0 kernel: [343148.245686]
req@ffff8805c879e800 x1380984193067302/t0
o103-><a href="mailto:lustre-OST0004_UUID@10.148.0.106@o2ib:17/18" target="_blank">lustre-OST0004_UUID@10.148.0.106@o2ib:17/18</a> lens 296/384
e 0 to 1 dl 1317352233 ref 1 fl Rpc:N/0/0 rc 0/0<br>
Sep 30 08:40:33 service0 kernel: [343148.245692] Lustre:
22725:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 2
previous similar messages<br>
Sep 30 08:40:33 service0 kernel: [343148.245708] LustreError:
22725:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11
from cancel RPC: canceling anyway<br>
Sep 30 08:40:33 service0 kernel: [343148.245714] LustreError:
22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11<br>
Sep 30 08:40:33 service0 kernel: [343148.245717] LustreError:
22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1
previous similar message<br>
Sep 30 08:40:36 service0 kernel: [343151.548005] LustreError:
11-0: an error occurred while communicating with
10.148.0.106@o2ib. The ost_connect operation failed with -16<br>
Sep 30 08:40:36 service0 kernel: [343151.548008] LustreError:
Skipped 1 previous similar message<br>
Sep 30 08:40:36 service0 kernel: [343151.548024] LustreError:
167-0: This client was evicted by lustre-OST000b; in progress
operations using this service will fail.<br>
Sep 30 08:40:36 service0 kernel: [343151.548250] LustreError:
30452:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5<br>
Sep 30 08:40:36 service0 kernel: [343151.550210] LustreError:
8300:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@ffff88049528c400 x1380984193067406/t0
o3-><a href="mailto:lustre-OST000b_UUID@10.148.0.106@o2ib:6/4" target="_blank">lustre-OST000b_UUID@10.148.0.106@o2ib:6/4</a> lens 448/592 e 0
to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0<br>
Sep 30 08:40:36 service0 kernel: [343151.594742] Lustre:
lustre-OST0000-osc-ffff880b272cf800: Connection restored to
service lustre-OST0000 using nid 10.148.0.106@o2ib.<br>
Sep 30 08:40:36 service0 kernel: [343151.837203] Lustre:
lustre-OST0006-osc-ffff880b272cf800: Connection restored to
service lustre-OST0006 using nid 10.148.0.106@o2ib.<br>
Sep 30 08:40:37 service0 kernel: [343152.842631] Lustre:
lustre-OST0003-osc-ffff880b272cf800: Connection restored to
service lustre-OST0003 using nid 10.148.0.106@o2ib.<br>
Sep 30 08:40:37 service0 kernel: [343152.842636] Lustre: Skipped 3
previous similar messages<br>
<br>
<br>
Thanks and Regards<br>
Ashok<br clear="all">
<br>
-- <br>
<div style="margin:0in 0in 0pt"><b><font face="Cambria">Ashok
Nulguda<br>
</font></b></div>
<div style="margin:0in 0in 0pt"><b><font face="Cambria">TATA ELXSI
LTD</font></b></div>
<div style="margin:0in 0in 0pt"><span style="font-family:'Cambria','serif'"><b>Mb : +91 9689945767<br>
</b></span></div>
<div style="margin:0in 0in 0pt"><span style="font-family:'Cambria','serif'"></span><span style="font-family:'Cambria','serif'"><font color="#0000ff"><b>Email
:<a href="mailto:tshrikant@tataelxsi.co.in" target="_blank">ashokn@tataelxsi.co.in</a></b></font></span></div>
<br>
<br>
<fieldset></fieldset>
<br>
</div></div><pre>_______________________________________________
Lustre-discuss mailing list
<a href="mailto:Lustre-discuss@lists.lustre.org" target="_blank">Lustre-discuss@lists.lustre.org</a>
<a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss" target="_blank">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a>
</pre>
</blockquote><font color="#888888">
<br>
<br>
<pre cols="72">--
Brian O'Connor
-------------------------------------------------
SGI Consulting
Email: <a href="mailto:briano@sgi.com" target="_blank">briano@sgi.com</a>, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax: +61 3 9963 1902
357 Camberwell Road, Camberwell, Victoria, 3124
AUSTRALIA <a href="http://www.sgi.com/support/services" target="_blank">http://www.sgi.com/support/services</a>
-------------------------------------------------
</pre>
</font></div>
</blockquote></div><br><br clear="all"><br>-- <br><div style="margin:0in 0in 0pt"><b><font face="Cambria">Ashok Nulguda<br></font></b></div>
<div style="margin:0in 0in 0pt"><b><font face="Cambria">TATA ELXSI LTD</font></b></div>
<div style="margin:0in 0in 0pt"><span style="font-family:'Cambria','serif'"><b>Mb : +91 9689945767<br></b></span></div>
<div style="margin:0in 0in 0pt"><span style="font-family:'Cambria','serif'"></span><span style="font-family:'Cambria','serif'"><font color="#0000ff"><b>Email :<a href="mailto:tshrikant@tataelxsi.co.in" target="_blank">ashokn@tataelxsi.co.in</a></b></font></span></div>
<br>