<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hello Ashok<br>
<br>
Is the cluster hanging or otherwise behaving badly? The logs below
show that the client<br>
lost its connection to 10.148.0.106 for 10 seconds or so. It should have
recovered OK.<br>
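<br>
If you want to put a number on that, here is a minimal Python sketch that
measures the gap between the first "was lost" message and the last<br>
"Connection restored" message for each NID. It assumes each kernel message
sits on a single line (unwrapped, unlike in this mail) and uses the<br>
kernel timestamp in square brackets. The function name and the regular
expressions are my own, not part of any Lustre tool.<br>
<pre>
```python
import re

# Kernel uptime stamp such as "[343138.837233]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")
LOST = re.compile(r"Connection to service \S+ via nid (\S+) was lost")
RESTORED = re.compile(r"Connection restored to service \S+ using nid (\S+)")


def outage_seconds(lines):
    """Return {nid: seconds from the first 'lost' to the last 'restored' line}."""
    first_lost = {}
    last_restored = {}
    for line in lines:
        stamp = STAMP.search(line)
        if stamp is None:
            continue  # not a kernel message with a timestamp
        t = float(stamp.group(1))
        lost = LOST.search(line)
        if lost:
            # Keep only the earliest loss per NID.
            first_lost.setdefault(lost.group(1), t)
        restored = RESTORED.search(line)
        if restored:
            # The NID is followed by a full stop in the log; strip it.
            last_restored[restored.group(1).rstrip(".")] = t
    return {nid: last_restored[nid] - first_lost[nid]
            for nid in first_lost if nid in last_restored}


# Two lines taken from the log below, joined back onto one line each.
sample = [
    "Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre: "
    "lustre-OST0008-osc-ffff880b272cf800: Connection to service "
    "lustre-OST0008 via nid 10.148.0.106@o2ib was lost; in progress "
    "operations using this service will wait for recovery to complete.",
    "Sep 30 08:40:37 service0 kernel: [343152.842631] Lustre: "
    "lustre-OST0003-osc-ffff880b272cf800: Connection restored to "
    "service lustre-OST0003 using nid 10.148.0.106@o2ib.",
]

for nid, seconds in outage_seconds(sample).items():
    print(f"{nid}: {seconds:.1f} s between first loss and last restore")
```
</pre>
For these two lines the gap comes out at roughly 14 seconds, measured from
the first loss to the last restore message for that NID.<br>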
<br>
If you want further help from the list, you need to add more detail
about the cluster, i.e.<br>
a general description of the number of OSSs/OSTs and clients, the version of
Lustre, etc., and a description<br>
of what is actually going wrong, e.g. hanging, offline, etc.<br>
<br>
The first thing to do is check the infrastructure; in this case
you should check your IB network for errors.<br>
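<br>
As a starting point for that check, here is a rough Python sketch that
reports nonzero error counters on the local IB ports. It assumes the<br>
usual Linux sysfs layout under /sys/class/infiniband (device/ports/N/counters)
and a handful of counter names. Both may differ on your HCAs<br>
and kernel, so treat the names as assumptions. The function name is
hypothetical.<br>
<pre>
```python
from pathlib import Path

# Counters that often point at link or cabling trouble when nonzero.
# Assumed names; not every HCA or kernel exposes all of them.
ERROR_COUNTERS = (
    "symbol_error",
    "link_downed",
    "link_error_recovery",
    "port_rcv_errors",
    "local_link_integrity_errors",
)


def nonzero_ib_errors(root="/sys/class/infiniband"):
    """Return {(device, port, counter): value} for each nonzero error counter."""
    found = {}
    for counters in sorted(Path(root).glob("*/ports/*/counters")):
        device = counters.parts[-4]
        port = counters.parts[-2]
        for name in ERROR_COUNTERS:
            try:
                value = int((counters / name).read_text().strip())
            except (OSError, ValueError):
                continue  # counter missing or unreadable on this port
            if value != 0:
                found[(device, port, name)] = value
    return found


if __name__ == "__main__":
    for (device, port, name), value in nonzero_ib_errors().items():
        print(f"{device} port {port}: {name} = {value}")
```
</pre>
Run it on the client and on the OSS at 10.148.0.106. It only sees the local
ports, so it does not replace checking the switches and cables.<br>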
<br>
<br>
<br>
On 30-September-2011 2:39 PM, Ashok nulguda wrote:
<blockquote
cite="mid:CACGS=M8ABpDwb-pjM8-ktXEWOY+YcufzJ2nmUZuyJJHvv0=2Jg@mail.gmail.com"
type="cite">
Dear All,<br>
<br>
I am getting the Lustre errors shown below on my HPC cluster. Can anyone
please help me resolve this problem? <br>
Thanks in Advance.<br>
Sep 30 08:40:23 service0 kernel: [343138.837222] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1
previous similar message<br>
Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre:
lustre-OST0008-osc-ffff880b272cf800: Connection to service
lustre-OST0008 via nid 10.148.0.106@o2ib was lost; in progress
operations using this service will wait for recovery to complete.<br>
Sep 30 08:40:24 service0 kernel: [343139.837260] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1380984193067288 sent from lustre-OST0006-osc-ffff880b272cf800 to
NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).<br>
Sep 30 08:40:24 service0 kernel: [343139.837263]
req@ffff880a5f800c00 x1380984193067288/t0
o3->lustre-OST0006_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0
to 1 dl 1317352224 ref 2 fl Rpc:/0/0 rc 0/0<br>
Sep 30 08:40:24 service0 kernel: [343139.837269] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 38
previous similar messages<br>
Sep 30 08:40:24 service0 kernel: [343140.129284] LustreError:
9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from
cancel RPC: canceling anyway<br>
Sep 30 08:40:24 service0 kernel: [343140.129290] LustreError:
9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1
previous similar message<br>
Sep 30 08:40:24 service0 kernel: [343140.129295] LustreError:
9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11<br>
Sep 30 08:40:24 service0 kernel: [343140.129299] LustreError:
9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1
previous similar message<br>
Sep 30 08:40:25 service0 kernel: [343140.837308] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1380984193067299 sent from lustre-OST0010-osc-ffff880b272cf800 to
NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).<br>
Sep 30 08:40:25 service0 kernel: [343140.837311]
req@ffff880a557c4400 x1380984193067299/t0
o3->lustre-OST0010_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0
to 1 dl 1317352225 ref 2 fl Rpc:/0/0 rc 0/0<br>
Sep 30 08:40:25 service0 kernel: [343140.837316] Lustre:
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 4
previous similar messages<br>
Sep 30 08:40:26 service0 kernel: [343141.245365] LustreError:
30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11
from cancel RPC: canceling anyway<br>
Sep 30 08:40:26 service0 kernel: [343141.245371] LustreError:
22729:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11<br>
Sep 30 08:40:26 service0 kernel: [343141.245378] LustreError:
30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1
previous similar message<br>
Sep 30 08:40:33 service0 kernel: [343148.245683] Lustre:
22725:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
x1380984193067302 sent from lustre-OST0004-osc-ffff880b272cf800 to
NID 10.148.0.106@o2ib 14s ago has timed out (14s prior to
deadline).<br>
Sep 30 08:40:33 service0 kernel: [343148.245686]
req@ffff8805c879e800 x1380984193067302/t0
o103->lustre-OST0004_UUID@10.148.0.106@o2ib:17/18 lens 296/384
e 0 to 1 dl 1317352233 ref 1 fl Rpc:N/0/0 rc 0/0<br>
Sep 30 08:40:33 service0 kernel: [343148.245692] Lustre:
22725:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 2
previous similar messages<br>
Sep 30 08:40:33 service0 kernel: [343148.245708] LustreError:
22725:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11
from cancel RPC: canceling anyway<br>
Sep 30 08:40:33 service0 kernel: [343148.245714] LustreError:
22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11<br>
Sep 30 08:40:33 service0 kernel: [343148.245717] LustreError:
22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1
previous similar message<br>
Sep 30 08:40:36 service0 kernel: [343151.548005] LustreError:
11-0: an error occurred while communicating with
10.148.0.106@o2ib. The ost_connect operation failed with -16<br>
Sep 30 08:40:36 service0 kernel: [343151.548008] LustreError:
Skipped 1 previous similar message<br>
Sep 30 08:40:36 service0 kernel: [343151.548024] LustreError:
167-0: This client was evicted by lustre-OST000b; in progress
operations using this service will fail.<br>
Sep 30 08:40:36 service0 kernel: [343151.548250] LustreError:
30452:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5<br>
Sep 30 08:40:36 service0 kernel: [343151.550210] LustreError:
8300:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@ffff88049528c400 x1380984193067406/t0
o3->lustre-OST000b_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0
to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0<br>
Sep 30 08:40:36 service0 kernel: [343151.594742] Lustre:
lustre-OST0000-osc-ffff880b272cf800: Connection restored to
service lustre-OST0000 using nid 10.148.0.106@o2ib.<br>
Sep 30 08:40:36 service0 kernel: [343151.837203] Lustre:
lustre-OST0006-osc-ffff880b272cf800: Connection restored to
service lustre-OST0006 using nid 10.148.0.106@o2ib.<br>
Sep 30 08:40:37 service0 kernel: [343152.842631] Lustre:
lustre-OST0003-osc-ffff880b272cf800: Connection restored to
service lustre-OST0003 using nid 10.148.0.106@o2ib.<br>
Sep 30 08:40:37 service0 kernel: [343152.842636] Lustre: Skipped 3
previous similar messages<br>
<br>
<br>
Thanks and Regards<br>
Ashok<br clear="all">
<br>
-- <br>
<div style="margin:0in 0in 0pt"><b><font face="Cambria">Ashok
Nulguda<br>
</font></b></div>
<div style="margin:0in 0in 0pt"><b><font face="Cambria">TATA ELXSI
LTD</font></b></div>
<div style="margin:0in 0in 0pt"><span
style="font-family:'Cambria','serif'"><b>Mb : +91 9689945767<br>
</b></span></div>
<div style="margin:0in 0in 0pt"><span
style="font-family:'Cambria','serif'"></span><span
style="font-family:'Cambria','serif'"><font color="#0000ff"><b>Email
:<a moz-do-not-send="true"
href="mailto:tshrikant@tataelxsi.co.in" target="_blank">ashokn@tataelxsi.co.in</a></b></font></span></div>
<br>
<br>
<br>
<pre wrap="">_______________________________________________
Lustre-discuss mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Lustre-discuss@lists.lustre.org">Lustre-discuss@lists.lustre.org</a>
<a class="moz-txt-link-freetext" href="http://lists.lustre.org/mailman/listinfo/lustre-discuss">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a>
</pre>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Brian O'Connor
-------------------------------------------------
SGI Consulting
Email: <a class="moz-txt-link-abbreviated" href="mailto:briano@sgi.com">briano@sgi.com</a>, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax: +61 3 9963 1902
357 Camberwell Road, Camberwell, Victoria, 3124
AUSTRALIA <a class="moz-txt-link-freetext" href="http://www.sgi.com/support/services">http://www.sgi.com/support/services</a>
-------------------------------------------------
</pre>
</body>
</html>