[Lustre-discuss] help

Colin Faber Colin_Faber at xyratex.com
Fri Sep 30 07:46:48 PDT 2011


Hi,

Looks like a connection timeout, likely temporary, as the client appears to
have reconnected and recovered without any problems.
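
To check the same thing on a longer log, a rough sketch like the one below can list the lost / evicted / restored events and tally the negative return codes. It is only an illustration, not a Lustre tool: the function name summarize, the regular expressions, and the SAMPLE list are my own assumptions, the sample lines are copied from the quoted log (timestamps trimmed, one record per line), and the errno names are the standard Linux values for 5, 11 and 16.

```python
import re

# Standard Linux errno names for the negative return codes seen in the log.
ERRNO_NAMES = {5: "EIO", 11: "EAGAIN", 16: "EBUSY"}

# Hypothetical patterns written against the message text in the quoted log.
LOST = re.compile(r"Connection to service (\S+) via nid .* was lost")
RESTORED = re.compile(r"Connection restored to service (\S+)")
EVICTED = re.compile(r"This client was evicted by ([^;\s]+)")
RC = re.compile(
    r"(?:Got rc|failed with|couldn't unlock|ldlm_cli_cancel_list:) (-\d+)"
)


def summarize(lines):
    """Return (events, codes) for an iterable of unwrapped log records.

    events is a list of (kind, target) tuples in log order;
    codes maps an errno name to the number of times it was reported.
    """
    events = []
    codes = {}
    for line in lines:
        line = line.strip()
        for kind, pattern in (
            ("lost", LOST),
            ("restored", RESTORED),
            ("evicted", EVICTED),
        ):
            match = pattern.search(line)
            if match:
                events.append((kind, match.group(1)))
        match = RC.search(line)
        if match:
            name = ERRNO_NAMES.get(-int(match.group(1)), "unknown")
            codes[name] = codes.get(name, 0) + 1
    return events, codes


# A few records taken from the quoted log, one record per line.
SAMPLE = [
    "Lustre: lustre-OST0008-osc-ffff880b272cf800: Connection to service "
    "lustre-OST0008 via nid 10.148.0.106@o2ib was lost; in progress "
    "operations using this service will wait for recovery to complete.",
    "LustreError: 9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) "
    "Got rc -11 from cancel RPC: canceling anyway",
    "LustreError: 11-0: an error occurred while communicating with "
    "10.148.0.106@o2ib. The ost_connect operation failed with -16",
    "LustreError: 167-0: This client was evicted by lustre-OST000b; "
    "in progress operations using this service will fail.",
    "Lustre: lustre-OST0000-osc-ffff880b272cf800: Connection restored to "
    "service lustre-OST0000 using nid 10.148.0.106@o2ib.",
]

if __name__ == "__main__":
    events, codes = summarize(SAMPLE)
    for kind, target in events:
        print(kind, target)
    print(codes)
```

If every "lost" target later shows up as "restored", the outage was most likely transient. An "evicted" entry is worth a closer look, since the log says in-progress operations on that target will fail.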

What other issues are you experiencing?

-cf


On 09/29/2011 10:39 PM, Ashok nulguda wrote:
> Dear All,
>
> I am getting Lustre errors on my HPC cluster, as shown below. Can anyone
> please help me resolve this problem?
> Thanks in advance.
> Sep 30 08:40:23 service0 kernel: [343138.837222] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1 previous similar message
> Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre: lustre-OST0008-osc-ffff880b272cf800: Connection to service lustre-OST0008 via nid 10.148.0.106@o2ib was lost; in progress operations using this service will wait for recovery to complete.
> Sep 30 08:40:24 service0 kernel: [343139.837260] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067288 sent from lustre-OST0006-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).
> Sep 30 08:40:24 service0 kernel: [343139.837263]   req@ffff880a5f800c00 x1380984193067288/t0 o3->lustre-OST0006_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0 to 1 dl 1317352224 ref 2 fl Rpc:/0/0 rc 0/0
> Sep 30 08:40:24 service0 kernel: [343139.837269] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 38 previous similar messages
> Sep 30 08:40:24 service0 kernel: [343140.129284] LustreError: 9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
> Sep 30 08:40:24 service0 kernel: [343140.129290] LustreError: 9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1 previous similar message
> Sep 30 08:40:24 service0 kernel: [343140.129295] LustreError: 9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
> Sep 30 08:40:24 service0 kernel: [343140.129299] LustreError: 9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1 previous similar message
> Sep 30 08:40:25 service0 kernel: [343140.837308] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067299 sent from lustre-OST0010-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).
> Sep 30 08:40:25 service0 kernel: [343140.837311]   req@ffff880a557c4400 x1380984193067299/t0 o3->lustre-OST0010_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0 to 1 dl 1317352225 ref 2 fl Rpc:/0/0 rc 0/0
> Sep 30 08:40:25 service0 kernel: [343140.837316] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
> Sep 30 08:40:26 service0 kernel: [343141.245365] LustreError: 30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
> Sep 30 08:40:26 service0 kernel: [343141.245371] LustreError: 22729:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
> Sep 30 08:40:26 service0 kernel: [343141.245378] LustreError: 30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1 previous similar message
> Sep 30 08:40:33 service0 kernel: [343148.245683] Lustre: 22725:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067302 sent from lustre-OST0004-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 14s ago has timed out (14s prior to deadline).
> Sep 30 08:40:33 service0 kernel: [343148.245686]   req@ffff8805c879e800 x1380984193067302/t0 o103->lustre-OST0004_UUID@10.148.0.106@o2ib:17/18 lens 296/384 e 0 to 1 dl 1317352233 ref 1 fl Rpc:N/0/0 rc 0/0
> Sep 30 08:40:33 service0 kernel: [343148.245692] Lustre: 22725:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
> Sep 30 08:40:33 service0 kernel: [343148.245708] LustreError: 22725:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
> Sep 30 08:40:33 service0 kernel: [343148.245714] LustreError: 22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
> Sep 30 08:40:33 service0 kernel: [343148.245717] LustreError: 22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1 previous similar message
> Sep 30 08:40:36 service0 kernel: [343151.548005] LustreError: 11-0: an error occurred while communicating with 10.148.0.106@o2ib. The ost_connect operation failed with -16
> Sep 30 08:40:36 service0 kernel: [343151.548008] LustreError: Skipped 1 previous similar message
> Sep 30 08:40:36 service0 kernel: [343151.548024] LustreError: 167-0: This client was evicted by lustre-OST000b; in progress operations using this service will fail.
> Sep 30 08:40:36 service0 kernel: [343151.548250] LustreError: 30452:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5
> Sep 30 08:40:36 service0 kernel: [343151.550210] LustreError: 8300:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff88049528c400 x1380984193067406/t0 o3->lustre-OST000b_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0 to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
> Sep 30 08:40:36 service0 kernel: [343151.594742] Lustre: lustre-OST0000-osc-ffff880b272cf800: Connection restored to service lustre-OST0000 using nid 10.148.0.106@o2ib.
> Sep 30 08:40:36 service0 kernel: [343151.837203] Lustre: lustre-OST0006-osc-ffff880b272cf800: Connection restored to service lustre-OST0006 using nid 10.148.0.106@o2ib.
> Sep 30 08:40:37 service0 kernel: [343152.842631] Lustre: lustre-OST0003-osc-ffff880b272cf800: Connection restored to service lustre-OST0003 using nid 10.148.0.106@o2ib.
> Sep 30 08:40:37 service0 kernel: [343152.842636] Lustre: Skipped 3 previous similar messages
>
>
> Thanks and Regards
> Ashok
>
> -- 
> Ashok Nulguda
> TATA ELXSI LTD
> Mb : +91 9689945767
> Email : ashokn at tataelxsi.co.in <mailto:tshrikant at tataelxsi.co.in>