[Lustre-discuss] strange slowdown

Aaron Knister aaron at iges.org
Thu Dec 13 15:52:32 PST 2007


Just kidding...I spoke WAY too soon. It's acting up again.

On Dec 13, 2007, at 6:51 PM, Aaron Knister wrote:

> Don't ask me how but it out of the blue resolved itself. I have 0  
> idea what went wrong...
>
> On Dec 13, 2007, at 3:12 PM, Aaron Knister wrote:
>
>> Thanks for your help! I have some more information from the lctl dk--
>>
>> 10000000:01000000:3:1197576228.177725:0:8816:0:(mgc_request.c:1130:mgc_process_log()) Can't get cfg lock: -108
>> 10000000:01000000:1:1197576228.177727:0:8511:0:(mgc_request.c:558:mgc_blocking_ast()) Lock res 0x61746164 (data)
>> 00000100:00020000:3:1197576228.177728:0:8816:0:(client.c:710:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff8103dba84c00 x390/t0 o501->MGS@MGC192.168.64.70@o2ib_0:26/25 lens 200/304 e 0 to 11 dl 0 ref 1 fl Rpc:/8/0 rc 0/0
>> 10000000:01000000:1:1197576228.177729:0:8511:0:(mgc_request.c:583:mgc_blocking_ast()) log data-OST0000: original grant failed, will requeue later
>> 10000000:01000000:3:1197576228.177731:0:8816:0:(mgc_request.c:1182:mgc_process_log()) MGC192.168.64.70@o2ib: configuration from log 'data-OST0000' failed (-108).
>> 00000100:00080000:1:1197576236.900462:0:8444:0:(pinger.c:143:ptlrpc_pinger_main()) not pinging MGS (in recovery: FULL or recovery disabled: 0/1)
>>
>> This is on the OSS.
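
Each entry in the lctl dk output above starts with the same colon-separated header, so it can help to split the header off before reading the message. The following is a minimal Python sketch, not part of the original thread. The field names are assumptions read off the quoted entries (the thread does not document them); the fifth and seventh fields are left unnamed because their meaning is not stated here. The timestamp is assumed to be seconds since the Unix epoch.

```python
import re
from datetime import datetime, timezone

# Assumed header layout, inferred from the entries quoted in this thread:
#   subsystem:mask:cpu:sec.usec:field5:pid:field7:(file:line:function()) message
LINE_RE = re.compile(
    r"^(?P<subsystem>[0-9a-fA-F]{8}):(?P<mask>[0-9a-fA-F]{8}):(?P<cpu>\d+):"
    r"(?P<timestamp>\d+\.\d+):(?P<field5>\d+):(?P<pid>\d+):(?P<field7>\d+):"
    r"\((?P<file>[^:()]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)$"
)

# One entry copied from the OSS log above (unwrapped onto a single line).
SAMPLE = (
    "10000000:01000000:3:1197576228.177725:0:8816:0:"
    "(mgc_request.c:1130:mgc_process_log()) Can't get cfg lock: -108"
)


def parse_entry(text):
    """Split one unwrapped debug-log entry into a dict, or return None."""
    match = LINE_RE.match(text.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the numeric fields we gave names to.
    for key in ("cpu", "pid", "line"):
        entry[key] = int(entry[key])
    # Interpret the timestamp as epoch seconds and render it in UTC.
    entry["when"] = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
    return entry


if __name__ == "__main__":
    parsed = parse_entry(SAMPLE)
    print(parsed["when"].strftime("%Y-%m-%d %H:%M:%S UTC"),
          parsed["file"], parsed["func"], parsed["msg"])
```

Entries that the mailer wrapped across several lines have to be joined back into one line before they match.
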
>>
>> Also on the OSS --
>>
>> 00010000:00000400:2:1197576684.886679:0:8597:0:(ldlm_lib.c:515:target_handle_reconnect()) data-OST0005: 532a7ed7-8e93-e086-885a-b064e46adb12 reconnecting
>> 00010000:00000400:2:1197576684.886683:0:8597:0:(ldlm_lib.c:744:target_handle_connect()) data-OST0005: refuse reconnection from 532a7ed7-8e93-e086-885a-b064e46adb12@192.168.64.102@o2ib to 0xffff8103cc9e3000; still busy with 9 active RPCs
>> 00000100:00100000:1:1197576684.886683:0:8599:0:(service.c:1032:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_55:532a7ed7-8e93-e086-885a-b064e46adb12+6:3962:x868:12345-192.168.64.102@o2ib:400
>> 00000010:00000002:1:1197576684.886687:0:8599:0:(ost_handler.c:1598:ost_handle()) @@@ ping  req@ffff81042f7a3c00 x868/t0 o400->532a7ed7-8e93-e086-885a-b064e46adb12@NET_0x50000c0a84066_UUID:0/0 lens 128/0 e 0 to 0 dl 1197576784 ref 1 fl Interpret:/0/0 rc 0/0
>> 00010000:00020000:2:1197576684.886688:0:8597:0:(ldlm_lib.c:1458:target_send_reply_msg()) @@@ processing error (-16) req@ffff8104167fe850 x871/t0 o8->532a7ed7-8e93-e086-885a-b064e46adb12@NET_0x50000c0a84066_UUID:0/0 lens 304/200 e 0 to 0 dl 1197576784 ref 1 fl Interpret:/0/0 rc -16/0
>>
>> On the client it shows --
>>
>> 00000100:00080000:0:1197576416.143577:0:3964:0:(recover.c:54:ptlrpc_initiate_recovery()) data-OST0004_UUID: starting recovery
>> 00000100:00080000:0:1197576416.143585:0:3964:0:(import.c:381:ptlrpc_connect_import()) ffff81082f49a000 data-OST0004_UUID: changing import state from DISCONN to CONNECTING
>> 00000100:00080000:0:1197576416.143590:0:3964:0:(import.c:275:import_select_connection()) data-OST0004-osc-ffff81082ae12400: connect to NID 192.168.64.71@o2ib last attempt 4296998987
>> 00000100:00080000:0:1197576416.143597:0:3964:0:(import.c:339:import_select_connection()) data-OST0004-osc-ffff81082ae12400: import ffff81082f49a000 using connection 192.168.64.71@o2ib/192.168.64.71@o2ib
>> 00000100:02020000:0:1197576416.143864:0:3963:0:(client.c:581:ptlrpc_check_status()) 11-0: an error occurred while communicating with 192.168.64.71@o2ib. The ost_connect operation failed with -16
>> 00000100:00080000:0:1197576416.144314:0:3963:0:(import.c:759:ptlrpc_connect_interpret()) ffff81082f49a000 data-OST0004_UUID: changing import state from CONNECTING to DISCONN
>> 00000100:00080000:0:1197576416.144316:0:3963:0:(import.c:801:ptlrpc_connect_interpret()) recovery of data-OST0004_UUID on 192.168.64.71@o2ib failed (-16)
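
The negative numbers in these messages (-108 on the OSS, -16 on the client) look like negated Linux errno values, which is the usual kernel convention. The sketch below, which is not part of the original thread, pulls such codes out of a message and names them. The table is hard-coded with the Linux values for the two codes seen here; treating every such number as an errno is an assumption, and the helper name is made up for illustration.

```python
import re

# Linux errno values for the two codes that appear in this thread.
LINUX_ERRNO = {
    16: ("EBUSY", "Device or resource busy"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}

# A minus sign followed by digits, but not one sitting inside a name such as
# "data-OST0004" or a message id such as "11-0".
CODE_RE = re.compile(r"(?<![\w.])-(\d+)\b")


def describe_codes(message):
    """Return (code, name, description) for each negative code in message."""
    results = []
    for digits in CODE_RE.findall(message):
        number = int(digits)
        name, text = LINUX_ERRNO.get(number, ("unknown", "not in the table"))
        results.append((-number, name, text))
    return results


if __name__ == "__main__":
    for sample in (
        "Can't get cfg lock: -108",
        "recovery of data-OST0004_UUID on 192.168.64.71@o2ib failed (-16)",
    ):
        print(sample, "->", describe_codes(sample))
```

Read this way, -16 (EBUSY) on the client matches the OSS refusing the reconnection while it is still busy, and -108 (ESHUTDOWN) matches the OSS being unable to reach the MGS.
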
>>
>> I'm at a loss.
>>
>> On Dec 13, 2007, at 11:59 AM, Oleg Drokin wrote:
>>
>>> Hello!
>>>
>>> On Dec 13, 2007, at 11:48 AM, Aaron Knister wrote:
>>>
>>>> On the client I see this --
>>>
>>> This shows no activity aside from the fact that the client is
>>> disconnected from OST5.
>>>
>>>> and on the server --
>>>
>>> This one shows that the server does not allow the client to reconnect
>>> because it is still busy processing other requests from this client.
>>> That's the reason for the "mount hang".
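
To see how often the server turns a client away like this, the "refuse reconnection" message can be picked out of a saved debug log. This is a minimal Python sketch, not part of the original thread; the message wording is taken from the OSS entry quoted in this thread, and the function name is hypothetical. The list archive displays "@" as " at ", so the pattern accepts both forms.

```python
import re

# Wording copied from the target_handle_connect() entry quoted in this thread.
REFUSE_RE = re.compile(
    r"(?P<target>[\w-]+): refuse reconnection from "
    r"(?P<uuid>[0-9a-f-]+)(?:@| at )(?P<nid>\S+) to \S+; "
    r"still busy with (?P<active>\d+) active RPCs"
)

SAMPLE = (
    "00010000:00000400:2:1197576684.886683:0:8597:0:"
    "(ldlm_lib.c:744:target_handle_connect()) data-OST0005: refuse "
    "reconnection from 532a7ed7-8e93-e086-885a-b064e46adb12@192.168.64.102@o2ib "
    "to 0xffff8103cc9e3000; still busy with 9 active RPCs"
)


def parse_refusal(entry):
    """Return target, client UUID, client NID and active RPC count, or None."""
    match = REFUSE_RE.search(entry)
    if match is None:
        return None
    return {
        "target": match.group("target"),
        "uuid": match.group("uuid"),
        "nid": match.group("nid"),
        "active_rpcs": int(match.group("active")),
    }


if __name__ == "__main__":
    print(parse_refusal(SAMPLE))
```

If the active RPC count stays the same across repeated refusals, that would be consistent with server threads being stuck rather than merely slow, which is what the sysrq-t output should help confirm.
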
>>>
>>> This is all I can tell from the logs you provided. If the logs
>>> actually go further back, there might be more useful info in them.
>>> Since there was a disconnection, perhaps dmesg on the client and
>>> server contains more info about the reasons for it. Also, if you do
>>> sysrq-t on the server, you will see what is going on with those
>>> server threads that are supposedly still processing client requests.
>>>
>>> Bye,
>>>  Oleg
>>
>> Aaron Knister
>> Associate Systems Administrator/Web Designer
>> Center for Research on Environment and Water
>>
>> (301) 595-7001
>> aaron at iges.org
>>
>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at clusterfs.com
>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water

(301) 595-7001
aaron at iges.org





