[Lustre-discuss] Odd performance issue with 1.4.x OSS ...

Steden Klaus Klaus.Steden at thomson.net
Fri Oct 31 13:02:43 PDT 2008


Hi folks,

Our Lustre file system started exhibiting some curious performance issues today: it slowed down dramatically, and reliable I/O became impossible. I looked through the output of dmesg and saw a number of kernel 'oops' messages but, not being a Lustre kernel expert, I'm not sure what they indicate.

I stopped the OSTs on the node in question and ran e2fsck on the OST drives. They came up clean, so I don't think it's a hardware problem. I don't have physical access to the machine right now, so it may in fact be something on the back end; I'm working on verifying that with a technician on site.

In the meantime, can anyone help decipher this for me? There are a couple of messages like it:

-- cut --
ll_ost_215    S 00000100d2141808     0  8584      1          8585  8583 (L-TLB)
00000101184233e8 0000000000000046 000000000000000f ffffffffa059c3b8 
       00000000005c2616 0000000100000000 0000000000000000 00000100d21418b0 
       0000000000000013 0000000000000000 
Call Trace:<ffffffffa059c3b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317} 
       <ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0} 
       <ffffffffa0596e84>{:ptlrpc:ptlrpc_queue_wait+6932} 
       <ffffffffa054227d>{:ptlrpc:l_has_lock+77} <ffffffffa056f76c>{:ptlrpc:ldlm_add_waiting_lock+2156} 
       <ffffffff80137be0>{default_wake_function+0} <ffffffffa0592620>{:ptlrpc:expired_request+0} 
       <ffffffffa05926e0>{:ptlrpc:interrupted_request+0} <ffffffffa0574ae8>{:ptlrpc:ldlm_server_blocking_ast+4072} 
       <ffffffffa0547f5a>{:ptlrpc:ldlm_run_ast_work+234} <ffffffffa05421f8>{:ptlrpc:l_unlock+248} 
       <ffffffffa055ac73>{:ptlrpc:ldlm_process_extent_lock+995} 
       <ffffffffa054bb4f>{:ptlrpc:ldlm_lock_enqueue+1087} 
       <ffffffffa0574bb0>{:ptlrpc:ldlm_server_completion_ast+0} 
       <ffffffffa0575b50>{:ptlrpc:ldlm_server_glimpse_ast+0} 
       <ffffffffa0578694>{:ptlrpc:ldlm_handle_enqueue+8836} 
       <ffffffffa0677485>{:obdfilter:filter_fmd_expire+53} 
       <ffffffffa0611dc0>{:kviblnd:kibnal_cq_callback+0} <ffffffffa0573b00>{:ptlrpc:ldlm_server_blocking_ast+0} 
       <ffffffffa0574bb0>{:ptlrpc:ldlm_server_completion_ast+0} 
       <ffffffffa062cbb2>{:ost:ost_handle+20738} <ffffffff801f78d9>{number+233} 
       <ffffffff801f7f09>{vsnprintf+1321} <ffffffffa048e022>{:libcfs:libcfs_debug_msg+1554} 
       <ffffffffa05a593a>{:ptlrpc:ptlrpc_server_handle_request+3066} 
       <ffffffff80116d35>{do_gettimeofday+101} <ffffffffa049014e>{:libcfs:lcw_update_time+30} 
       <ffffffff801454bd>{__mod_timer+317} <ffffffffa05a6d17>{:ptlrpc:ptlrpc_main+2375} 
       <ffffffff80137be0>{default_wake_function+0} <ffffffffa05a63c0>{:ptlrpc:ptlrpc_retry_rqbds+0} 
       <ffffffffa05a63c0>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffff801114ab>{child_rip+8} 
       <ffffffffa05a63d0>{:ptlrpc:ptlrpc_main+0} <ffffffff801114a3>{child_rip+0} 
-- cut --
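
In case it helps anyone skimming the trace: each frame above appears as <address>{:module:function+offset}, and frames with no module prefix look like core kernel symbols. Below is a rough Python sketch, assuming every frame follows that format, that lists the frames by module. The names FRAME_RE and parse_frames are made up for illustration and are not part of any Lustre tooling.

```python
import re

# Matches frames such as <ffffffffa059c3b8>{:ptlrpc:ptl_send_buf+824}
# and <ffffffff801454bd>{__mod_timer+317}. The ":module:" prefix is optional;
# treating its absence as "core kernel" is an assumption of this sketch.
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")

def parse_frames(trace):
    """Return (address, module, function, offset) tuples in trace order."""
    frames = []
    for addr, module, func, offset in FRAME_RE.findall(trace):
        frames.append((addr, module or "kernel", func, int(offset)))
    return frames

# First three lines of the call trace quoted above.
sample = """Call Trace:<ffffffffa059c3b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
       <ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}
       <ffffffffa0596e84>{:ptlrpc:ptlrpc_queue_wait+6932}"""

for addr, module, func, offset in parse_frames(sample):
    print(f"{module:8s} {func}+{offset}")
```

Feeding the whole dmesg excerpt through parse_frames gives a flat list that is easier to group or count by module than the raw dump.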

I can include debug logs as well, if needed. Any help is greatly appreciated.

cheers,
Klaus


