[Lustre-discuss] ldlm_enqueue operation failures

Charles Taylor taylor at hpc.ufl.edu
Mon Feb 18 14:04:08 PST 2008


Well, yes. But the evictions are the result of the job trying to
start. Absent that, there are no evictions. A bunch of threads
trying to open the same file should not cause the clients to be
evicted. That's an odd way of dealing with concurrency.  :)

Charlie

On Feb 18, 2008, at 4:57 PM, Oleg Drokin wrote:

> Hello!
>
> On Feb 18, 2008, at 4:55 PM, Charles Taylor wrote:
>> Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
>> 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
>> Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c: 
>> 515:mgs_handle()) Skipped 263 previous similar messages
>> Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c: 
>> 1442:target_send_reply_msg()) @@@ processing error (-107)   
>> req@ffff81011acf7c50 x1602651/t0 o101-><?>@<?>:-1 lens 232/0 ref 0
>> fl Interpret:/0/0 rc -107/0
>> Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c: 
>> 1442:target_send_reply_msg()) Skipped 427 previous similar messages
>> Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c: 
>> 1474:mds_close()) @@@ no handle for file close ino 43116025:  
>> cookie 0x1938027bf9d67349  req@ffff8100ae3bfc00 x10000789/t0 o35- 
>> >beb7df79-6127-c0ca-9d36-2a96817a77a9@:-1 lens 296/1736 ref 0 fl  
>> Interpret:/0/0 rc 0/0
>> Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c: 
>> 1474:mds_close()) Skipped 161 previous similar messages
>> Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c: 
>> 210:waiting_locks_callback()) ### lock callback timer expired:  
>> evicting client 2bdea9d4-43c3-a0b0-2822-c49ecfe6e044@NET_0x500000a0d1935_UUID
>> nid 10.13.25.53@o2ib  ns:
>> mds-ufhpc-MDT0000_UUID lock: ffff810053d3f100/0x688cfbc7df2ef487  
>> lrc: 1/0,0 mode: CR/CR res: 21878337/3424633214 bits 0x3 rrc: 582  
>> type: IBT flags: 4000030 remote: 0x95c1d2685c2c76d9 expref: 21 pid  
>> 6090
>> Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c: 
>> 210:waiting_locks_callback()) Skipped 3 previous similar messages
>> Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c: 
>> 962:ldlm_handle_enqueue()) ### lock on destroyed export  
>> ffff8101096ec000 ns: mds-ufhpc-MDT0000_UUID lock:  
>> ffff810225fe12c0/0x688cfbc7df2ef505 lrc: 2/0,0 mode: CR/CR res:  
>> 21878337/3424633214 bits 0x3 rrc: 579 type: IBT flags: 4000030  
>> remote: 0x95c1d2685c2c76e0 expref: 6 pid 6265
>> Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c: 
>> 962:ldlm_handle_enqueue()) Skipped 3 previous similar messages
>> Feb 18 15:33:17 hpcmds kernel: Lustre: 6061:0:(mds_reint.c: 
>> 127:mds_finish_transno()) commit transaction for disconnected  
>> client 2bdea9d4-43c3-a0b0-2822-c49ecfe6e044: rc 0
>
> This looks like the middle of an eviction storm; by this point the
> MDS and MGS have already evicted tons of clients for unknown reasons
> (that should be in the log before those messages).
>
> Bye,
>     Oleg
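
Following the suggestion above to look earlier in the log, here is a
minimal sketch (not from the thread) of how one might find the first
eviction message and count evictions per client NID. It assumes
unwrapped syslog lines containing the "evicting client <uuid> nid <nid>"
fragment seen in the quoted waiting_locks_callback() output; the
function name summarize_evictions is hypothetical, and other Lustre
versions may word these messages differently.

```python
import re
from collections import Counter

# Matches the "evicting client <uuid> nid <nid>" fragment seen in the
# waiting_locks_callback() messages quoted in this thread (assumed format).
EVICT_RE = re.compile(r"evicting client (\S+) nid (\S+)")
# Matches the kernel's rate-limit notice for suppressed duplicates.
SKIPPED_RE = re.compile(r"Skipped (\d+) previous similar messages")


def summarize_evictions(lines):
    """Return (first eviction line, evictions per NID, skipped count).

    The skipped count only sums "Skipped N ..." notices that come from
    waiting_locks_callback(), so it is a rough lower bound on evictions
    that were not logged individually.
    """
    first_seen = None
    per_nid = Counter()
    skipped = 0
    for line in lines:
        match = EVICT_RE.search(line)
        if match:
            if first_seen is None:
                first_seen = line.rstrip("\n")
            per_nid[match.group(2)] += 1
            continue
        notice = SKIPPED_RE.search(line)
        if notice and "waiting_locks_callback" in line:
            skipped += int(notice.group(1))
    return first_seen, per_nid, skipped
```

To use it, open the MDS syslog file (its location depends on the syslog
configuration) and pass the file object to summarize_evictions(). The
first eviction line gives the timestamp to start reading backwards from.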



