[Lustre-discuss] ldlm_enqueue operation failures
Charles Taylor
taylor at hpc.ufl.edu
Mon Feb 18 14:04:08 PST 2008
Well, yes. But the evictions are the result of the job trying to
start. Absent that, there are no evictions. A bunch of threads
trying to open the same file should not cause the clients to be
evicted. That's an odd way of dealing with concurrency. :)
Charlie
On Feb 18, 2008, at 4:57 PM, Oleg Drokin wrote:
> Hello!
>
> On Feb 18, 2008, at 4:55 PM, Charles Taylor wrote:
>> Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c:
>> 515:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
>> Feb 18 15:25:50 hpcmds kernel: LustreError: 7162:0:(mgs_handler.c:
>> 515:mgs_handle()) Skipped 263 previous similar messages
>> Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c:
>> 1442:target_send_reply_msg()) @@@ processing error (-107)
>> req@ffff81011acf7c50 x1602651/t0 o101-><?>@<?>:-1 lens 232/0 ref 0
>> fl Interpret:/0/0 rc -107/0
>> Feb 18 15:29:25 hpcmds kernel: LustreError: 6057:0:(ldlm_lib.c:
>> 1442:target_send_reply_msg()) Skipped 427 previous similar messages
>> Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c:
>> 1474:mds_close()) @@@ no handle for file close ino 43116025:
>> cookie 0x1938027bf9d67349 req@ffff8100ae3bfc00 x10000789/t0 o35-
>> >beb7df79-6127-c0ca-9d36-2a96817a77a9@:-1 lens 296/1736 ref 0 fl
>> Interpret:/0/0 rc 0/0
>> Feb 18 15:31:28 hpcmds kernel: LustreError: 7150:0:(mds_open.c:
>> 1474:mds_close()) Skipped 161 previous similar messages
>> Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c:
>> 210:waiting_locks_callback()) ### lock callback timer expired:
>> evicting client 2bdea9d4-43c3-a0b0-2822-
>> c49ecfe6e044@NET_0x500000a0d1935_UUID nid 10.13.25.53@o2ib ns:
>> mds-ufhpc-MDT0000_UUID lock: ffff810053d3f100/0x688cfbc7df2ef487
>> lrc: 1/0,0 mode: CR/CR res: 21878337/3424633214 bits 0x3 rrc: 582
>> type: IBT flags: 4000030 remote: 0x95c1d2685c2c76d9 expref: 21 pid
>> 6090
>> Feb 18 15:33:17 hpcmds kernel: LustreError: 0:0:(ldlm_lockd.c:
>> 210:waiting_locks_callback()) Skipped 3 previous similar messages
>> Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c:
>> 962:ldlm_handle_enqueue()) ### lock on destroyed export
>> ffff8101096ec000 ns: mds-ufhpc-MDT0000_UUID lock:
>> ffff810225fe12c0/0x688cfbc7df2ef505 lrc: 2/0,0 mode: CR/CR res:
>> 21878337/3424633214 bits 0x3 rrc: 579 type: IBT flags: 4000030
>> remote: 0x95c1d2685c2c76e0 expref: 6 pid 6265
>> Feb 18 15:33:17 hpcmds kernel: LustreError: 6265:0:(ldlm_lockd.c:
>> 962:ldlm_handle_enqueue()) Skipped 3 previous similar messages
>> Feb 18 15:33:17 hpcmds kernel: Lustre: 6061:0:(mds_reint.c:
>> 127:mds_finish_transno()) commit transaction for disconnected
>> client 2bdea9d4-43c3-a0b0-2822-c49ecfe6e044: rc 0
>
> This looks like the middle of an eviction storm; by this point the
> MDS and MGS had already evicted tons of clients for unknown reasons
> (the cause should be in the log before these messages).
>
> Bye,
> Oleg
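
To follow up on the suggestion to look earlier in the log, here is a rough sketch of how the first eviction for each client NID might be pulled out of the MDS syslog. It is only an illustration: the regular expression is modeled on the single "evicting client" message quoted above, it assumes the raw syslog (one message per line, with "@" intact rather than the archive's " at "), and the function names find_evictions and first_eviction_per_nid are made up for this example. Other Lustre versions may word the message differently.

```python
import re

# Pattern modeled on the "evicting client" message quoted in this thread.
# Assumption: raw syslog lines, one message per line, "@" not rewritten.
EVICT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:.*?"
    r"evicting client\s+(?P<client>\S+)\s+nid\s+(?P<nid>\S+)"
)


def find_evictions(lines):
    """Return (timestamp, host, client, nid) for each eviction line, in log order."""
    found = []
    for line in lines:
        match = EVICT_RE.search(line)
        if match:
            found.append(
                (
                    match.group("stamp"),
                    match.group("host"),
                    match.group("client"),
                    match.group("nid"),
                )
            )
    return found


def first_eviction_per_nid(lines):
    """Map each client NID to the timestamp of its first eviction in the log."""
    first = {}
    for stamp, _host, _client, nid in find_evictions(lines):
        # setdefault keeps the earliest entry because lines arrive in log order.
        first.setdefault(nid, stamp)
    return first
```

Feeding it the lines of the MDS syslog (for example, an open file object) and sorting the result by timestamp should show where the storm began, which is where the messages preceding the first eviction are worth reading.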