[Lustre-discuss] Process accessing Lustre be killed on Lustreclient

Lu Wang wanglu at ihep.ac.cn
Mon Mar 2 19:32:16 PST 2009


Dear list,
When I sent test jobs (dd on 5G files, 8 jobs per node) to these nodes, I got errors like:
Mar  3 11:14:48 bws0091 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.66 at tcp. The obd_ping operation failed with -107
Mar  3 11:14:48 bws0091 kernel: LustreError: Skipped 69 previous similar messages
Mar  3 11:14:48 bws0091 kernel: LustreError: 167-0: This client was evicted by besfs-OST0010; in progress operations using this service will fail.
Mar  3 11:15:51 bws0091 kernel: LustreError: 4959:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 12345-192.168.50.32 at tcp, match 4016570 length 1408 too big: 1008 left, 1008 allowed
Mar  3 11:27:17 bws0091 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.66 at tcp. The obd_ping operation failed with -107
Mar  3 11:27:17 bws0091 kernel: LustreError: Skipped 66 previous similar messages
Mar  3 11:27:17 bws0091 kernel: LustreError: 167-0: This client was evicted by besfs-OST0010; in progress operations using this service will fail.
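
To see how often each target evicts this client, the syslog lines above can be tallied per OST. This is only a rough sketch: the helper name count_evictions is made up for illustration, and the log path in the usage comment is an assumption about where the kernel messages end up on the client. (On Linux, -107 is ENOTCONN, "Transport endpoint is not connected".)

```shell
#!/bin/sh
# Sketch: count "This client was evicted by <target>" events per target.
# count_evictions is a hypothetical helper; it reads syslog lines on stdin.
count_evictions() {
    # Keep only eviction lines and reduce each to the target name,
    # then count how many times each target appears.
    sed -n 's/.*This client was evicted by \([^;]*\);.*/\1/p' | sort | uniq -c
}

# Usage (the log path is an assumption, adjust for the client's syslog setup):
#   count_evictions < /var/log/messages
```

Each output line is a count followed by the target name, so a single OST that keeps evicting the client stands out quickly.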

------------------				 
Lu Wang
2009-03-03

-------------------------------------------------------------
From: Lu Wang
Sent: 2009-03-03 10:14:58
To:
Cc: lustre-discuss
Subject: Re: [Lustre-discuss] Process accessing Lustre be killed on Lustreclient

# lctl get_param ldlm.namespaces.*osc*.lru_size
ldlm.namespaces.besfs-OST0000-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0001-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0002-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0003-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0004-osc-f7dfe400.lru_size=1
ldlm.namespaces.besfs-OST0005-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0006-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0007-osc-f7dfe400.lru_size=1
ldlm.namespaces.besfs-OST0008-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0009-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST000a-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST000b-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST000c-osc-f7dfe400.lru_size=0
....
I got "0" for lru_size. According to the Lustre manual, this means "automatic resizing...". Is the memory pressure caused by an
uncontrolled LRU size?
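
Since the full listing is long, a small filter can condense the get_param output into the non-zero entries plus a total. This is just a sketch: summarize_lru is a hypothetical helper name, and it only adds up the reported values without making any claim about what a given value means for LRU resizing.

```shell
#!/bin/sh
# Sketch: condense "<namespace>.lru_size=<value>" lines read from stdin.
# summarize_lru is a hypothetical helper, not part of lctl.
summarize_lru() {
    awk -F= '/lru_size=/ {
                 n++; total += $2
                 # Show only the namespaces that report a non-zero value.
                 if ($2 != 0) print "nonzero: " $0
             }
             END { printf "namespaces=%d total=%d\n", n, total }'
}

# Usage on a client:
#   lctl get_param 'ldlm.namespaces.*osc*.lru_size' | summarize_lru
```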
----------------				 
Lu Wang
2009-03-03

-------------------------------------------------------------
From: Johann Lombardi
Sent: 2009-03-02 18:04:13
To: Lu Wang
Cc: lustre-discuss
Subject: Re: [Lustre-discuss] Process accessing Lustre be killed on Lustreclient

On Mon, Mar 02, 2009 at 04:10:46PM +0800, Lu Wang wrote:
> My question is: 
> 1. Does the Lustre client require a lot of low memory?

There is one known issue with the lru resize feature on i686 (it can
consume almost all the low memory). To know whether or not this is the
same problem, could you please try to disable lru resize on the client side
and see if you hit this bug again? To do so, you have to run the following
commands on the client(s):
lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100))
lctl set_param ldlm.namespaces.*mdc*.lru_size=$((NR_CPU*100))
where NR_CPU is the number of cpus on the client.
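
For reference, the two commands above could be generated from the detected CPU count roughly as follows. This is a sketch under assumptions: the helper names lru_size_for_cpus and print_lru_commands are made up for illustration, and it assumes getconf _NPROCESSORS_ONLN reports the number of online CPUs on the client. It only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch: build the lctl commands that pin lru_size to NR_CPU*100.
# Both helpers are hypothetical; nothing here calls lctl directly.

# Fixed lru_size for a given CPU count.
lru_size_for_cpus() {
    echo $(( $1 * 100 ))
}

# Print (do not run) the set_param commands for the osc and mdc namespaces.
print_lru_commands() {
    size=$(lru_size_for_cpus "$1")
    for ns in osc mdc; do
        # The pattern is quoted so the shell prints it literally.
        echo "lctl set_param ldlm.namespaces.*${ns}*.lru_size=${size}"
    done
}

# Assumption: getconf reports online CPUs; fall back to 1 if it is unavailable.
NR_CPU=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
print_lru_commands "$NR_CPU"
```

On a client with 8 CPUs this prints both commands with lru_size=800; the output can be piped to sh once it looks right.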

Cheers,
Johann
