[Lustre-discuss] Nodes getting evicted
thhsieh
thhsieh at piano.rcas.sinica.edu.tw
Mon Dec 1 23:18:50 PST 2008
Dear All,
We have a cluster (48 nodes) with Linux kernel 2.6.22.19 + Lustre 1.6.5.1
installed. After several weeks of test runs, we often encounter the
following error messages:
On the MGS & MDT node:
----------------------
LustreError: 1833:0:(mds_open.c:1482:mds_close()) @@@ no handle for file close ino 60872818: cookie 0xc1cbe239fd9b0121 req@ffff8103f342ca00 x2996825/t0 o35->a9c4c52e-e0cb-a7bc-3810-71be88e4367f@NET_0x20000c0a80b0c_UUID:0/0 lens 296/672 e 0 to 0 dl 1227939526 ref 1 fl Interpret:/0/0 rc 0/0
LustreError: 1833:0:(mds_open.c:1482:mds_close()) Skipped 6 previous similar messages
LustreError: 1833:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-116) req@ffff8103f342ca00 x2996825/t0 o35->a9c4c52e-e0cb-a7bc-3810-71be88e4367f@NET_0x20000c0a80b0c_UUID:0/0 lens 296/672 e 0 to 0 dl 1227939526 ref 1 fl Interpret:/0/0 rc -116/0
LustreError: 1833:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 6 previous similar messages
Lustre: MGS: haven't heard from client da42a35b-87f3-c1f5-3b49-fd01affa8c1f (at 192.168.10.1@tcp) in 227 seconds. I think it's dead, and I am evicting it.
Lustre: cwork2-MDT0000: haven't heard from client 754d03f0-b28a-bb27-963f-a210ca74e3f4 (at 192.168.10.1@tcp) in 227 seconds. I think it's dead, and I am evicting it.
LustreError: 1760:0:(handler.c:1515:mds_handle()) operation 44 on unconnected MDS from 12345-192.168.10.1@tcp
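The "in 227 seconds" in the eviction messages makes me suspect that our
obd timeout may be too small for the load. For reference, this is how I
would check and raise it on every server and client (a minimal sketch,
assuming the Lustre 1.6 /proc interface; 300 is just an example value):

    # check the current obd timeout (the 1.6 default is 100 seconds)
    cat /proc/sys/lustre/timeout

    # raise it on ALL nodes, e.g. to 300 seconds (example value only)
    echo 300 > /proc/sys/lustre/timeout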
On the client node 192.168.10.1 (w00):
---------------------------------------
Dec 2 11:02:17 w00 kernel: [1106617.770698] LustreError: 11-0: an error occurred while communicating with 192.168.10.52@tcp. The ldlm_enqueue operation failed with -107
Dec 2 11:02:17 w00 kernel: [1106617.770763] Lustre: cwork2-OST0002-osc-ffff8101e8646800: Connection to service cwork2-OST0002 via nid 192.168.10.52@tcp was lost; in progress operations using this service will wait for recovery to complete.
Dec 2 11:02:17 w00 kernel: [1106617.771128] LustreError: 167-0: This client was evicted by cwork2-OST0002; in progress operations using this service will fail.
Dec 2 11:02:17 w00 kernel: [1106617.771248] LustreError: 10050:0:(llite_mmap.c:205:ll_tree_unlock()) couldn't unlock -5
Dec 2 11:02:18 w00 kernel: [1106618.099502] LustreError: 11-0: an error occurred while communicating with 192.168.10.50@tcp. The mds_sync operation failed with -107
Dec 2 11:02:18 w00 kernel: [1106618.099567] Lustre: cwork2-MDT0000-mdc-ffff8101e8646800: Connection to service cwork2-MDT0000 via nid 192.168.10.50@tcp was lost; in progress operations using this service will wait for recovery to complete.
Dec 2 11:02:18 w00 kernel: [1106618.099904] LustreError: 167-0: This client was evicted by cwork2-MDT0000; in progress operations using this service will fail.
Dec 2 11:02:18 w00 kernel: [1106618.108772] LustreError: 10050:0:(file.c:1001:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
Dec 2 11:02:18 w00 kernel: [1106618.108885] LustreError: 10050:0:(client.c:716:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8102e88ff600 x1563132/t0 o44->cwork2-MDT0000_UUID@192.168.10.50@tcp:12/10 lens 296/424 e 0 to 100 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Dec 2 11:02:18 w00 kernel: [1106618.111291] LustreError: 10050:0:(file.c:97:ll_close_inode_openhandle()) inode 61234630 mdc close failed: rc = -108
Every time we encounter such messages, the client node can no longer
access the Lustre filesystem. Many processes that try to access the
filesystem get stuck, and we have to reboot the client (sometimes even
the MDT or OST nodes have to be rebooted).
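Is something like the following a sane way to bring the client back
without a full reboot? (A minimal sketch, assuming our mount point
/mnt/src and the MGS NID of wd0; I have not verified that this is the
recommended procedure.)

    # force-unmount the hung Lustre client mount
    umount -f /mnt/src

    # remount the filesystem from the MGS
    mount -t lustre 192.168.10.50@tcp0:/cwork2 /mnt/src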
Sorry, I am a newbie to the Lustre filesystem. This problem occurs
again and again, but so far I have not found a way to get rid of it,
or to recover the system without rebooting. Hence I am asking for
help here. Could someone point me in a direction to debug this, so
that I can figure out the problem?
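In case it is useful, this is how I plan to capture a Lustre debug log
around the next eviction (a sketch, assuming the standard lnet /proc
interface and the lctl utility from Lustre 1.6):

    # enable full debugging (very verbose; -1 turns on all flags)
    echo -1 > /proc/sys/lnet/debug

    # clear the kernel debug buffer, then wait for the problem to recur
    lctl clear

    # after the eviction, dump the debug buffer to a file
    lctl dk /tmp/lustre-debug.log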
The detailed configuration of our system is as follows:
Head node: hostname=w00: eth0:192.168.10.1, eth1:192.168.11.1, eth2: real IP
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 17307036 10793016 5634868 66% /
tmpfs 6152284 0 6152284 0% /lib/init/rw
udev 10240 40 10200 1% /dev
tmpfs 6152284 0 6152284 0% /dev/shm
/dev/shm 6152284 0 6152284 0% /dev/shm
/dev/sda3 96094864 928384 90285132 2% /home2
wd0:/cwork2 9612360776 2494799568 6629174140 28% /mnt/src
MGS & MDT node: hostname=wd0: eth0:192.168.10.50, eth1:192.168.11.50
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 8220624 0 8220624 0% /lib/init/rw
tmpfs 8220624 0 8220624 0% /dev/shm
/dev/shm 8220624 0 8220624 0% /dev/shm
/dev/sda1 14428928 8946016 4749948 66% /cfs/src
/dev/sda5 2893628 119560 2627076 5% /cfs/mgs
/dev/sda8 247826112 647028 233016184 1% /cfs/cwork2_mdt
(/dev/sda is a SATA II hard disk. The whole system runs in a ramdisk,
and /dev/sda1 is mounted only as a /tmp area. Hence there is almost no
extra I/O on that disk apart from the MDT I/O.)
OST node 1: hostname=wd1: eth0:192.168.10.51, eth1:192.168.11.51
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 6154300 0 6154300 0% /lib/init/rw
tmpfs 6154300 0 6154300 0% /dev/shm
/dev/shm 6154300 0 6154300 0% /dev/shm
/dev/sdd1 14428928 3909076 9786888 29% /cfs/src
/dev/sda1 1595903860 400362520 1114474140 27% /cfs/cwork2_ost1
/dev/sdb1 1595903860 420477008 1094359652 28% /cfs/cwork2_ost2
/dev/sdc1 1632841476 410218188 1139679696 27% /cfs/cwork2_ost3
(sda, sdb and sdc belong to the same disk array, which is divided
into 3 SCSI devices that are mounted as OST partitions. Again, the
whole system runs in a ramdisk, and /dev/sdd is used only for /tmp,
so there is almost no traffic on it.)
OST node 2: hostname=wd2: eth0:192.168.10.52, eth1:192.168.11.52
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 6154300 0 6154300 0% /lib/init/rw
tmpfs 6154300 0 6154300 0% /dev/shm
/dev/shm 6154300 0 6154300 0% /dev/shm
/dev/sdd1 14428928 3909076 9786888 29% /cfs/src
/dev/sda1 1595903860 400362520 1114474140 27% /cfs/cwork2_ost1
/dev/sdb1 1595903860 420477008 1094359652 28% /cfs/cwork2_ost2
/dev/sdc1 1632841476 410218188 1139679696 27% /cfs/cwork2_ost3
(This node is configured identically to wd1.)
The other 48 nodes are computing nodes (blade servers). Their
configurations are similar to that of w00: they run from their own
local hard disks and, as Lustre clients, mount wd0:/cwork2 as their
common storage space.
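To rule out a network problem between the blades and the servers, I can
run a quick connectivity check from each client (a sketch; the NIDs are
those of wd0, wd1 and wd2 from the configuration above):

    # verify LNET connectivity to the MGS/MDT and both OST nodes
    for nid in 192.168.10.50@tcp 192.168.10.51@tcp 192.168.10.52@tcp; do
        lctl ping $nid
    done

    # ask the client to check all of its server connections
    lfs check servers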
Please give me suggestions on how to solve this problem. If there is
any information I have missed here, please let me know and I will
post the details.
Thank you very much for your kind help.
Best Regards,
T.H.Hsieh