[Lustre-discuss] Nodes got evicted

thhsieh thhsieh at piano.rcas.sinica.edu.tw
Mon Dec 1 23:18:50 PST 2008


Dear All,

We have a cluster (48 nodes) with Linux kernel 2.6.22.19 + Lustre 1.6.5.1
installed. After several weeks of test runs, we often encounter the
following error messages:


In MGS & MDT node:
------------------
LustreError: 1833:0:(mds_open.c:1482:mds_close()) @@@ no handle for file close ino 60872818: cookie 0xc1cbe239fd9b0121  req@ffff8103f342ca00 x2996825/t0 o35->a9c4c52e-e0cb-a7bc-3810-71be88e4367f@NET_0x20000c0a80b0c_UUID:0/0 lens 296/672 e 0 to 0 dl 1227939526 ref 1 fl Interpret:/0/0 rc 0/0
LustreError: 1833:0:(mds_open.c:1482:mds_close()) Skipped 6 previous similar messages
LustreError: 1833:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-116)  req@ffff8103f342ca00 x2996825/t0 o35->a9c4c52e-e0cb-a7bc-3810-71be88e4367f@NET_0x20000c0a80b0c_UUID:0/0 lens 296/672 e 0 to 0 dl 1227939526 ref 1 fl Interpret:/0/0 rc -116/0
LustreError: 1833:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 6 previous similar messages
Lustre: MGS: haven't heard from client da42a35b-87f3-c1f5-3b49-fd01affa8c1f (at 192.168.10.1@tcp) in 227 seconds. I think it's dead, and I am evicting it.
Lustre: cwork2-MDT0000: haven't heard from client 754d03f0-b28a-bb27-963f-a210ca74e3f4 (at 192.168.10.1@tcp) in 227 seconds. I think it's dead, and I am evicting it.
LustreError: 1760:0:(handler.c:1515:mds_handle()) operation 44 on unconnected MDS from 12345-192.168.10.1@tcp
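
The "haven't heard from client ... in 227 seconds" messages make me wonder
whether the Lustre obd timeout is involved. If that is a reasonable direction,
my guess at how to check and raise it is below (I have not tried this on our
system, and the value 300 is only an example, so please correct me):

    # On any server or client: show the current Lustre timeout in seconds
    cat /proc/sys/lustre/timeout

    # On the MGS node (wd0): raise the filesystem-wide timeout
    # (300 is just an illustrative value, not a recommendation)
    lctl conf_param cwork2.sys.timeout=300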


In the node 192.168.10.1:
-------------------------
Dec  2 11:02:17 w00 kernel: [1106617.770698] LustreError: 11-0: an error occurred while communicating with 192.168.10.52@tcp. The ldlm_enqueue operation failed with -107
Dec  2 11:02:17 w00 kernel: [1106617.770763] Lustre: cwork2-OST0002-osc-ffff8101e8646800: Connection to service cwork2-OST0002 via nid 192.168.10.52@tcp was lost; in progress operations using this service will wait for recovery to complete.
Dec  2 11:02:17 w00 kernel: [1106617.771128] LustreError: 167-0: This client was evicted by cwork2-OST0002; in progress operations using this service will fail.
Dec  2 11:02:17 w00 kernel: [1106617.771248] LustreError: 10050:0:(llite_mmap.c:205:ll_tree_unlock()) couldn't unlock -5
Dec  2 11:02:18 w00 kernel: [1106618.099502] LustreError: 11-0: an error occurred while communicating with 192.168.10.50@tcp. The mds_sync operation failed with -107
Dec  2 11:02:18 w00 kernel: [1106618.099567] Lustre: cwork2-MDT0000-mdc-ffff8101e8646800: Connection to service cwork2-MDT0000 via nid 192.168.10.50@tcp was lost; in progress operations using this service will wait for recovery to complete.
Dec  2 11:02:18 w00 kernel: [1106618.099904] LustreError: 167-0: This client was evicted by cwork2-MDT0000; in progress operations using this service will fail.
Dec  2 11:02:18 w00 kernel: [1106618.108772] LustreError: 10050:0:(file.c:1001:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
Dec  2 11:02:18 w00 kernel: [1106618.108885] LustreError: 10050:0:(client.c:716:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff8102e88ff600 x1563132/t0 o44->cwork2-MDT0000_UUID@192.168.10.50@tcp:12/10 lens 296/424 e 0 to 100 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Dec  2 11:02:18 w00 kernel: [1106618.111291] LustreError: 10050:0:(file.c:97:ll_close_inode_openhandle()) inode 61234630 mdc close failed: rc = -108 
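
When the client is in this state, I would also like to know what to check
before rebooting. My guess at the useful commands is below (please correct me
if these are not the right tools for 1.6.5):

    # From the evicted client (w00): check LNET reachability of the servers
    lctl ping 192.168.10.50@tcp    # MGS/MDT node wd0
    lctl ping 192.168.10.51@tcp    # OST node wd1
    lctl ping 192.168.10.52@tcp    # OST node wd2

    # Check which MDC/OSC devices the client still considers up
    lctl dl
    lfs check servers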


Every time we encounter such messages, the client node can no longer access
the Lustre filesystem, many processes that try to access it get stuck, and we
have to reboot the client (sometimes even the MDT or OST nodes have to be
rebooted).
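
In case it helps the discussion, the only non-reboot recovery I can think of
is to force-unmount and remount the client. I have not verified that this is
safe or sufficient, and the device string below is my guess based on our
configuration rather than a copy of our actual mount line:

    # On the stuck client (e.g. w00) -- assuming /mnt/src is the Lustre mount
    umount -f /mnt/src

    # Remount from the MGS node wd0 (192.168.10.50)
    mount -t lustre 192.168.10.50@tcp0:/cwork2 /mnt/src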

Sorry, I am a newbie to the Lustre filesystem. This problem occurs again and
again, but so far I have not found a way to get rid of it, or to recover the
system without rebooting. Hence I am asking for help here. Could someone point
me in the right direction for debugging, so that I can figure out the problem?
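
If it would help, I can collect more diagnostics the next time this happens.
My guess at the useful commands is below (I am not sure they are all available
in 1.6.5, so corrections are welcome):

    # On the evicted client and on wd0, right after the eviction:
    dmesg | tail -n 200 > /tmp/dmesg.txt
    grep -i lustre /var/log/messages | tail -n 200 > /tmp/messages.txt

    # Dump the Lustre kernel debug buffer for later inspection
    lctl dk /tmp/lustre-debug.log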

The detailed configuration of our system is as follows:

Head node: hostname=w00: eth0:192.168.10.1, eth1:192.168.11.1, eth2: real IP
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/sda1             17307036  10793016   5634868  66% /
    tmpfs                  6152284         0   6152284   0% /lib/init/rw
    udev                     10240        40     10200   1% /dev
    tmpfs                  6152284         0   6152284   0% /dev/shm
    /dev/shm               6152284         0   6152284   0% /dev/shm
    /dev/sda3             96094864    928384  90285132   2% /home2
    wd0:/cwork2          9612360776 2494799568 6629174140  28% /mnt/src

MGS & MDT node: hostname=wd0: eth0:192.168.10.50, eth1:192.168.11.50
    Filesystem           1K-blocks      Used Available Use% Mounted on
    tmpfs                  8220624         0   8220624   0% /lib/init/rw
    tmpfs                  8220624         0   8220624   0% /dev/shm
    /dev/shm               8220624         0   8220624   0% /dev/shm
    /dev/sda1             14428928   8946016   4749948  66% /cfs/src
    /dev/sda5              2893628    119560   2627076   5% /cfs/mgs
    /dev/sda8            247826112    647028 233016184   1% /cfs/cwork2_mdt

    (The /dev/sda is a SATA II hard disk. The whole system runs in a ramdisk,
     and /dev/sda1 is mounted only as a /tmp area, so there is almost no
     extra I/O on that disk apart from the MDT I/O.)

OST node 1: hostname=wd1: eth0:192.168.10.51, eth1:192.168.11.51
    Filesystem           1K-blocks      Used Available Use% Mounted on
    tmpfs                  6154300         0   6154300   0% /lib/init/rw
    tmpfs                  6154300         0   6154300   0% /dev/shm
    /dev/shm               6154300         0   6154300   0% /dev/shm
    /dev/sdd1             14428928   3909076   9786888  29% /cfs/src
    /dev/sda1            1595903860 400362520 1114474140  27% /cfs/cwork2_ost1
    /dev/sdb1            1595903860 420477008 1094359652  28% /cfs/cwork2_ost2
    /dev/sdc1            1632841476 410218188 1139679696  27% /cfs/cwork2_ost3

    (The sda, sdb, and sdc devices belong to the same disk array, which is
     divided into 3 SCSI devices and mounted as the OST partitions. Again,
     the whole system runs in a ramdisk, and /dev/sdd is only for /tmp, so
     there is almost no other traffic.)

OST node 2: hostname=wd2: eth0:192.168.10.52, eth1:192.168.11.52
    Filesystem           1K-blocks      Used Available Use% Mounted on
    tmpfs                  6154300         0   6154300   0% /lib/init/rw
    tmpfs                  6154300         0   6154300   0% /dev/shm
    /dev/shm               6154300         0   6154300   0% /dev/shm
    /dev/sdd1             14428928   3909076   9786888  29% /cfs/src
    /dev/sda1            1595903860 400362520 1114474140  27% /cfs/cwork2_ost1
    /dev/sdb1            1595903860 420477008 1094359652  28% /cfs/cwork2_ost2
    /dev/sdc1            1632841476 410218188 1139679696  27% /cfs/cwork2_ost3

    (This node is configured exactly the same as the wd1 node.)

The other 48 nodes are compute nodes (blade servers). Their configurations
are similar to that of the w00 node: they run from their own local hard
disks and, as Lustre clients, mount wd0:/cwork2 as their common storage
space.
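
For completeness, this is roughly how the clients mount the filesystem; the
device string is my reconstruction rather than a copy of our fstab, so it may
not match our setup exactly:

    # Client-side mount (on w00 and the compute nodes)
    mount -t lustre 192.168.10.50@tcp0:/cwork2 /mnt/src

    # Equivalent /etc/fstab entry
    192.168.10.50@tcp0:/cwork2  /mnt/src  lustre  defaults,_netdev  0 0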


Please give me suggestions on how to solve this problem. If there is any
information I have missed here, please let me know and I will post the
details.

Thanks very much for your kind help.


Best Regards,

T.H.Hsieh


