[Lustre-discuss] Lustre 1.6.5 MDS problem?

Johnlya johnlya at gmail.com
Thu Jul 3 01:38:18 PDT 2008


Lustre clients cannot access the Lustre file system when network I/O
becomes heavy.  Thanks a lot.
Below is the Lustre configuration:
MDS_MASTER:mkfs.lustre --fsname=lenovo --mdt --mgs --reformat --failnode=MDS_SLAVER /dev/sdd1

OSS1_MASTER:mkfs.lustre --fsname=lenovo --ost --reformat --mgsnode=MDS_MASTER@tcp0 --mgsnode=MDS_SLAVER@tcp0 --failnode=OSS1_SLAVER /dev/sdd1

CLIENT:mount -t lustre MDS_MASTER@tcp0:MDS_SLAVER@tcp0:/lenovo /mnt/webfile/

[root@MDS_MASTER tmp]# uname -a
Linux MDS_MASTER 2.6.9-67.0.7.EL_lustre.1.6.5smp #1 SMP Mon May 12 22:02:50 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
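
Because the log that follows is long and wrapped by the mail client, a small script can help pull out the two patterns that matter most here: late heartbeats and client evictions. This is only a minimal sketch, assuming the wrapped syslog text is supplied as a list of lines; the names `unwrap` and `summarize` are hypothetical helpers, not part of Lustre or heartbeat.

```python
import re
from collections import Counter

# A syslog entry starts with a timestamp such as "Jun 30 15:56:07" or "Jul  3 13:10:12".
ENTRY_START = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")
LATE_HEARTBEAT = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")
# Accepts both "192.168.1.151@tcp" and the archive-obfuscated "192.168.1.151 at tcp".
EVICTED_CLIENT = re.compile(r"\(at (\d+\.\d+\.\d+\.\d+)(?:@| at )tcp\).*evicting it")


def unwrap(lines):
    """Join mail-wrapped continuation lines back onto their syslog entry."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def summarize(lines):
    """Return (late heartbeat intervals in ms per node, eviction count per client IP)."""
    late = {}
    evictions = Counter()
    for entry in unwrap(lines):
        m = LATE_HEARTBEAT.search(entry)
        if m:
            late.setdefault(m.group(1), []).append(int(m.group(2)))
        m = EVICTED_CLIENT.search(entry)
        if m:
            evictions[m.group(1)] += 1
    return late, evictions
```

Calling `summarize(open("/var/log/messages"))` on the MDS would give, per node, the list of late heartbeat intervals and, per client IP, how many "haven't heard from client" evictions were logged.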

Below is /var/log/messages of MDS_MASTER:
Jun 29 19:58:12 MDS_MASTER syslogd 1.4.1: restart.
Jun 29 21:11:36 MDS_MASTER sshd(pam_unix)[538]: session opened for
user root by root(uid=0)
Jun 30 10:19:07 MDS_MASTER heartbeat: [3647]: WARN: node
192.168.1.200: is dead
Jun 30 10:19:07 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 dead.
Jun 30 10:19:07 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status dead
Jun 30 10:19:07 MDS_MASTER harc[1192]: info: Running /etc/ha.d/rc.d/
status status
Jun 30 10:19:08 MDS_MASTER ipfail: [4528]: info: NS: We are dead. :<
Jun 30 10:19:08 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status dead
Jun 30 10:19:10 MDS_MASTER ipfail: [4528]: info: We are dead. :<
Jun 30 10:19:10 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 10:19:11 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 10:19:12 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 10:19:31 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 up.
Jun 30 10:19:31 MDS_MASTER heartbeat: [3647]: WARN: Late heartbeat:
Node 192.168.1.200: interval 34670 ms
Jun 30 10:19:31 MDS_MASTER heartbeat: [3647]: info: Status update for
node 192.168.1.200: status ping
Jun 30 10:19:31 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status up
Jun 30 10:19:31 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status ping
Jun 30 10:19:31 MDS_MASTER ipfail: [4528]: info: A ping node just came
up.
Jun 30 10:19:33 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 10:19:36 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 10:19:36 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:13:48 MDS_MASTER heartbeat: [3647]: WARN: node
192.168.1.200: is dead
Jun 30 15:13:48 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status dead
Jun 30 15:13:48 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 dead.
Jun 30 15:13:49 MDS_MASTER harc[1308]: info: Running /etc/ha.d/rc.d/
status status
Jun 30 15:13:49 MDS_MASTER ipfail: [4528]: info: NS: We are dead. :<
Jun 30 15:13:49 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status dead
Jun 30 15:13:50 MDS_MASTER ipfail: [4528]: info: We are dead. :<
Jun 30 15:13:50 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:13:54 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:13:54 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:15:46 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 up.
Jun 30 15:15:46 MDS_MASTER heartbeat: [3647]: WARN: Late heartbeat:
Node 192.168.1.200: interval 127850 ms
Jun 30 15:15:46 MDS_MASTER heartbeat: [3647]: info: Status update for
node 192.168.1.200: status ping
Jun 30 15:15:46 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status up
Jun 30 15:15:46 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status ping
Jun 30 15:15:46 MDS_MASTER ipfail: [4528]: info: A ping node just came
up.
Jun 30 15:15:47 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:15:49 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:15:49 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:31:30 MDS_MASTER heartbeat: [3647]: WARN: node
192.168.1.200: is dead
Jun 30 15:31:30 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status dead
Jun 30 15:31:30 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 dead.
Jun 30 15:31:30 MDS_MASTER harc[1330]: info: Running /etc/ha.d/rc.d/
status status
Jun 30 15:31:31 MDS_MASTER ipfail: [4528]: info: NS: We are dead. :<
Jun 30 15:31:31 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status dead
Jun 30 15:31:32 MDS_MASTER ipfail: [4528]: info: We are dead. :<
Jun 30 15:31:32 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:31:35 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:31:35 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:41:43 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 up.
Jun 30 15:41:43 MDS_MASTER heartbeat: [3647]: WARN: Late heartbeat:
Node 192.168.1.200: interval 623250 ms
Jun 30 15:41:43 MDS_MASTER heartbeat: [3647]: info: Status update for
node 192.168.1.200: status ping
Jun 30 15:41:43 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status up
Jun 30 15:41:43 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status ping
Jun 30 15:41:43 MDS_MASTER ipfail: [4528]: info: A ping node just came
up.
Jun 30 15:41:44 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:41:45 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:41:46 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:44:52 MDS_MASTER heartbeat: [3647]: WARN: node
192.168.1.200: is dead
Jun 30 15:44:52 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status dead
Jun 30 15:44:52 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 dead.
Jun 30 15:44:52 MDS_MASTER harc[1354]: info: Running /etc/ha.d/rc.d/
status status
Jun 30 15:44:53 MDS_MASTER ipfail: [4528]: info: NS: We are dead. :<
Jun 30 15:44:53 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status dead
Jun 30 15:44:53 MDS_MASTER ipfail: [4528]: info: We are dead. :<
Jun 30 15:44:53 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:44:57 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:44:57 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:45:16 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 up.
Jun 30 15:45:16 MDS_MASTER heartbeat: [3647]: WARN: Late heartbeat:
Node 192.168.1.200: interval 34170 ms
Jun 30 15:45:16 MDS_MASTER heartbeat: [3647]: info: Status update for
node 192.168.1.200: status ping
Jun 30 15:45:16 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status up
Jun 30 15:45:16 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status ping
Jun 30 15:45:16 MDS_MASTER ipfail: [4528]: info: A ping node just came
up.
Jun 30 15:45:17 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:45:18 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:45:19 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:52:20 MDS_MASTER heartbeat: [3647]: WARN: node
192.168.1.200: is dead
Jun 30 15:52:20 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 dead.
Jun 30 15:52:20 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status dead
Jun 30 15:52:20 MDS_MASTER harc[1370]: info: Running /etc/ha.d/rc.d/
status status
Jun 30 15:52:22 MDS_MASTER ipfail: [4528]: info: NS: We are dead. :<
Jun 30 15:52:22 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status dead
Jun 30 15:52:23 MDS_MASTER ipfail: [4528]: info: We are dead. :<
Jun 30 15:52:23 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:52:24 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:52:25 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:52:43 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 up.
Jun 30 15:52:43 MDS_MASTER heartbeat: [3647]: WARN: Late heartbeat:
Node 192.168.1.200: interval 33140 ms
Jun 30 15:52:43 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status up
Jun 30 15:52:43 MDS_MASTER heartbeat: [3647]: info: Status update for
node 192.168.1.200: status ping
Jun 30 15:52:43 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status ping
Jun 30 15:52:43 MDS_MASTER ipfail: [4528]: info: A ping node just came
up.
Jun 30 15:52:44 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:52:46 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:52:47 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:53:41 MDS_MASTER heartbeat: [3647]: WARN: node
192.168.1.200: is dead
Jun 30 15:53:41 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status dead
Jun 30 15:53:41 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 dead.
Jun 30 15:53:41 MDS_MASTER harc[1386]: info: Running /etc/ha.d/rc.d/
status status
Jun 30 15:53:41 MDS_MASTER ipfail: [4528]: info: NS: We are dead. :<
Jun 30 15:53:41 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status dead
Jun 30 15:53:43 MDS_MASTER ipfail: [4528]: info: We are dead. :<
Jun 30 15:53:43 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:53:45 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:53:45 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 15:56:07 MDS_MASTER kernel: Lustre: MGS: haven't heard from
client 1e4742fc-aa42-7c9c-f033-90d6c7d98008 (at 192.168.1.151@tcp) in
238 seconds. I think it's dead, and I am evicting it.
Jun 30 15:56:11 MDS_MASTER kernel: Lustre: lenovo-MDT0000: haven't
heard from client a6e58dca-cdd0-fe97-5442-b620254deeef (at
192.168.1.151@tcp) in 228 seconds. I think it's dead, and I am
evicting it.
Jun 30 15:56:43 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 up.
Jun 30 15:56:43 MDS_MASTER heartbeat: [3647]: WARN: Late heartbeat:
Node 192.168.1.200: interval 192560 ms
Jun 30 15:56:43 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status up
Jun 30 15:56:43 MDS_MASTER heartbeat: [3647]: info: Status update for
node 192.168.1.200: status ping
Jun 30 15:56:43 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status ping
Jun 30 15:56:43 MDS_MASTER ipfail: [4528]: info: A ping node just came
up.
Jun 30 15:56:44 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 15:56:47 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 15:56:48 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 16:10:34 MDS_MASTER kernel: Lustre: lenovo-MDT0000: haven't
heard from client 025b84b9-4686-082a-5847-0b63c12f9441 (at
192.168.1.151@tcp) in 229 seconds. I think it's dead, and I am
evicting it.
Jun 30 16:11:34 MDS_MASTER kernel: Lustre: MGS: haven't heard from
client 94f8232a-c631-bd83-5c86-853ad9a7a357 (at 192.168.1.151@tcp) in
227 seconds. I think it's dead, and I am evicting it.
Jun 30 16:14:33 MDS_MASTER kernel: Lustre: lenovo-MDT0000: haven't
heard from client 9732113f-4d7f-692b-0109-bc55f71d79b7 (at
192.168.1.151@tcp) in 227 seconds. I think it's dead, and I am
evicting it.
Jun 30 16:14:42 MDS_MASTER kernel: Lustre: MGS: haven't heard from
client 3a87eb52-5afd-99a1-62cf-bc32aed57e58 (at 192.168.1.151@tcp) in
238 seconds. I think it's dead, and I am evicting it.
Jun 30 16:24:26 MDS_MASTER kernel: Lustre: lenovo-MDT0000: haven't
heard from client e9841e9b-2e06-7b22-8e04-13f476f56cfd (at
192.168.1.151@tcp) in 227 seconds. I think it's dead, and I am
evicting it.
Jun 30 16:26:54 MDS_MASTER kernel: Lustre: Request x26395 sent from
lenovo-MDT0000 to NID 192.168.1.152@tcp 6s ago has timed out (limit
6s).
Jun 30 16:26:54 MDS_MASTER kernel: Lustre: Skipped 3 previous similar
messages
Jun 30 16:26:54 MDS_MASTER kernel: LustreError: 138-a: lenovo-MDT0000:
A client on nid 192.168.1.152@tcp was evicted due to a lock blocking
callback to 192.168.1.152@tcp timed out: rc -107
Jun 30 16:29:35 MDS_MASTER kernel: Lustre: MGS: haven't heard from
client a68eeba8-7da3-59a8-0441-ed90ca90c108 (at 192.168.1.152@tcp) in
231 seconds. I think it's dead, and I am evicting it.
Jun 30 16:29:35 MDS_MASTER kernel: Lustre: Skipped 1 previous similar
message
Jun 30 17:14:35 MDS_MASTER heartbeat: [3647]: WARN: node
192.168.1.200: is dead
Jun 30 17:14:35 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status dead
Jun 30 17:14:35 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 dead.
Jun 30 17:14:35 MDS_MASTER harc[1434]: info: Running /etc/ha.d/rc.d/
status status
Jun 30 17:14:35 MDS_MASTER ipfail: [4528]: info: NS: We are dead. :<
Jun 30 17:14:35 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status dead
Jun 30 17:14:37 MDS_MASTER ipfail: [4528]: info: We are dead. :<
Jun 30 17:14:37 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 17:14:40 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 17:14:40 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 17:16:46 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 up.
Jun 30 17:16:46 MDS_MASTER heartbeat: [3647]: WARN: Late heartbeat:
Node 192.168.1.200: interval 141860 ms
Jun 30 17:16:46 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status up
Jun 30 17:16:46 MDS_MASTER heartbeat: [3647]: info: Status update for
node 192.168.1.200: status ping
Jun 30 17:16:46 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status ping
Jun 30 17:16:46 MDS_MASTER ipfail: [4528]: info: A ping node just came
up.
Jun 30 17:16:47 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 17:16:48 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 17:16:49 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 17:51:55 MDS_MASTER heartbeat: [3647]: WARN: node
192.168.1.200: is dead
Jun 30 17:51:55 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status dead
Jun 30 17:51:55 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 dead.
Jun 30 17:51:55 MDS_MASTER harc[1461]: info: Running /etc/ha.d/rc.d/
status status
Jun 30 17:51:56 MDS_MASTER ipfail: [4528]: info: NS: We are dead. :<
Jun 30 17:51:56 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status dead
Jun 30 17:51:57 MDS_MASTER ipfail: [4528]: info: We are dead. :<
Jun 30 17:51:57 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 17:52:00 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 17:52:00 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 17:52:21 MDS_MASTER heartbeat: [3647]: info: Link
192.168.1.200:192.168.1.200 up.
Jun 30 17:52:21 MDS_MASTER heartbeat: [3647]: WARN: Late heartbeat:
Node 192.168.1.200: interval 36210 ms
Jun 30 17:52:21 MDS_MASTER ipfail: [4528]: info: Link Status update:
Link 192.168.1.200/192.168.1.200 now has status up
Jun 30 17:52:21 MDS_MASTER heartbeat: [3647]: info: Status update for
node 192.168.1.200: status ping
Jun 30 17:52:21 MDS_MASTER ipfail: [4528]: info: Status update: Node
192.168.1.200 now has status ping
Jun 30 17:52:21 MDS_MASTER ipfail: [4528]: info: A ping node just came
up.
Jun 30 17:52:23 MDS_MASTER ipfail: [4528]: info: Asking other side for
ping node count.
Jun 30 17:52:24 MDS_MASTER ipfail: [4528]: info: Ping node count is
balanced.
Jun 30 17:52:24 MDS_MASTER ipfail: [4528]: info: No giveup timer to
abort.
Jun 30 18:13:38 MDS_MASTER kernel: usb 2-2: new low speed USB device
using address 3
Jun 30 18:13:38 MDS_MASTER kernel: input: USB HID v1.10 Keyboard
[Logitech Logitech USB Keyboard] on usb-0000:00:1d.0-2
Jun 30 18:14:19 MDS_MASTER kernel: usb 2-2: USB disconnect, address 3
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: Daily
informational memory statistics
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats:
10/196608 ms age 0 [pid3647/MST_CONTROL]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
773/5943764  137452/72584 [pid3647/MST_CONTROL]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
181560 total malloc bytes. pid [3647/MST_CONTROL]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/2 ms
age 86349340 [pid3686/HBFIFO]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
402/485  55480/30451 [pid3686/HBFIFO]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
57868 total malloc bytes. pid [3686/HBFIFO]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/0 ms
age 43035821180 [pid3696/HBWRITE]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
406/54966  57064/31827 [pid3696/HBWRITE]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
65872 total malloc bytes. pid [3696/HBWRITE]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/0 ms
age 43035821200 [pid3697/HBREAD]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
407/207004  57156/31891 [pid3697/HBREAD]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
58384 total malloc bytes. pid [3697/HBREAD]
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jun 30 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/94639
ms age 110 [pid3706/HBWRITE]
Jun 30 18:51:02 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
408/2524594  57248/31955 [pid3706/HBWRITE]
Jun 30 18:51:02 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
70984 total malloc bytes. pid [3706/HBWRITE]
Jun 30 18:51:02 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jun 30 18:51:02 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/41270
ms age 190 [pid3707/HBREAD]
Jun 30 18:51:02 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
409/867130  57340/32019 [pid3707/HBREAD]
Jun 30 18:51:02 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
59692 total malloc bytes. pid [3707/HBREAD]
Jun 30 18:51:02 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jun 30 18:51:02 MDS_MASTER heartbeat: [3647]: info: These are nothing
to worry about.
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: Daily
informational memory statistics
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats:
9/394357 ms age 0 [pid3647/MST_CONTROL]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
743/11909059  134596/71278 [pid3647/MST_CONTROL]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
189780 total malloc bytes. pid [3647/MST_CONTROL]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/2 ms
age 172749350 [pid3686/HBFIFO]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
402/485  55480/30451 [pid3686/HBFIFO]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
57868 total malloc bytes. pid [3686/HBFIFO]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/0 ms
age 43122221190 [pid3696/HBWRITE]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
406/109420  57064/31827 [pid3696/HBWRITE]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
65872 total malloc bytes. pid [3696/HBWRITE]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/0 ms
age 43122221210 [pid3697/HBREAD]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
407/413350  57156/31891 [pid3697/HBREAD]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
58384 total malloc bytes. pid [3697/HBREAD]
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  1 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats:
0/189215 ms age 110 [pid3706/HBWRITE]
Jul  1 18:51:02 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
408/5046621  57248/31955 [pid3706/HBWRITE]
Jul  1 18:51:02 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
70984 total malloc bytes. pid [3706/HBWRITE]
Jul  1 18:51:02 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  1 18:51:02 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/84259
ms age 190 [pid3707/HBREAD]
Jul  1 18:51:02 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
409/1769899  57340/32019 [pid3707/HBREAD]
Jul  1 18:51:02 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
59692 total malloc bytes. pid [3707/HBREAD]
Jul  1 18:51:02 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  1 18:51:02 MDS_MASTER heartbeat: [3647]: info: These are nothing
to worry about.
Jul  2 14:43:13 MDS_MASTER sshd(pam_unix)[3537]: session opened for
user root by root(uid=0)
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: Daily
informational memory statistics
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats:
8/592106 ms age 0 [pid3647/MST_CONTROL]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
713/17874352  131740/69962 [pid3647/MST_CONTROL]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
189780 total malloc bytes. pid [3647/MST_CONTROL]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/2 ms
age 259149360 [pid3686/HBFIFO]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
402/485  55480/30451 [pid3686/HBFIFO]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
57868 total malloc bytes. pid [3686/HBFIFO]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/0 ms
age 43208621200 [pid3696/HBWRITE]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
406/163871  57064/31827 [pid3696/HBWRITE]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
65872 total malloc bytes. pid [3696/HBWRITE]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: MSG stats: 0/0 ms
age 43208621220 [pid3697/HBREAD]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
407/619696  57156/31891 [pid3697/HBREAD]
Jul  2 18:51:01 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
58384 total malloc bytes. pid [3697/HBREAD]
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: MSG stats:
0/283791 ms age 130 [pid3706/HBWRITE]
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
408/7568647  57248/31955 [pid3706/HBWRITE]
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
70984 total malloc bytes. pid [3706/HBWRITE]
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: MSG stats:
0/127248 ms age 230 [pid3707/HBREAD]
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: ha_malloc stats:
409/2672668  57340/32019 [pid3707/HBREAD]
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: RealMalloc stats:
59692 total malloc bytes. pid [3707/HBREAD]
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: Current arena
value: 0
Jul  2 18:51:02 MDS_MASTER heartbeat: [3647]: info: These are nothing
to worry about.
Jul  3 13:10:12 MDS_MASTER kernel: Lustre: Request x2621300 sent from
lenovo-MDT0000 to NID 192.168.1.101@tcp 6s ago has timed out (limit
6s).
Jul  3 13:10:12 MDS_MASTER kernel: LustreError: 138-a: lenovo-MDT0000:
A client on nid 192.168.1.101@tcp was evicted due to a lock blocking
callback to 192.168.1.101@tcp timed out: rc -107
Jul  3 13:10:46 MDS_MASTER kernel: LustreError: 5058:0:(ldlm_lockd.c:
1699:ldlm_cancel_handler()) operation 103 from 12345-192.168.1.101@tcp
with bad export cookie 15190038513323647109
Jul  3 13:12:12 MDS_MASTER kernel: Lustre: 5101:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 4bb3de9c-b9b2-e919-6eb4-
ea264525d5b7 reconnecting
Jul  3 13:12:12 MDS_MASTER kernel: Lustre: 5101:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
4bb3de9c-b9b2-e919-6eb4-ea264525d5b7 at 192.168.1.102@tcp to
0x000001011baa8000; still busy with 2 active RPCs
Jul  3 13:12:12 MDS_MASTER kernel: LustreError: 5101:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@0000010138479600 x10341734/t0
o38->4bb3de9c-b9b2-e919-6eb4-ea264525d5b7@NET_0x20000c0a80166_UUID:0/0
lens 304/200 e 0 to 0 dl 1215062032 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:12:14 MDS_MASTER kernel: Lustre: 5098:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 059b8837-8efc-
ace1-9a33-2af95b367a6a reconnecting
Jul  3 13:12:14 MDS_MASTER kernel: Lustre: 5098:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
059b8837-8efc-ace1-9a33-2af95b367a6a at 192.168.1.152@tcp to
0x000001011facf000; still busy with 2 active RPCs
Jul  3 13:12:14 MDS_MASTER kernel: LustreError: 5097:0:(ldlm_request.c:
69:ldlm_expired_completion_wait()) ### lock timed out (enqueued at
1215061834, 100s ago); not entering recovery in server code, just
going back to sleep ns: mds-lenovo-MDT0000_UUID lock:
000001010a109280/0xd2cddb8ca88a7740 lrc: 3/0,1 mode: --/EX res:
237470160/1835281226 bits 0x2 rrc: 7 type: IBT flags: 4004000 remote:
0x0 expref: -99 pid 5097
Jul  3 13:12:14 MDS_MASTER kernel: LustreError: dumping log to /tmp/
lustre-log.1215061934.5097
Jul  3 13:12:23 MDS_MASTER kernel: Lustre: 5087:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 2466a067-c120-a833-a490-
e3347ac3fe82 reconnecting
Jul  3 13:12:23 MDS_MASTER kernel: Lustre: 5087:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 1 previous similar message
Jul  3 13:12:23 MDS_MASTER kernel: Lustre: 5087:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
2466a067-c120-a833-a490-e3347ac3fe82 at 192.168.1.251@tcp to
0x000001011f7b4000; still busy with 2 active RPCs
Jul  3 13:12:23 MDS_MASTER kernel: Lustre: 5087:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 1 previous similar message
Jul  3 13:12:23 MDS_MASTER kernel: LustreError: 5087:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@00000100bfe62600 x1098288/t0
o38->2466a067-c120-a833-a490-e3347ac3fe82@NET_0x20000c0a801fb_UUID:0/0
lens 304/200 e 0 to 0 dl 1215062043 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:12:23 MDS_MASTER kernel: LustreError: 5087:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 2 previous similar messages
Jul  3 13:13:38 MDS_MASTER kernel: Lustre: MGS: haven't heard from
client 796de287-f07d-0727-1c57-c7246bafc6ac (at 192.168.1.101@tcp) in
240 seconds. I think it's dead, and I am evicting it.
Jul  3 13:13:52 MDS_MASTER kernel: Lustre: 0:0:(watchdog.c:
130:lcw_cb()) Watchdog triggered for pid 5088: it was inactive for
200s
Jul  3 13:13:52 MDS_MASTER kernel: Lustre: 0:0:(linux-debug.c:
167:libcfs_debug_dumpstack()) showing stack for process 5088
Jul  3 13:13:52 MDS_MASTER kernel: ll_mdt_02     S
000001010f6e7018     0  5088      1          5089  5087 (L-TLB)
Jul  3 13:13:52 MDS_MASTER kernel: 000001010f6e6f38 0000000000000046
0000000000000000 0000010130b99500
Jul  3 13:13:52 MDS_MASTER kernel:        0000010130d325c0
0000000000000018 0000010130d325d0 00000004a03b78f9
Jul  3 13:13:52 MDS_MASTER kernel:        0000010037e9f030
0000000000000cf7
Jul  3 13:13:52 MDS_MASTER kernel: Call
Trace:<ffffffffa03b7bd8>{:osc:oscc_has_objects+56}
<ffffffffa03b92b5>{:osc:osc_create+4613}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa043bb6d>{:lov:qos_prep_create+6861}
<ffffffff80133804>{default_wake_function+0}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa04309ef>{:lov:lov_prep_create_set+575}
<ffffffffa030393f>{:ptlrpc:ptlrpc_set_destroy+799}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa03047a0>{:ptlrpc:ptlrpc_expired_set+0}
<ffffffffa0302530>{:ptlrpc:ptlrpc_interrupted_set+0}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa04199fa>{:lov:lov_create+7898} <ffffffff8017d14c>{__getblk
+42}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa04d45f1>{:ldiskfs:ldiskfs_xattr_ibody_get+417}
Jul  3 13:13:52 MDS_MASTER kernel:        <ffffffff801ec0a5>{__up_read
+16} <ffffffffa04d597b>{:ldiskfs:ldiskfs_xattr_get+139}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa054e12d>{:mds:mds_get_md+141}
<ffffffffa0576d48>{:mds:mds_create_objects+4584}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa04d597b>{:ldiskfs:ldiskfs_xattr_get+139}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa0578f04>{:mds:mds_finish_open+900}
<ffffffffa057c1d2>{:mds:mds_open+9682}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffff80132180>{try_to_wake_up+876}
<ffffffffa038bfb4>{:ksocklnd:ksocknal_queue_tx_locked+1236}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa056d3fa>{:mds:mds_reint_rec+458}
<ffffffffa05804ee>{:mds:mds_open_unpack+766}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa030f3c3>{:ptlrpc:lustre_msg_add_version+83}
Jul  3 13:13:52 MDS_MASTER kernel:
<ffffffffa058080b>{:mds:mds_update_unpack+507}
<ffffffffa055134c>{:mds:mds_reint+844}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa055b504>{:mds:fixup_handle_for_resent_req+84}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa0315840>{:ptlrpc:lustre_swab_ldlm_intent+0}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa055bb8b>{:mds:mds_intent_policy+1179}
<ffffffffa0221a02>{:lnet:LNetMDBind+690}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02d8db3>{:ptlrpc:ldlm_resource_putref+435}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02d5be2>{:ptlrpc:ldlm_lock_enqueue+386}
<ffffffffa02d033d>{:ptlrpc:ldlm_lock_create+1469}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02f2682>{:ptlrpc:ldlm_handle_enqueue+3170}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02f0210>{:ptlrpc:ldlm_server_blocking_ast+0}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02f08c0>{:ptlrpc:ldlm_server_completion_ast+0}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa0556eb8>{:mds:mds_handle+19304}
<ffffffffa02ff162>{:ptlrpc:ptlrpc_prep_set+242}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02d3054>{:ptlrpc:ldlm_run_cp_ast_work+356}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02d3a20>{:ptlrpc:ldlm_reprocess_all+400}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02ea041>{:ptlrpc:ldlm_cli_cancel_local+609}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa02eb93f>{:ptlrpc:ldlm_cli_cancel+383}
<ffffffffa0225b48>{:lnet:lnet_match_blocked_msg+920}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffff80131bc7>{recalc_task_prio+337}
<ffffffffa0278290>{:obdclass:class_handle2object+224}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa030f1af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa03192f1>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:13:53 MDS_MASTER kernel:
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:13:53 MDS_MASTER kernel:        <ffffffff80110de3>{child_rip
+8} <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
Jul  3 13:13:53 MDS_MASTER kernel:        <ffffffff80110ddb>{child_rip
+0}
Jul  3 13:13:53 MDS_MASTER kernel: LustreError: dumping log to /tmp/
lustre-log.1215062032.5088
Jul  3 13:13:54 MDS_MASTER kernel: Lustre: 0:0:(watchdog.c:
130:lcw_cb()) Watchdog triggered for pid 5116: it was inactive for
200s
Jul  3 13:13:54 MDS_MASTER kernel: Lustre: 0:0:(linux-debug.c:
167:libcfs_debug_dumpstack()) showing stack for process 5116
Jul  3 13:13:54 MDS_MASTER kernel: ll_mdt_30     S
00000101326d5018     0  5116      1          5117  5115 (L-TLB)
Jul  3 13:13:54 MDS_MASTER kernel: 00000101326d4f38 0000000000000046
0000000000000000 00000101315bde80
Jul  3 13:13:54 MDS_MASTER kernel:        0000010130d325c0
ffffffffa043c022 0000010130d325d0 00000003a03b78f9
Jul  3 13:13:54 MDS_MASTER kernel:        000001010f8d0800
0000000000001e31
Jul  3 13:13:54 MDS_MASTER kernel: Call
Trace:<ffffffffa043c022>{:lov:lsm_alloc_plain+546}
<ffffffffa03b7bd8>{:osc:oscc_has_objects+56}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa03b92b5>{:osc:osc_create+4613}
<ffffffffa043bb6d>{:lov:qos_prep_create+6861}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffff80133804>{default_wake_function+0}
<ffffffffa04309ef>{:lov:lov_prep_create_set+575}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa030393f>{:ptlrpc:ptlrpc_set_destroy+799}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02ff162>{:ptlrpc:ptlrpc_prep_set+242}
<ffffffffa04199fa>{:lov:lov_create+7898}
Jul  3 13:13:54 MDS_MASTER kernel:        <ffffffff8017d14c>{__getblk
+42} <ffffffffa04d45f1>{:ldiskfs:ldiskfs_xattr_ibody_get
+417}ll_mdt_11     S 0000000000000000     0  5097      1
5098  5096 (L-TLB)
Jul  3 13:13:54 MDS_MASTER kernel: 000001010f4832a8 0000000000000046
0000000000000246 ffffffff8013f734
Jul  3 13:13:54 MDS_MASTER kernel:        000001013a733dc0
0000000101eff122 ffffffff00000000 000000063b16d030
Jul  3 13:13:54 MDS_MASTER kernel:        000001013b16d030
00000000000001e1
Jul  3 13:13:54 MDS_MASTER kernel: Call Trace:
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffff8013f734>{__mod_timer+293}
<ffffffffa02cf0bc>{:ptlrpc:lock_res_and_lock+188}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02e81fd>{:ptlrpc:ldlm_completion_ast+1117}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffff80133804>{default_wake_function+0}
<ffffffffa02e7ab0>{:ptlrpc:ldlm_expired_completion_wait+0}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02e7aa0>{:ptlrpc:interrupted_completion_wait+0}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02e7ab0>{:ptlrpc:ldlm_expired_completion_wait+0}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02e7aa0>{:ptlrpc:interrupted_completion_wait+0}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02e8a98>{:ptlrpc:ldlm_cli_enqueue_local+1272}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa0052b60>{:jbd:journal_stop+592}
<ffffffffa054d1a4>{:mds:mds_fid2locked_dentry+420}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02ebba0>{:ptlrpc:ldlm_blocking_ast+0}
<ffffffffa02e7da0>{:ptlrpc:ldlm_completion_ast+0}
Jul  3 13:13:54 MDS_MASTER kernel:        <ffffffff801ec0a5>{__up_read
+16} <ffffffffa04d597b>{:ldiskfs:ldiskfs_xattr_get+139}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa054e12d>{:mds:mds_get_md+141}
<ffffffffa0576d48>{:mds:mds_create_objects+4584}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa04d597b>{:ldiskfs:ldiskfs_xattr_get+139}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffff80131c55>{activate_task+124} <ffffffffa057a50e>{:mds:mds_open
+2318}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa0578f04>{:mds:mds_finish_open+900}
<ffffffffa057c1d2>{:mds:mds_open+9682}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffff80133855>{__wake_up_common+67}
<ffffffff80133855>{__wake_up_common+67} <ffffffff801338ab>{__wake_up
+54}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa038c2e1>{:ksocklnd:ksocknal_launch_packet+465}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa056d3fa>{:mds:mds_reint_rec+458}
<ffffffffa05804ee>{:mds:mds_open_unpack+766}
Jul  3 13:13:54 MDS_MASTER kernel:        <ffffffff801338ab>{__wake_up
+54}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa038c2e1>{:ksocklnd:ksocknal_launch_packet+465}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa056d3fa>{:mds:mds_reint_rec+458}
<ffffffffa05804ee>{:mds:mds_open_unpack+766}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa030f3c3>{:ptlrpc:lustre_msg_add_version+83}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa058080b>{:mds:mds_update_unpack+507}
<ffffffffa055134c>{:mds:mds_reint+844}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa055b504>{:mds:fixup_handle_for_resent_req+84}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa0315840>{:ptlrpc:lustre_swab_ldlm_intent+0}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa055bb8b>{:mds:mds_intent_policy+1179}
<ffffffffa027811e>{:obdclass:class_handle_unhash_nolock+270}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02d8db3>{:ptlrpc:ldlm_resource_putref+435}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02d5be2>{:ptlrpc:ldlm_lock_enqueue+386}
<ffffffffa02d033d>{:ptlrpc:ldlm_lock_create+1469}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02f2682>{:ptlrpc:ldlm_handle_enqueue+3170}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02f0210>{:ptlrpc:ldlm_server_blocking_ast+0}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02f08c0>{:ptlrpc:ldlm_server_completion_ast+0}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa0556eb8>{:mds:mds_handle+19304}
<ffffffffa02ff162>{:ptlrpc:ptlrpc_prep_set+242}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02d3054>{:ptlrpc:ldlm_run_cp_ast_work+356}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02d3a20>{:ptlrpc:ldlm_reprocess_all+400}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa02ea041>{:ptlrpc:ldlm_cli_cancel_local+609}
Jul  3 13:13:54 MDS_MASTER kernel:
<ffffffffa030f3c3>{:ptlrpc:lustre_msg_add_version+83}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa058080b>{:mds:mds_update_unpack+507}
<ffffffffa055134c>{:mds:mds_reint+844}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa055b504>{:mds:fixup_handle_for_resent_req+84}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa0315840>{:ptlrpc:lustre_swab_ldlm_intent+0}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa055bb8b>{:mds:mds_intent_policy+1179}
<ffffffffa027811e>{:obdclass:class_handle_unhash_nolock+270}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02d8db3>{:ptlrpc:ldlm_resource_putref+435}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02d5be2>{:ptlrpc:ldlm_lock_enqueue+386}
<ffffffffa02d033d>{:ptlrpc:ldlm_lock_create+1469}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02f2682>{:ptlrpc:ldlm_handle_enqueue+3170}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02f0210>{:ptlrpc:ldlm_server_blocking_ast+0}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02f08c0>{:ptlrpc:ldlm_server_completion_ast+0}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa0556eb8>{:mds:mds_handle+19304}
<ffffffffa02ff162>{:ptlrpc:ptlrpc_prep_set+242}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02d3054>{:ptlrpc:ldlm_run_cp_ast_work+356}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02d3a20>{:ptlrpc:ldlm_reprocess_all+400}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02ea041>{:ptlrpc:ldlm_cli_cancel_local+609}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa02eb93f>{:ptlrpc:ldlm_cli_cancel+383}
<ffffffffa0225b48>{:lnet:lnet_match_blocked_msg+920}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffff8030e8d9>{__lock_text_start+47}
<ffffffffa02eb93f>{:ptlrpc:ldlm_cli_cancel+383}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa0225b48>{:lnet:lnet_match_blocked_msg+920}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffff801612ed>{cache_flusharray+107}
<ffffffff80131bc7>{recalc_task_prio+337}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa0278290>{:obdclass:class_handle2object+224}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa030f1af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa03192f1>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:13:55 MDS_MASTER kernel:        <ffffffff80110de3>{child_rip
+8} <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
Jul  3 13:13:55 MDS_MASTER kernel:        <ffffffff80110ddb>{child_rip
+0}
Jul  3 13:13:55 MDS_MASTER kernel: LustreError: dumping log to /tmp/
lustre-log.1215062034.5116
Jul  3 13:13:55 MDS_MASTER kernel: <ffffffff801612ed>{cache_flusharray
+107} <ffffffff80131bc7>{recalc_task_prio+337}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa0278290>{:obdclass:class_handle2object+224}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa030f1af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa03192f1>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:13:55 MDS_MASTER kernel:
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:13:55 MDS_MASTER kernel:        <ffffffff80110de3>{child_rip
+8} <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
Jul  3 13:13:55 MDS_MASTER kernel:        <ffffffff80110ddb>{child_rip
+0}
Jul  3 13:13:55 MDS_MASTER kernel: LustreError: dumping log to /tmp/
lustre-log.1215062034.5097
Jul  3 13:14:03 MDS_MASTER kernel: Lustre: 0:0:(watchdog.c:
130:lcw_cb()) Watchdog triggered for pid 5095: it was inactive for
200s
Jul  3 13:14:03 MDS_MASTER kernel: Lustre: 0:0:(watchdog.c:
130:lcw_cb()) Skipped 1 previous similar message
Jul  3 13:14:03 MDS_MASTER kernel: Lustre: 0:0:(linux-debug.c:
167:libcfs_debug_dumpstack()) showing stack for process 5095
Jul  3 13:14:03 MDS_MASTER kernel: Lustre: 0:0:(linux-debug.c:
167:libcfs_debug_dumpstack()) Skipped 1 previous similar message
Jul  3 13:14:03 MDS_MASTER kernel: ll_mdt_09     S
000001010f759018     0  5095      1          5096  5094 (L-TLB)
Jul  3 13:14:03 MDS_MASTER kernel: 000001010f758f38 0000000000000046
0000000000000000 000001010f6cd300
Jul  3 13:14:03 MDS_MASTER kernel:        0000010130d325c0
0000000000000018 0000010130d325d0 00000002a03b78f9
Jul  3 13:14:03 MDS_MASTER kernel:        000001013b266030
0000000000000afc
Jul  3 13:14:03 MDS_MASTER kernel: Call
Trace:<ffffffff80160a2a>{cache_alloc_refill+393}
<ffffffffa03b7bd8>{:osc:oscc_has_objects+56}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa03b92b5>{:osc:osc_create+4613}
<ffffffffa043bb6d>{:lov:qos_prep_create+6861}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffff80133804>{default_wake_function+0}
<ffffffffa04309ef>{:lov:lov_prep_create_set+575}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa030393f>{:ptlrpc:ptlrpc_set_destroy+799}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa03047a0>{:ptlrpc:ptlrpc_expired_set+0}
<ffffffffa0302530>{:ptlrpc:ptlrpc_interrupted_set+0}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa04199fa>{:lov:lov_create+7898} <ffffffff8017d14c>{__getblk
+42}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa04d45f1>{:ldiskfs:ldiskfs_xattr_ibody_get+417}
Jul  3 13:14:03 MDS_MASTER kernel:        <ffffffff801ec0a5>{__up_read
+16} <ffffffffa04d597b>{:ldiskfs:ldiskfs_xattr_get+139}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa054e12d>{:mds:mds_get_md+141}
<ffffffff80160a2a>{cache_alloc_refill+393}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa0576d48>{:mds:mds_create_objects+4584}
<ffffffffa04d597b>{:ldiskfs:ldiskfs_xattr_get+139}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa0578f04>{:mds:mds_finish_open+900}
<ffffffffa057c1d2>{:mds:mds_open+9682}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffff80133855>{__wake_up_common+67}
<ffffffffa038c2e1>{:ksocklnd:ksocknal_launch_packet+465}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa056d3fa>{:mds:mds_reint_rec+458}
<ffffffffa05804ee>{:mds:mds_open_unpack+766}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa030f3c3>{:ptlrpc:lustre_msg_add_version+83}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa058080b>{:mds:mds_update_unpack+507}
<ffffffffa055134c>{:mds:mds_reint+844}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa055b504>{:mds:fixup_handle_for_resent_req+84}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa0315840>{:ptlrpc:lustre_swab_ldlm_intent+0}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa055bb8b>{:mds:mds_intent_policy+1179}
<ffffffffa02d8db3>{:ptlrpc:ldlm_resource_putref+435}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa02d5be2>{:ptlrpc:ldlm_lock_enqueue+386}
<ffffffffa02d033d>{:ptlrpc:ldlm_lock_create+1469}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa02f2682>{:ptlrpc:ldlm_handle_enqueue+3170}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa02f0210>{:ptlrpc:ldlm_server_blocking_ast+0}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa02f08c0>{:ptlrpc:ldlm_server_completion_ast+0}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa0556eb8>{:mds:mds_handle+19304}
<ffffffffa02ff162>{:ptlrpc:ptlrpc_prep_set+242}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa02d3054>{:ptlrpc:ldlm_run_cp_ast_work+356}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa02d3a20>{:ptlrpc:ldlm_reprocess_all+400}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa02ea041>{:ptlrpc:ldlm_cli_cancel_local+609}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa02eb93f>{:ptlrpc:ldlm_cli_cancel+383}
<ffffffffa0225b48>{:lnet:lnet_match_blocked_msg+920}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffff801612ed>{cache_flusharray+107}
<ffffffff80131bc7>{recalc_task_prio+337}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa0278290>{:obdclass:class_handle2object+224}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa030f1af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa03192f1>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:14:03 MDS_MASTER kernel:
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:14:03 MDS_MASTER kernel:        <ffffffff80110de3>{child_rip
+8} <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
Jul  3 13:14:03 MDS_MASTER kernel:        <ffffffff80110ddb>{child_rip
+0}
Jul  3 13:14:03 MDS_MASTER kernel: LustreError: dumping log to /tmp/
lustre-log.1215062043.5095
Jul  3 13:14:17 MDS_MASTER kernel: Lustre: 5106:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 4bb3de9c-b9b2-e919-6eb4-
ea264525d5b7 reconnecting
Jul  3 13:14:17 MDS_MASTER kernel: Lustre: 5106:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
4bb3de9c-b9b2-e919-6eb4-ea264525d5b7@192.168.1.102@tcp to
0x000001011baa8000; still busy with 2 active RPCs
Jul  3 13:14:17 MDS_MASTER kernel: LustreError: 5106:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@0000010137efc800 x10341757/t0 o38->4bb3de9c-b9b2-e919-6eb4-
ea264525d5b7@NET_0x20000c0a80166_UUID:0/0 lens 304/200 e 0 to 0 dl
1215062157 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:14:28 MDS_MASTER kernel: Lustre: 5115:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 2466a067-c120-a833-a490-
e3347ac3fe82 reconnecting
Jul  3 13:14:28 MDS_MASTER kernel: Lustre: 5115:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 2 previous similar messages
Jul  3 13:14:28 MDS_MASTER kernel: Lustre: 5115:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
2466a067-c120-a833-a490-e3347ac3fe82@192.168.1.251@tcp to
0x000001011f7b4000; still busy with 2 active RPCs
Jul  3 13:14:28 MDS_MASTER kernel: Lustre: 5115:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 2 previous similar messages
Jul  3 13:14:28 MDS_MASTER kernel: LustreError: 5115:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@0000010137efc400 x1098311/t0 o38->2466a067-c120-a833-a490-
e3347ac3fe82@NET_0x20000c0a801fb_UUID:0/0 lens 304/200 e 0 to 0 dl
1215062168 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:14:28 MDS_MASTER kernel: LustreError: 5115:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 2 previous similar messages
Jul  3 13:16:22 MDS_MASTER kernel: Lustre: 5094:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 4bb3de9c-b9b2-e919-6eb4-
ea264525d5b7 reconnecting
Jul  3 13:16:22 MDS_MASTER kernel: Lustre: 5094:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
4bb3de9c-b9b2-e919-6eb4-ea264525d5b7@192.168.1.102@tcp to
0x000001011baa8000; still busy with 2 active RPCs
Jul  3 13:16:22 MDS_MASTER kernel: LustreError: 5094:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@000001010f781c50 x10341784/t0 o38->4bb3de9c-b9b2-e919-6eb4-
ea264525d5b7@NET_0x20000c0a80166_UUID:0/0 lens 304/200 e 0 to 0 dl
1215062282 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:16:33 MDS_MASTER kernel: Lustre: 5114:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 2466a067-c120-a833-a490-
e3347ac3fe82 reconnecting
Jul  3 13:16:33 MDS_MASTER kernel: Lustre: 5114:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 2 previous similar messages
Jul  3 13:16:33 MDS_MASTER kernel: Lustre: 5114:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
2466a067-c120-a833-a490-e3347ac3fe82@192.168.1.251@tcp to
0x000001011f7b4000; still busy with 2 active RPCs
Jul  3 13:16:33 MDS_MASTER kernel: Lustre: 5114:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 2 previous similar messages
Jul  3 13:18:29 MDS_MASTER kernel: Lustre: 5111:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000:
9266e751-95bd-3869-90a8-67ec125c66b1 reconnecting
Jul  3 13:18:29 MDS_MASTER kernel: Lustre: 5111:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
9266e751-95bd-3869-90a8-67ec125c66b1@192.168.1.151@tcp to
0x000001011f219000; still busy with 2 active RPCs
Jul  3 13:18:29 MDS_MASTER kernel: LustreError: 5111:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@0000010005ffd800 x95159120/t0 o38-
>9266e751-95bd-3869-90a8-67ec125c66b1@NET_0x20000c0a80197_UUID:0/0
lens 304/200 e 0 to 0 dl 1215062409 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:18:29 MDS_MASTER kernel: LustreError: 5111:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 3 previous similar messages
Jul  3 13:20:34 MDS_MASTER kernel: Lustre: 5100:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000:
9266e751-95bd-3869-90a8-67ec125c66b1 reconnecting
Jul  3 13:20:34 MDS_MASTER kernel: Lustre: 5100:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 3 previous similar messages
Jul  3 13:20:34 MDS_MASTER kernel: Lustre: 5100:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
9266e751-95bd-3869-90a8-67ec125c66b1@192.168.1.151@tcp to
0x000001011f219000; still busy with 2 active RPCs
Jul  3 13:20:34 MDS_MASTER kernel: Lustre: 5100:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 3 previous similar messages
Jul  3 13:20:34 MDS_MASTER kernel: LustreError: 5100:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@00000100bfe53200 x95161907/t0 o38-
>9266e751-95bd-3869-90a8-67ec125c66b1@NET_0x20000c0a80197_UUID:0/0
lens 304/200 e 0 to 0 dl 1215062534 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:20:34 MDS_MASTER kernel: LustreError: 5100:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 3 previous similar messages
Jul  3 13:22:39 MDS_MASTER kernel: Lustre: 5112:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000:
9266e751-95bd-3869-90a8-67ec125c66b1 reconnecting
Jul  3 13:22:39 MDS_MASTER kernel: Lustre: 5112:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 3 previous similar messages
Jul  3 13:22:39 MDS_MASTER kernel: Lustre: 5112:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
9266e751-95bd-3869-90a8-67ec125c66b1@192.168.1.151@tcp to
0x000001011f219000; still busy with 2 active RPCs
Jul  3 13:22:39 MDS_MASTER kernel: Lustre: 5112:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 3 previous similar messages
Jul  3 13:22:48 MDS_MASTER kernel: LustreError: 5099:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@000001013b256a00 x1098419/t0 o38->2466a067-c120-a833-a490-
e3347ac3fe82@NET_0x20000c0a801fb_UUID:0/0 lens 304/200 e 0 to 0 dl
1215062668 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:22:48 MDS_MASTER kernel: LustreError: 5099:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 4 previous similar messages
Jul  3 13:24:43 MDS_MASTER kernel: Lustre: 5101:0:(ldlm_lib.c:
711:target_handle_connect()) lenovo-MDT0000: exp 0000010090f04000
already connecting
Jul  3 13:24:53 MDS_MASTER kernel: Lustre: 5098:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 2466a067-c120-a833-a490-
e3347ac3fe82 reconnecting
Jul  3 13:24:53 MDS_MASTER kernel: Lustre: 5098:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 6 previous similar messages
Jul  3 13:24:53 MDS_MASTER kernel: Lustre: 5098:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
2466a067-c120-a833-a490-e3347ac3fe82@192.168.1.251@tcp to
0x000001011f7b4000; still busy with 2 active RPCs
Jul  3 13:24:53 MDS_MASTER kernel: Lustre: 5098:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 4 previous similar messages
Jul  3 13:27:14 MDS_MASTER kernel: LustreError: 5109:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@0000010005ff8600 x95161993/t0 o38-
>9266e751-95bd-3869-90a8-67ec125c66b1@NET_0x20000c0a80197_UUID:0/0
lens 304/200 e 0 to 0 dl 1215062934 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:27:14 MDS_MASTER kernel: LustreError: 5109:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 9 previous similar messages
Jul  3 13:27:34 MDS_MASTER sshd(pam_unix)[6060]: session opened for
user root by root(uid=0)
Jul  3 13:29:19 MDS_MASTER kernel: Lustre: 5094:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000:
9266e751-95bd-3869-90a8-67ec125c66b1 reconnecting
Jul  3 13:29:19 MDS_MASTER kernel: Lustre: 5094:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 9 previous similar messages
Jul  3 13:29:19 MDS_MASTER kernel: Lustre: 5094:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
9266e751-95bd-3869-90a8-67ec125c66b1@192.168.1.151@tcp to
0x000001011f219000; still busy with 2 active RPCs
Jul  3 13:29:19 MDS_MASTER kernel: Lustre: 5094:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 7 previous similar messages
Jul  3 13:32:04 MDS_MASTER sshd(pam_unix)[6097]: session opened for
user root by root(uid=0)
Jul  3 13:35:59 MDS_MASTER kernel: LustreError: 5089:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@00000101329f5200 x85735404/t0 o38->059b8837-8efc-
ace1-9a33-2af95b367a6a@NET_0x20000c0a80198_UUID:0/0 lens 304/200 e 0
to 0 dl 1215063459 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:35:59 MDS_MASTER kernel: LustreError: 5089:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 15 previous similar messages
Jul  3 13:37:06 MDS_MASTER sshd(pam_unix)[6136]: session opened for
user root by root(uid=0)
Jul  3 13:38:01 MDS_MASTER kernel: LustreError: 5376:0:(mds_open.c:
1482:mds_close()) @@@ no handle for file close ino 233769516: cookie
0xd2cddb8ca87c930f  req@0000010037c91a00 x13510731/t0 o35->b95b6ffc-
bc89-0c9b-bd0e-66f4a2e139fc@NET_0x20000c0a80165_UUID:0/0 lens 296/560
e 0 to 0 dl 1215063581 ref 1 fl Interpret:/0/0 rc 0/0
Jul  3 13:38:01 MDS_MASTER kernel: LustreError: 5151:0:(mds_open.c:
1482:mds_close()) @@@ no handle for file close ino 237489031: cookie
0xd2cddb8ca86ef3cd  req@000001012fdd9050 x13510736/t0 o35->b95b6ffc-
bc89-0c9b-bd0e-66f4a2e139fc@NET_0x20000c0a80165_UUID:0/0 lens 296/560
e 0 to 0 dl 1215063581 ref 1 fl Interpret:/0/0 rc 0/0
Jul  3 13:38:29 MDS_MASTER kernel: Lustre: 5108:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 059b8837-8efc-
ace1-9a33-2af95b367a6a reconnecting
Jul  3 13:38:29 MDS_MASTER kernel: Lustre: 5108:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 15 previous similar messages
Jul  3 13:38:29 MDS_MASTER kernel: Lustre: 5108:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
059b8837-8efc-ace1-9a33-2af95b367a6a@192.168.1.152@tcp to
0x000001011facf000; still busy with 2 active RPCs
Jul  3 13:38:29 MDS_MASTER kernel: Lustre: 5108:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 15 previous similar messages
Jul  3 13:38:45 MDS_MASTER kernel: LustreError: 5151:0:(mds_open.c:
1482:mds_close()) @@@ no handle for file close ino 237487439: cookie
0xd2cddb8ca873056f  req@00000100bfeaca00 x13511041/t0 o35->b95b6ffc-
bc89-0c9b-bd0e-66f4a2e139fc@NET_0x20000c0a80165_UUID:0/0 lens 296/560
e 0 to 0 dl 1215063625 ref 1 fl Interpret:/0/0 rc 0/0
Jul  3 13:38:56 MDS_MASTER kernel: LustreError: 1099:0:(mds_open.c:
1482:mds_close()) @@@ no handle for file close ino 237487440: cookie
0xd2cddb8ca873097b  req@00000100bfd1be00 x13511106/t0 o35->b95b6ffc-
bc89-0c9b-bd0e-66f4a2e139fc@NET_0x20000c0a80165_UUID:0/0 lens 296/560
e 0 to 0 dl 1215063636 ref 1 fl Interpret:/0/0 rc 0/0
Jul  3 13:40:09 MDS_MASTER kernel: LustreError: 5089:0:(ldlm_request.c:
69:ldlm_expired_completion_wait()) ### lock timed out (enqueued at
1215063509, 100s ago); not entering recovery in server code, just
going back to sleep ns: mds-lenovo-MDT0000_UUID lock:
0000010091e04280/0xd2cddb8ca88a8594 lrc: 3/1,0 mode: --/CR res:
237470160/1835281226 bits 0x3 rrc: 8 type: IBT flags: 4004000 remote:
0x0 expref: -99 pid 5089
Jul  3 13:41:49 MDS_MASTER kernel: Lustre: 0:0:(watchdog.c:
130:lcw_cb()) Watchdog triggered for pid 5089: it was inactive for
200s
Jul  3 13:41:49 MDS_MASTER kernel: Lustre: 0:0:(linux-debug.c:
167:libcfs_debug_dumpstack()) showing stack for process 5089
Jul  3 13:41:49 MDS_MASTER kernel: ll_mdt_03     S
0000000000000000     0  5089      1          5090  5088 (L-TLB)
Jul  3 13:41:49 MDS_MASTER kernel: 000001010f6f1338 0000000000000046
0000000000000246 ffffffff00000074
Jul  3 13:41:49 MDS_MASTER kernel:        0000010139d37380
0000000001f27fa2 0000010001055a20 0000000291e04280
Jul  3 13:41:49 MDS_MASTER kernel:        000001013a1f3800
0000000000003db6
Jul  3 13:41:49 MDS_MASTER kernel: Call
Trace:<ffffffffa02e7b9c>{:ptlrpc:ldlm_expired_completion_wait+236}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02cf0bc>{:ptlrpc:lock_res_and_lock+188}
<ffffffffa02e81fd>{:ptlrpc:ldlm_completion_ast+1117}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffff80133804>{default_wake_function+0}
<ffffffffa02e7ab0>{:ptlrpc:ldlm_expired_completion_wait+0}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02e7aa0>{:ptlrpc:interrupted_completion_wait+0}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02e7ab0>{:ptlrpc:ldlm_expired_completion_wait+0}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02e7aa0>{:ptlrpc:interrupted_completion_wait+0}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02e8a98>{:ptlrpc:ldlm_cli_enqueue_local+1272}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffff801872c5>{__lookup_hash+284}
<ffffffffa0565d1a>{:mds:enqueue_ordered_locks+1082}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02ebba0>{:ptlrpc:ldlm_blocking_ast+0}
<ffffffffa02e7da0>{:ptlrpc:ldlm_completion_ast+0}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa0567160>{:mds:mds_get_parent_child_locked+1408}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa0550913>{:mds:mds_getattr_lock+1539}
<ffffffffa030f3c3>{:ptlrpc:lustre_msg_add_version+83}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02250a0>{:lnet:lnet_send+2544}
<ffffffffa03124c7>{:ptlrpc:lustre_msg_get_flags+87}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa055bd41>{:mds:mds_intent_policy+1617}
<ffffffffa027811e>{:obdclass:class_handle_unhash_nolock+270}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02d8db3>{:ptlrpc:ldlm_resource_putref+435}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02d5be2>{:ptlrpc:ldlm_lock_enqueue+386}
<ffffffffa02d033d>{:ptlrpc:ldlm_lock_create+1469}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02f2682>{:ptlrpc:ldlm_handle_enqueue+3170}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02f0210>{:ptlrpc:ldlm_server_blocking_ast+0}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02f08c0>{:ptlrpc:ldlm_server_completion_ast+0}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa0556eb8>{:mds:mds_handle+19304}
<ffffffffa02ff162>{:ptlrpc:ptlrpc_prep_set+242}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02d3054>{:ptlrpc:ldlm_run_cp_ast_work+356}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02d3a20>{:ptlrpc:ldlm_reprocess_all+400}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02ea041>{:ptlrpc:ldlm_cli_cancel_local+609}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa02eb93f>{:ptlrpc:ldlm_cli_cancel+383}
<ffffffffa0225b48>{:lnet:lnet_match_blocked_msg+920}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffff801612ed>{cache_flusharray+107}
<ffffffff80131bc7>{recalc_task_prio+337}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa0278290>{:obdclass:class_handle2object+224}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa030f1af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa03192f1>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:41:49 MDS_MASTER kernel:
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:41:50 MDS_MASTER kernel:        <ffffffff80110de3>{child_rip
+8} <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
Jul  3 13:41:50 MDS_MASTER kernel:        <ffffffff80110ddb>{child_rip
+0}
Jul  3 13:41:50 MDS_MASTER kernel: LustreError: dumping log to /tmp/
lustre-log.1215063709.5089
Jul  3 13:45:03 MDS_MASTER kernel: Lustre: lenovo-MDT0000: haven't
heard from client 059b8837-8efc-ace1-9a33-2af95b367a6a (at
192.168.1.152@tcp) in 244 seconds. I think it's dead, and I am
evicting it.
Jul  3 13:45:03 MDS_MASTER kernel: Lustre: Skipped 1 previous similar
message
Jul  3 13:45:31 MDS_MASTER kernel: LustreError: 5099:0:(ldlm_request.c:
69:ldlm_expired_completion_wait()) ### lock timed out (enqueued at
1215063831, 100s ago); not entering recovery in server code, just
going back to sleep ns: mds-lenovo-MDT0000_UUID lock:
0000010126990280/0xd2cddb8ca88a8af0 lrc: 3/1,0 mode: --/CR res:
237470160/1835281226 bits 0x3 rrc: 8 type: IBT flags: 4004000 remote:
0x0 expref: -99 pid 5099
Jul  3 13:45:59 MDS_MASTER kernel: LustreError: 5094:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@000001013b425a00 x95167264/t0 o38-
>9266e751-95bd-3869-90a8-67ec125c66b1@NET_0x20000c0a80197_UUID:0/0
lens 304/200 e 0 to 0 dl 1215064059 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:45:59 MDS_MASTER kernel: LustreError: 5094:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 24 previous similar messages
Jul  3 13:47:11 MDS_MASTER kernel: Lustre: 0:0:(watchdog.c:
130:lcw_cb()) Watchdog triggered for pid 5099: it was inactive for
200s
Jul  3 13:47:11 MDS_MASTER kernel: Lustre: 0:0:(linux-debug.c:
167:libcfs_debug_dumpstack()) showing stack for process 5099
Jul  3 13:47:11 MDS_MASTER kernel: ll_mdt_13     S
0000000000000000     0  5099      1          5100  5098 (L-TLB)
Jul  3 13:47:11 MDS_MASTER kernel: 000001010f495338 0000000000000046
0000000000000246 ffffffff8013f734
Jul  3 13:47:11 MDS_MASTER kernel:        000001013a733ac0
0000000101f2fd5b ffffffffffffffff 0000000326990280
Jul  3 13:47:11 MDS_MASTER kernel:        000001012fbf1030
0000000000003f7a
Jul  3 13:47:11 MDS_MASTER kernel: Call
Trace:<ffffffff8013f734>{__mod_timer+293}
<ffffffffa02cf0bc>{:ptlrpc:lock_res_and_lock+188}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02e81fd>{:ptlrpc:ldlm_completion_ast+1117}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffff80133804>{default_wake_function+0}
<ffffffffa02e7ab0>{:ptlrpc:ldlm_expired_completion_wait+0}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02e7aa0>{:ptlrpc:interrupted_completion_wait+0}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02e7ab0>{:ptlrpc:ldlm_expired_completion_wait+0}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02e7aa0>{:ptlrpc:interrupted_completion_wait+0}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02e8a98>{:ptlrpc:ldlm_cli_enqueue_local+1272}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffff801872c5>{__lookup_hash+284}
<ffffffffa0565d1a>{:mds:enqueue_ordered_locks+1082}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02ebba0>{:ptlrpc:ldlm_blocking_ast+0}
<ffffffffa02e7da0>{:ptlrpc:ldlm_completion_ast+0}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa0567160>{:mds:mds_get_parent_child_locked+1408}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa0550913>{:mds:mds_getattr_lock+1539}
<ffffffffa030f3c3>{:ptlrpc:lustre_msg_add_version+83}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02250a0>{:lnet:lnet_send+2544}
<ffffffffa03124c7>{:ptlrpc:lustre_msg_get_flags+87}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa055bd41>{:mds:mds_intent_policy+1617}
<ffffffffa027811e>{:obdclass:class_handle_unhash_nolock+270}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02d8db3>{:ptlrpc:ldlm_resource_putref+435}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02d5be2>{:ptlrpc:ldlm_lock_enqueue+386}
<ffffffffa02d033d>{:ptlrpc:ldlm_lock_create+1469}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02f2682>{:ptlrpc:ldlm_handle_enqueue+3170}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02f0210>{:ptlrpc:ldlm_server_blocking_ast+0}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02f08c0>{:ptlrpc:ldlm_server_completion_ast+0}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa0556eb8>{:mds:mds_handle+19304}
<ffffffffa02ff162>{:ptlrpc:ptlrpc_prep_set+242}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02d3054>{:ptlrpc:ldlm_run_cp_ast_work+356}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02d3a20>{:ptlrpc:ldlm_reprocess_all+400}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02ea041>{:ptlrpc:ldlm_cli_cancel_local+609}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa00251a0>{:sd_mod:sd_iostats_start_req+270}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa02eb93f>{:ptlrpc:ldlm_cli_cancel+383}
<ffffffffa0225b48>{:lnet:lnet_match_blocked_msg+920}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffff801612ed>{cache_flusharray+107}
<ffffffff80131bc7>{recalc_task_prio+337}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa0278290>{:obdclass:class_handle2object+224}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa030f1af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa03192f1>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:47:11 MDS_MASTER kernel:
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Jul  3 13:47:11 MDS_MASTER kernel:        <ffffffff80110de3>{child_rip
+8} <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
Jul  3 13:47:11 MDS_MASTER kernel:        <ffffffff80110ddb>{child_rip
+0}
Jul  3 13:47:11 MDS_MASTER kernel: LustreError: dumping log to /tmp/
lustre-log.1215064031.5099
Jul  3 13:47:47 MDS_MASTER sshd(pam_unix)[6176]: session opened for
user root by root(uid=0)
Jul  3 13:49:41 MDS_MASTER kernel: Lustre: 5111:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 68afa71f-68f2-
e44a-1060-773f13efccf9 reconnecting
Jul  3 13:49:41 MDS_MASTER kernel: Lustre: 5111:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 22 previous similar messages
Jul  3 13:49:41 MDS_MASTER kernel: Lustre: 5111:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
68afa71f-68f2-e44a-1060-773f13efccf9 at 192.168.1.152@tcp to
0x000001009056c000; still busy with 2 active RPCs
Jul  3 13:49:41 MDS_MASTER kernel: Lustre: 5111:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 22 previous similar messages
Jul  3 13:56:08 MDS_MASTER kernel: LustreError: 5105:0:(ldlm_lib.c:
1536:target_send_reply_msg()) @@@ processing error (-16)
req@000001013b403a00 x1098851/t0 o38->2466a067-c120-a833-a490-
e3347ac3fe82@NET_0x20000c0a801fb_UUID:0/0 lens 304/200 e 0 to 0 dl
1215064668 ref 1 fl Interpret:/0/0 rc -16/0
Jul  3 13:56:08 MDS_MASTER kernel: LustreError: 5105:0:(ldlm_lib.c:
1536:target_send_reply_msg()) Skipped 21 previous similar messages
Jul  3 13:59:42 MDS_MASTER kernel: Lustre: 5090:0:(ldlm_lib.c:
525:target_handle_reconnect()) lenovo-MDT0000: 4bb3de9c-b9b2-e919-6eb4-
ea264525d5b7 reconnecting
Jul  3 13:59:42 MDS_MASTER kernel: Lustre: 5090:0:(ldlm_lib.c:
525:target_handle_reconnect()) Skipped 23 previous similar messages
Jul  3 13:59:42 MDS_MASTER kernel: Lustre: 5090:0:(ldlm_lib.c:
760:target_handle_connect()) lenovo-MDT0000: refuse reconnection from
4bb3de9c-b9b2-e919-6eb4-ea264525d5b7 at 192.168.1.102@tcp to
0x000001011baa8000; still busy with 2 active RPCs
Jul  3 13:59:42 MDS_MASTER kernel: Lustre: 5090:0:(ldlm_lib.c:
760:target_handle_connect()) Skipped 23 previous similar messages

Below is /var/log/messages of OSS1_MASTER:
Jul  3 11:20:24 OSS1_MASTER kernel: LustreError: 5806:0:
(ldlm_resource.c:767:ldlm_resource_add()) lvbo_init failed for
resource 299349: rc -2
Jul  3 11:53:19 OSS1_MASTER kernel: LustreError: 5843:0:
(ldlm_resource.c:767:ldlm_resource_add()) lvbo_init failed for
resource 303694: rc -2
Jul  3 13:09:59 OSS1_MASTER kernel: Lustre: Request x12538541 sent
from lenovo-OST0000 to NID 192.168.1.101@tcp 20s ago has timed out
(limit 20s).
Jul  3 13:09:59 OSS1_MASTER kernel: Lustre: Skipped 1 previous similar
message
Jul  3 13:09:59 OSS1_MASTER kernel: LustreError: 138-a: lenovo-
OST0000: A client on nid 192.168.1.101@tcp was evicted due to a lock
blocking callback to 192.168.1.101@tcp timed out: rc -107
Jul  3 13:10:03 OSS1_MASTER kernel: Lustre: Request x12538893 sent
from lenovo-OST0003 to NID 192.168.1.101@tcp 20s ago has timed out
(limit 20s).
Jul  3 13:10:03 OSS1_MASTER kernel: Lustre: Skipped 2 previous similar
messages
Jul  3 13:10:03 OSS1_MASTER kernel: LustreError: 138-a: lenovo-
OST0003: A client on nid 192.168.1.101@tcp was evicted due to a lock
blocking callback to 192.168.1.101@tcp timed out: rc -107
Jul  3 13:10:03 OSS1_MASTER kernel: LustreError: Skipped 1 previous
similar message
Jul  3 13:10:04 OSS1_MASTER kernel: LustreError: 138-a: lenovo-
OST0002: A client on nid 192.168.1.101@tcp was evicted due to a lock
blocking callback to 192.168.1.101@tcp timed out: rc -107
Jul  3 13:10:12 OSS1_MASTER kernel: Lustre: Request x12539311 sent
from lenovo-OST0000 to NID 192.168.1.101@tcp 20s ago has timed out
(limit 20s).
Jul  3 13:10:12 OSS1_MASTER kernel: Lustre: Skipped 6 previous similar
messages
Jul  3 13:10:20 OSS1_MASTER kernel: Lustre: Request x12539730 sent
from lenovo-OST0002 to NID 192.168.1.101@tcp 20s ago has timed out
(limit 20s).
Jul  3 13:10:20 OSS1_MASTER kernel: Lustre: Skipped 1 previous similar
message
Jul  3 13:10:56 OSS1_MASTER kernel: LustreError: 5657:0:(acceptor.c:
442:lnet_acceptor()) Error -11 reading connection request from
192.168.1.101
Jul  3 13:14:40 OSS1_MASTER kernel: Lustre: lenovo-OST0000: haven't
heard from client lenovo-mdtlov_UUID (at 192.168.1.2@tcp) in 239
seconds. I think it's dead, and I am evicting it.


