[Lustre-discuss] Kernel crash problem

Chan Ching Yu, Patrick cychan at clustertech.com
Sun Jul 21 15:46:40 PDT 2013


Hi,

We have been running Lustre 2.1.5 for a few months. On all of the MGS, MDS, and OSS nodes, the LNET device is a bonding interface, bond1, whose slaves are the InfiniBand interfaces ib0 and ib1. However, one client rebooted with a kernel dump.
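
For context, the relevant pieces of the network configuration look roughly like the following. This is a minimal sketch of a typical RHEL 6 active-backup bond carrying LNET over TCP/IPoIB; the file contents here are illustrative, not copied verbatim from our nodes:

    # /etc/modprobe.d/lustre.conf -- LNET runs TCP over the IPoIB bond
    options lnet networks=tcp(bond1)

    # /etc/sysconfig/network-scripts/ifcfg-bond1
    DEVICE=bond1
    IPADDR=192.168.127.106
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=active-backup miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-ib0 (ifcfg-ib1 differs only in DEVICE)
    DEVICE=ib0
    MASTER=bond1
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none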

Does anyone know what the problem is? Thanks in advance.

For your reference, I attach below the backtrace and the system message buffer from crash.
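
The dump was opened with the crash utility in the usual way:

    # crash /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux \
            vmcore-machine-2013-07-20-08:48:56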

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux
    DUMPFILE: vmcore-machine-2013-07-20-08:48:56  [PARTIAL DUMP]
        CPUS: 16
        DATE: Sat Jul 20 08:47:51 2013
      UPTIME: 17 days, 21:11:15
LOAD AVERAGE: 16.02, 16.02, 15.12
       TASKS: 676
     RELEASE: 2.6.32-279.19.1.el6_lustre.x86_64
     VERSION: #1 SMP Wed Mar 20 16:37:18 PDT 2013
     MACHINE: x86_64  (2599 Mhz)
      MEMORY: 64 GB
       PANIC: "kernel BUG at mm/filemap.c:129!"
         PID: 12891
     COMMAND: "ptlrpcd-rcv"
        TASK: ffff880863c8eaa0  [THREAD_INFO: ffff880869ef0000]
         CPU: 10
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 12891  TASK: ffff880863c8eaa0  CPU: 10  COMMAND: "ptlrpcd-rcv"
#0 [ffff880869ef14a0] machine_kexec at ffffffff81031f7b
#1 [ffff880869ef1500] crash_kexec at ffffffff810b8c22
#2 [ffff880869ef15d0] oops_end at ffffffff814ed980
#3 [ffff880869ef1600] die at ffffffff8100f19b
#4 [ffff880869ef1630] do_trap at ffffffff814ed274
#5 [ffff880869ef1690] do_invalid_op at ffffffff8100cdb5
#6 [ffff880869ef1730] invalid_op at ffffffff8100be5b
    [exception RIP: __remove_from_page_cache+213]
    RIP: ffffffff81110c85  RSP: ffff880869ef17e0  RFLAGS: 00010046
    RAX: 00c0000000000065  RBX: ffffea0030e5f6b0  RCX: 00000000ffffffd9
    RDX: 0000000000000000  RSI: 0000000000000009  RDI: ffff880880010dc0
    RBP: ffff880869ef17f0   R8: 000000000000005a   R9: 0000000000000001
    R10: 0000000000000357  R11: d000000000000000  R12: ffff881047e3e318
    R13: ffff881047e3e330  R14: ffff8810370c5f50  R15: ffff8810450b9980
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#7 [ffff880869ef17f8] remove_from_page_cache at ffffffff81110ce4
#8 [ffff880869ef1828] vvp_page_discard at ffffffffa0bfa2d0 [lustre]
#9 [ffff880869ef1858] cl_page_invoid at ffffffffa061cb08 [obdclass]
#10 [ffff880869ef18a8] cl_page_discard at ffffffffa061cc13 [obdclass]
#11 [ffff880869ef18b8] check_and_discard_cb at ffffffffa0626b0e [obdclass]
#12 [ffff880869ef1908] cl_page_gang_lookup at ffffffffa0621a34 [obdclass]
#13 [ffff880869ef19b8] cl_lock_page_out at ffffffffa06235db [obdclass]
#14 [ffff880869ef1a28] osc_lock_flush at ffffffffa0a9a9cf [osc]
#15 [ffff880869ef1a78] osc_lock_cancel at ffffffffa0a9aadf [osc]
#16 [ffff880869ef1ac8] cl_lock_cancel0 at ffffffffa0622265 [obdclass]
#17 [ffff880869ef1af8] cl_lock_cancel at ffffffffa0622f4b [obdclass]
#18 [ffff880869ef1b18] osc_ldlm_blocking_ast at ffffffffa0a9bbba [osc]
#19 [ffff880869ef1b88] ldlm_cancel_callback at ffffffffa0729cc0 [ptlrpc]
#20 [ffff880869ef1ba8] ldlm_lock_cancel at ffffffffa0729db5 [ptlrpc]
#21 [ffff880869ef1bc8] ldlm_cli_cancel_list_local at ffffffffa07469d8 [ptlrpc]
#22 [ffff880869ef1c28] ldlm_cancel_lru_local at ffffffffa0747895 [ptlrpc]
#23 [ffff880869ef1c48] ldlm_replay_locks at ffffffffa0747a0a [ptlrpc]
#24 [ffff880869ef1cc8] ptlrpc_import_recovery_state_machine at ffffffffa0790187 [ptlrpc]
#25 [ffff880869ef1d08] ptlrpc_connect_interpret at ffffffffa0793288 [ptlrpc]
#26 [ffff880869ef1d98] ptlrpc_check_set at ffffffffa0764cab [ptlrpc]
#27 [ffff880869ef1e38] ptlrpcd_check at ffffffffa0795f30 [ptlrpc]
#28 [ffff880869ef1e68] ptlrpcd at ffffffffa07962c3 [ptlrpc]
#29 [ffff880869ef1f48] kernel_thread at ffffffff8100c0ca
-------------------------------------------------------------------------------------------------------------------
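
A note on the assertion itself: in the 2.6.32 source, mm/filemap.c:129 appears to be the BUG_ON(page_mapped(page)) check in __remove_from_page_cache(). The snippet below is quoted from the upstream 2.6.32 tree from memory; the RHEL 6 line numbering may differ, so please treat this as an assumption:

    /* mm/filemap.c (2.6.32): a page must have no remaining userspace
     * mappings by the time it is dropped from the page cache. */
    void __remove_from_page_cache(struct page *page)
    {
            struct address_space *mapping = page->mapping;

            radix_tree_delete(&mapping->page_tree, page->index);
            page->mapping = NULL;
            mapping->nrpages--;
            __dec_zone_page_state(page, NR_FILE_PAGES);
            if (PageSwapBacked(page))
                    __dec_zone_page_state(page, NR_SHMEM);
            BUG_ON(page_mapped(page));  /* <-- believed to be line 129 */
            /* (dirty-page accounting follows; omitted here) */
    }

If that reading is correct, vvp_page_discard() called remove_from_page_cache() on a page that still had live mappings.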

crash> log
(skipped to reduce the size of this mail)


Default coalesing params for mtu:2044 - rx_frames:88 rx_usecs:16
Default coalesing params for mtu:2044 - rx_frames:88 rx_usecs:16
802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
All bugs added by David S. Miller <davem@redhat.com>
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
bonding: MII link monitoring set to 100 ms
Loading kernel module for a network device with CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias netdev-bond1 instead
bonding: bond1: doing slave updates when interface is down.
bonding: bond1: Adding slave ib0.
bonding bond1: master_dev is not up in bond_enslave
bonding: bond1: Warning: enslaved VLAN challenged slave ib0. Adding VLANs will be blocked as long as ib0 is part of bond bond1
bonding: bond1: Warning: The first slave device specified does not support setting the MAC address. Setting fail_over_mac to active.
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
New slave device ib0 does not support netpoll
Disabling netpoll support for bond1
bonding: bond1: enslaving ib0 as a backup interface with a down link.
ib0: enabling connected mode will cause multicast packet drops
ib0: mtu > 2044 will cause multicast packet drops.
ib0: mtu > 2044 will cause multicast packet drops.
bonding: bond1: doing slave updates when interface is down.
bonding: bond1: Adding slave ib1.
bonding bond1: master_dev is not up in bond_enslave
bonding: bond1: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond1
ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
bonding: bond1: enslaving ib1 as a backup interface with a down link.
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
ib1: enabling connected mode will cause multicast packet drops
ib1: mtu > 2044 will cause multicast packet drops.
ib1: mtu > 2044 will cause multicast packet drops.
ip_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
ADDRCONF(NETDEV_UP): bond0: link is not ready
8021q: adding VLAN 0 to HW filter on device bond0
bonding: bond0: Adding slave eth0.
8021q: adding VLAN 0 to HW filter on device eth0
bonding: bond0: enslaving eth0 as a backup interface with a down link.
bonding: bond0: Adding slave eth1.
8021q: adding VLAN 0 to HW filter on device eth1
bonding: bond0: enslaving eth1 as a backup interface with a down link.
ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: first active interface up!
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bond0: link status definitely up for interface eth1, 1000 Mbps full duplex.
ADDRCONF(NETDEV_UP): bond1: link is not ready
8021q: adding VLAN 0 to HW filter on device bond1
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Lustre: Lustre: Build Version: RC1-MLNX_OFED_LINUX_1.5.3_3.1.0_rhel6.3--PRISTINE-2.6.32-279.19.1.el6_lustre.x86_64
Lustre: Added LNI 192.168.127.106@tcp [8/256/0/180]
Lustre: Accept secure, port 988
Lustre: Lustre OSC module (ffffffffa0a696a0).
Lustre: Lustre LOV module (ffffffffa0afba60).
Lustre: Lustre client module (ffffffffa0be5140).
bond0: no IPv6 routers present
Lustre: 2741:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1372736217/real 1372736220]  req@ffff88106393a000 x1439418250428417/t0(0) o250->MGC192.168.127.14@tcp@192.168.127.14@tcp:26/25 lens 368/512 e 0 to 1 dl 1372736222 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 2677:0:(client.c:1049:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881064880000 x1439418250428418/t0(0) o101->MGC192.168.127.14@tcp@192.168.127.14@tcp:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
bond1: link status definitely up for interface ib0, 4294967295 Mbps full duplex.
bonding: bond1: making interface ib0 the new active one.
bonding: bond1: first active interface up!
bond1: link status definitely up for interface ib1, 4294967295 Mbps full duplex.
ADDRCONF(NETDEV_CHANGE): bond1: link becomes ready
LustreError: 2763:0:(client.c:1049:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881057fbc000 x1439418250428420/t0(0) o101->MGC192.168.127.14@tcp@192.168.127.14@tcp:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
bond1: no IPv6 routers present
LustreError: 2763:0:(client.c:1049:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8810583c5000 x1439418250428421/t0(0) o101->MGC192.168.127.14@tcp@192.168.127.15@tcp:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Lustre: 2741:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1372736242/real 1372736242]  req@ffff881057e8a800 x1439418250428422/t0(0) o250->MGC192.168.127.14@tcp@192.168.127.15@tcp:26/25 lens 368/512 e 0 to 1 dl 1372736247 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 2763:0:(client.c:1049:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8810583c5000 x1439418250428423/t0(0) o101->MGC192.168.127.14@tcp@192.168.127.15@tcp:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 2763:0:(client.c:1049:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881064221000 x1439418250428424/t0(0) o101->MGC192.168.127.14@tcp@192.168.127.15@tcp:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Lustre: MGC192.168.127.14@tcp: Reactivating import
Lustre: Mounted modelfs-client
Lustre: Mounted preposfs-client
LustreError: 125620:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 125620:0:(ldlm_request.c:1801:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: Unmounted modelfs-client
LustreError: 125645:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 125645:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) Skipped 103 previous similar messages
LustreError: 125645:0:(ldlm_request.c:1801:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
LustreError: 125645:0:(ldlm_request.c:1801:ldlm_cli_cancel_list()) Skipped 103 previous similar messages
Lustre: Unmounted preposfs-client
Slow work thread pool: Starting up
Slow work thread pool: Ready
FS-Cache: Loaded
Registering the id_resolver key type
FS-Cache: Netfs 'nfs' registered for caching
Lustre: Removed LNI 192.168.127.106@tcp
bonding: bond1: link status definitely down for interface ib0, disabling it
bonding: bond1: making interface ib1 the new active one.
bond1: link status definitely up for interface ib0, 4294967295 Mbps full duplex.
bonding: bond1: making interface ib0 the new active one.
bonding: bond1: link status definitely down for interface ib0, disabling it
bonding: bond1: making interface ib1 the new active one.
bond1: link status definitely up for interface ib0, 4294967295 Mbps full duplex.
bonding: bond1: making interface ib0 the new active one.
bonding: bond1: link status definitely down for interface ib0, disabling it
bonding: bond1: making interface ib1 the new active one.
xhpl_intel64[1022]: segfault at 61ac020 ip 00000000007eca81 sp 00007f4af0254d50 error 4 in xhpl_intel64[400000+618000]
xhpl_intel64[1030]: segfault at 5ef7048 ip 00000000007eca81 sp 00007fa1b9b38d50 error 4 in xhpl_intel64[400000+618000]
xhpl_intel64[1035]: segfault at 6099008 ip 00000000007eca81 sp 00007f5c7c293d50 error 4 in xhpl_intel64[400000+618000]
bond1: link status definitely up for interface ib0, 4294967295 Mbps full duplex.
bonding: bond1: making interface ib0 the new active one.
bonding: bond1: link status definitely down for interface ib0, disabling it
bonding: bond1: making interface ib1 the new active one.
xhpl_intel64[2963]: segfault at 5b4b058 ip 00000000007eca81 sp 00007fe6f4b8cd50 error 4 in xhpl_intel64[400000+618000]
xhpl_intel64[2971]: segfault at 49ce028 ip 00000000007eca81 sp 00007f0416c90d50 error 4 in xhpl_intel64[400000+618000]
xhpl_intel64[2974]: segfault at 4d29048 ip 00000000007eca81 sp 00007f9d95e51d50 error 4 in xhpl_intel64[400000+618000]
bond1: link status definitely up for interface ib0, 4294967295 Mbps full duplex.
bonding: bond1: making interface ib0 the new active one.
reached max retry count. status=-22  .Giving up
bonding: bond1: link status definitely down for interface ib0, disabling it
bonding: bond1: making interface ib1 the new active one.
bond1: link status definitely up for interface ib0, 4294967295 Mbps full duplex.
bonding: bond1: making interface ib0 the new active one.
reached max retry count. status=-22  .Giving up
bonding: bond1: link status definitely down for interface ib0, disabling it
bonding: bond1: making interface ib1 the new active one.
bond1: link status definitely up for interface ib0, 4294967295 Mbps full duplex.
bonding: bond1: making interface ib0 the new active one.
Lustre: Lustre: Build Version: RC1-MLNX_OFED_LINUX_1.5.3_3.1.0_rhel6.3--PRISTINE-2.6.32-279.19.1.el6_lustre.x86_64
Lustre: Added LNI 192.168.127.106@tcp [8/256/0/180]
Lustre: Accept secure, port 988
Lustre: Lustre OSC module (ffffffffa0aae6a0).
Lustre: Lustre LOV module (ffffffffa0b40a60).
Lustre: Lustre client module (ffffffffa0c2a140).
Lustre: MGC192.168.127.14@tcp: Reactivating import
Lustre: Mounted modelfs-client
Lustre: Mounted preposfs-client
INFO: task ungrib.exe:94999 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 0000000000000000     0 94999  94977 0x00000080
ffff881036799b38 0000000000000082 0000000000000000 ffff88105c6ac588
ffff88076e689600 ffff88086a2c7c00 ffff881036799b18 0000000000000282
ffff881065036638 ffff881036799fd8 000000000000fb88 ffff881065036638
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff8100c535>] ? math_state_restore+0x45/0x60
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95001 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 0000000000000001     0 95001  94977 0x00000080
ffff8810290e9b38 0000000000000082 0000000000000000 ffff88105c6ac588
ffff880785e0a400 ffff88086a2c7c00 ffff8810290e9b18 0000000000000286
ffff8810657c8638 ffff8810290e9fd8 000000000000fb88 ffff8810657c8638
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff8100c535>] ? math_state_restore+0x45/0x60
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95002 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 000000000000000a     0 95002  94977 0x00000080
ffff881057dd7b38 0000000000000082 0000000000000000 ffff88105c6ac588
ffff88105c9ba000 ffff88086a2c7c00 ffff881057dd7b18 0000000000000286
ffff881063079ab8 ffff881057dd7fd8 000000000000fb88 ffff881063079ab8
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95004 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 0000000000000009     0 95004  94977 0x00000080
ffff8810360e1b38 0000000000000086 0000000000000000 ffff88105c6ac588
ffff88105dd50a00 ffff88086a2c7c00 ffff8810360e1b18 0000000000000286
ffff881063ab3ab8 ffff8810360e1fd8 000000000000fb88 ffff881063ab3ab8
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95005 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 000000000000000d     0 95005  94977 0x00000080
ffff8810657c5b38 0000000000000086 0000000000000000 ffff88105c6ac588
ffff881047834400 ffff88086a2c7c00 ffff8810657c5b18 0000000000000282
ffff8810650125f8 ffff8810657c5fd8 000000000000fb88 ffff8810650125f8
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff8100c535>] ? math_state_restore+0x45/0x60
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95006 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 0000000000000004     0 95006  94977 0x00000080
ffff8810581f3b38 0000000000000086 0000000000000000 ffff88105c6ac588
ffff8807e191ac00 ffff88086a2c7c00 ffff8810581f3b18 0000000000000286
ffff8810654c7ab8 ffff8810581f3fd8 000000000000fb88 ffff8810654c7ab8
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff8100c535>] ? math_state_restore+0x45/0x60
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95007 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 0000000000000009     0 95007  94977 0x00000080
ffff8810564d7b38 0000000000000082 0000000000000000 ffff88105c6ac588
ffff881027004e00 ffff88086a2c7c00 ffff8810564d7b18 0000000000000286
ffff8810654c7058 ffff8810564d7fd8 000000000000fb88 ffff8810654c7058
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95008 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 0000000000000007     0 95008  94977 0x00000080
ffff88105b52db38 0000000000000082 0000000000000000 ffff88105c6ac588
ffff880863e03c00 ffff88086a2c7c00 ffff88105b52db18 0000000000000282
ffff881064253ab8 ffff88105b52dfd8 000000000000fb88 ffff881064253ab8
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff8100c535>] ? math_state_restore+0x45/0x60
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95009 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 000000000000000b     0 95009  94977 0x00000080
ffff881058a6fb38 0000000000000086 0000000000000000 ffff88105c6ac588
ffff88105cb25200 ffff88086a2c7c00 ffff881058a6fb18 0000000000000286
ffff8810641b0638 ffff881058a6ffd8 000000000000fb88 ffff8810641b0638
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff8100c535>] ? math_state_restore+0x45/0x60
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ungrib.exe:95010 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ungrib.exe    D 0000000000000006     0 95010  94977 0x00000080
ffff881058a61b38 0000000000000082 0000000000000000 ffff88105c6ac588
ffff88076e539400 ffff88086a2c7c00 ffff881058a61b18 0000000000000286
ffff8810641b1098 ffff881058a61fd8 000000000000fb88 ffff8810641b1098
Call Trace:
[<ffffffff814ec485>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ec5e3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812754e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ebae2>] ? down_write+0x32/0x40
[<ffffffffa0bc801c>] ll_setattr_raw+0x48c/0xf30 [lustre]
[<ffffffffa0bc8b1f>] ll_setattr+0x5f/0x100 [lustre]
[<ffffffff811923a8>] notify_change+0x168/0x340
[<ffffffff81174c74>] do_truncate+0x64/0xa0
[<ffffffff811873b9>] do_filp_open+0x829/0xd60
[<ffffffff81140540>] ? unmap_region+0x110/0x130
[<ffffffff81193282>] ? alloc_fd+0x92/0x160
[<ffffffff81173a39>] do_sys_open+0x69/0x140
[<ffffffff81173b50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
nfs: server avmmst1a not responding, still trying
nfs: server avmmst1a OK
ipmi message handler version 39.2
IPMI System Interface driver.
ipmi_si: Adding SMBIOS-specified kcs state machine
ipmi_si: Adding ACPI-specified kcs state machine: duplicate interface
ipmi_si: Trying SMBIOS-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 0
ipmi: Found new BMC (man_id: 0x002b99,  prod_id: 0x0083, dev_id: 0x20)
IPMI kcs interface initialized
ipmi device interface
Lustre: 12890:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1373509742/real 1373509742]  req@ffff881055d9e400 x1439531544675803/t0(0) o400->MGC192.168.127.14@tcp@192.168.127.14@tcp:26/25 lens 192/192 e 0 to 1 dl 1373509749 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 166-1: MGC192.168.127.14@tcp: Connection to MGS (at 192.168.127.14@tcp) was lost; in progress operations using this service will fail
Lustre: modelfs-MDT0000-mdc-ffff8808641b8400: Connection to modelfs-MDT0000 (at 192.168.127.14@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1373509749/real 1373509749]  req@ffff880f712d9000 x1439531544675818/t0(0) o250->MGC192.168.127.14@tcp@192.168.127.14@tcp:26/25 lens 368/512 e 0 to 1 dl 1373509755 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Lustre: Evicted from MGS (at 192.168.127.15@tcp) after server handle changed from 0xbb9b8e06bf5d917c to 0x71c19737ef0868b4
Lustre: MGC192.168.127.14@tcp: Reactivating import
Lustre: MGC192.168.127.14@tcp: Connection restored to MGS (at 192.168.127.15@tcp)
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1373509774/real 1373509774]  req@ffff881065606800 x1439531544675822/t0(0) o38->modelfs-MDT0000-mdc-ffff8808641b8400@192.168.127.15@tcp:12/10 lens 368/512 e 0 to 1 dl 1373509780 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1373509799/real 1373509799]  req@ffff8810461a3000 x1439531544675848/t0(0) o38->modelfs-MDT0000-mdc-ffff8808641b8400@192.168.127.14@tcp:12/10 lens 368/512 e 0 to 1 dl 1373509810 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 1 previous similar message
LustreError: 12891:0:(client.c:2631:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff88105cf81400 x1439531543885079/t73014918113(73014918113) o101->modelfs-MDT0000-mdc-ffff8808641b8400@192.168.127.15@tcp:12/10 lens 752/544 e 0 to 0 dl 1373509872 ref 2 fl Interpret:RP/4/0 rc 301/301
LustreError: 12891:0:(client.c:2631:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff881036340400 x1439531543930893/t73015503064(73015503064) o101->modelfs-MDT0000-mdc-ffff8808641b8400@192.168.127.15@tcp:12/10 lens 504/544 e 0 to 0 dl 1373509872 ref 2 fl Interpret:RP/4/0 rc 301/301
Lustre: modelfs-MDT0000-mdc-ffff8808641b8400: Connection restored to modelfs-MDT0000 (at 192.168.127.15@tcp)
Lustre: preposfs-MDT0000-mdc-ffff880868cf8000: Connection restored to preposfs-MDT0000 (at 192.168.127.15@tcp)
Lustre: 12890:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1373537099/real 1373537099]  req@ffff880447119400 x1439531544709159/t0(0) o400->MGC192.168.127.14@tcp@192.168.127.15@tcp:26/25 lens 192/192 e 0 to 1 dl 1373537106 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12890:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 1 previous similar message
LustreError: 166-1: MGC192.168.127.14@tcp: Connection to MGS (at 192.168.127.15@tcp) was lost; in progress operations using this service will fail
Lustre: modelfs-MDT0000-mdc-ffff8808641b8400: Connection to modelfs-MDT0000 (at 192.168.127.15@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1373537106/real 1373537106]  req@ffff88086917d000 x1439531544709174/t0(0) o250->MGC192.168.127.14@tcp@192.168.127.15@tcp:26/25 lens 368/512 e 0 to 1 dl 1373537112 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Lustre: Evicted from MGS (at 192.168.127.14@tcp) after server handle changed from 0x71c19737ef0868b4 to 0xbb9b8e06cdc26fc6
Lustre: MGC192.168.127.14@tcp: Reactivating import
Lustre: MGC192.168.127.14@tcp: Connection restored to MGS (at 192.168.127.14@tcp)
LustreError: 12891:0:(client.c:2631:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff880869a3b400 x1439531544676036/t77309411388(77309411388) o101->modelfs-MDT0000-mdc-ffff8808641b8400@192.168.127.14@tcp:12/10 lens 504/544 e 0 to 0 dl 1373537174 ref 2 fl Interpret:RP/4/0 rc 301/301
LustreError: 12891:0:(client.c:2631:ptlrpc_replay_interpret()) Skipped 51 previous similar messages
LustreError: 12891:0:(client.c:2631:ptlrpc_replay_interpret()) @@@ status -116, old was 0  req@ffff8810419f8c00 x1439531544708969/t77309662829(77309662829) o35->modelfs-MDT0000-mdc-ffff8808641b8400@192.168.127.14@tcp:23/10 lens 360/424 e 0 to 0 dl 1373537150 ref 2 fl Interpret:R/4/0 rc -116/-116
LustreError: 12891:0:(client.c:2631:ptlrpc_replay_interpret()) Skipped 111 previous similar messages
Lustre: modelfs-MDT0000-mdc-ffff8808641b8400: Connection restored to modelfs-MDT0000 (at 192.168.127.14@tcp)
Lustre: preposfs-MDT0000-mdc-ffff880868cf8000: Connection restored to preposfs-MDT0000 (at 192.168.127.14@tcp)
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
nfs: server avmmst1a not responding, timed out
nfs: server avmmst1a not responding, timed out
Error: state manager failed on NFSv4 server avmmst1a with error 5
sendrecv-blocki[19369]: segfault at 7fff9c4e0288 ip 0000000000405220 sp 00007fff9c4e0290 error 6 in sendrecv-blocking.x[400000+1da000]
sendrecv-blocki[19470]: segfault at 7fff47ae8308 ip 0000000000405220 sp 00007fff47ae8310 error 6 in sendrecv-blocking.x[400000+1da000]
Lustre: 12890:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374141957/real 1374141957]  req@ffff88103bfc3800 x1439531545346894/t0(0) o400->modelfs-OST0001-osc-ffff8808641b8400@192.168.127.18@tcp:28/4 lens 192/192 e 0 to 1 dl 1374141964 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12890:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Lustre: modelfs-OST0001-osc-ffff8808641b8400: Connection to modelfs-OST0001 (at 192.168.127.18@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374141964/real 1374141964]  req@ffff88103e45d400 x1439531545346906/t0(0) o8->modelfs-OST0001-osc-ffff8808641b8400@192.168.127.18@tcp:28/4 lens 368/512 e 0 to 1 dl 1374141970 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
LustreError: 11-0: an error occurred while communicating with 192.168.127.17@tcp. The ost_connect operation failed with -19
LustreError: 11-0: an error occurred while communicating with 192.168.127.17@tcp. The ost_connect operation failed with -19
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374142014/real 1374142017]  req@ffff88102708e000 x1439531545346927/t0(0) o8->modelfs-OST0001-osc-ffff8808641b8400@192.168.127.18@tcp:28/4 lens 368/512 e 0 to 1 dl 1374142025 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
LustreError: 11-0: an error occurred while communicating with 192.168.127.17@tcp. The ost_connect operation failed with -19
LustreError: Skipped 1 previous similar message
Lustre: preposfs-OST0000-osc-ffff880868cf8000: Connection restored to preposfs-OST0000 (at 192.168.127.17@tcp)
Lustre: modelfs-OST0001-osc-ffff8808641b8400: Connection restored to modelfs-OST0001 (at 192.168.127.17@tcp)
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374142064/real 1374142067]  req@ffff881039cde800 x1439531545347499/t0(0) o8->modelfs-OST0005-osc-ffff8808641b8400@192.168.127.18@tcp:28/4 lens 368/512 e 0 to 1 dl 1374142080 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Lustre: modelfs-OST0005-osc-ffff8808641b8400: Connection restored to modelfs-OST0005 (at 192.168.127.17@tcp)
Lustre: 12890:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374281089/real 1374281089]  req@ffff88105be87400 x1439531545475212/t0(0) o400->modelfs-OST0000-osc-ffff8808641b8400@192.168.127.17@tcp:28/4 lens 192/192 e 0 to 1 dl 1374281096 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: modelfs-OST0000-osc-ffff8808641b8400: Connection to modelfs-OST0000 (at 192.168.127.17@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374281096/real 1374281096]  req@ffff881051ba7c00 x1439531545475225/t0(0) o8->modelfs-OST0000-osc-ffff8808641b8400@192.168.127.17@tcp:28/4 lens 368/512 e 0 to 1 dl 1374281102 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Lustre: There was an unexpected network error while writing to 192.168.127.17: -110.
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374281121/real 1374281121]  req@ffff88105838c800 x1439531545475233/t0(0) o8->modelfs-OST0000-osc-ffff8808641b8400@192.168.127.18@tcp:28/4 lens 368/512 e 0 to 1 dl 1374281127 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374281146/real 1374281148]  req@ffff88103feb0800 x1439531545475248/t0(0) o8->modelfs-OST0000-osc-ffff8808641b8400@192.168.127.17@tcp:28/4 lens 368/512 e 0 to 1 dl 1374281157 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
LustreError: 11-0: an error occurred while communicating with 192.168.127.18@tcp. The ost_connect operation failed with -19
Lustre: preposfs-OST0000-osc-ffff880868cf8000: Connection restored to preposfs-OST0000 (at 192.168.127.18@tcp)
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374281196/real 1374281196]  req@ffff88104cc68000 x1439531545475714/t0(0) o8->modelfs-OST0004-osc-ffff8808641b8400@192.168.127.17@tcp:28/4 lens 368/512 e 0 to 1 dl 1374281212 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Lustre: modelfs-OST0000-osc-ffff8808641b8400: Connection restored to modelfs-OST0000 (at 192.168.127.18@tcp)
LustreError: 11-0: an error occurred while communicating with 192.168.127.18@tcp. The ost_connect operation failed with -19
LustreError: Skipped 2 previous similar messages
Lustre: modelfs-OST0001-osc-ffff8808641b8400: Connection restored to modelfs-OST0001 (at 192.168.127.18@tcp)
Lustre: modelfs-OST0005-osc-ffff8808641b8400: Connection restored to modelfs-OST0005 (at 192.168.127.18@tcp)
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1374281246/real 1374281249]  req@ffff88105cbe0800 x1439531545476171/t0(0) o8->modelfs-OST0008-osc-ffff8808641b8400@192.168.127.17@tcp:28/4 lens 368/512 e 0 to 1 dl 1374281267 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 12891:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
------------[ cut here ]------------
kernel BUG at mm/filemap.c:129!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:80/0000:80:02.2/0000:82:00.0/infiniband_mad/umad1/port
CPU 10 
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) bonding 8021q garp stp llc ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) iw_nes(U) libcrc32c iw_cxgb3(U) cxgb3(U) mlx4_ib(U) ib_mthca(U) ib_mad(U) ib_core(U) dcdbas microcode mlx4_en(U) mlx4_core(U) sb_edac edac_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp sg ioatdma igb dca ext4 mbcache jbd2 sd_mod crc_t10dif wmi ahci megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]

Pid: 12891, comm: ptlrpcd-rcv Not tainted 2.6.32-279.19.1.el6_lustre.x86_64 #1 Dell Inc. PowerEdge C6220/0TTH1R
RIP: 0010:[<ffffffff81110c85>]  [<ffffffff81110c85>] __remove_from_page_cache+0xd5/0xe0
RSP: 0018:ffff880869ef17e0  EFLAGS: 00010046
RAX: 00c0000000000065 RBX: ffffea0030e5f6b0 RCX: 00000000ffffffd9
RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff880880010dc0
RBP: ffff880869ef17f0 R08: 000000000000005a R09: 0000000000000001
R10: 0000000000000357 R11: d000000000000000 R12: ffff881047e3e318
R13: ffff881047e3e330 R14: ffff8810370c5f50 R15: ffff8810450b9980
FS:  00007fd238319700(0000) GS:ffff88089c440000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000033430abc30 CR3: 0000001050a99000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ptlrpcd-rcv (pid: 12891, threadinfo ffff880869ef0000, task ffff880863c8eaa0)
Stack:
ffffea0030e5f6b0 0000000000000000 ffff880869ef1820 ffffffff81110ce4
<d> ffffc9002cd0a000 ffffea0030e5f6b0 ffff8810370c5f50 ffff881047e3e318
<d> ffff880869ef1850 ffffffffa0bfa2d0 ffff8810450b98c0 ffff88046f3f8428
Call Trace:
[<ffffffff81110ce4>] remove_from_page_cache+0x54/0x90
[<ffffffffa0bfa2d0>] vvp_page_discard+0x70/0x130 [lustre]
[<ffffffffa061cb08>] cl_page_invoid+0x78/0x170 [obdclass]
[<ffffffffa061d058>] ? cl_page_invoke+0x158/0x1c0 [obdclass]
[<ffffffffa061cc13>] cl_page_discard+0x13/0x20 [obdclass]
[<ffffffffa0626b0e>] check_and_discard_cb+0x13e/0x160 [obdclass]
[<ffffffffa0621a34>] cl_page_gang_lookup+0x1d4/0x3e0 [obdclass]
[<ffffffffa06269d0>] ? check_and_discard_cb+0x0/0x160 [obdclass]
[<ffffffffa06235db>] cl_lock_page_out+0x15b/0x330 [obdclass]
[<ffffffffa06269d0>] ? check_and_discard_cb+0x0/0x160 [obdclass]
[<ffffffffa0a9a9cf>] osc_lock_flush+0x4f/0x90 [osc]
[<ffffffffa0a9aadf>] osc_lock_cancel+0xcf/0x1b0 [osc]
[<ffffffffa061c48d>] ? cl_env_nested_get+0x5d/0xc0 [obdclass]
[<ffffffffa0622265>] cl_lock_cancel0+0x75/0x160 [obdclass]
[<ffffffffa0622f4b>] cl_lock_cancel+0x13b/0x140 [obdclass]
[<ffffffffa0a9bbba>] osc_ldlm_blocking_ast+0x13a/0x380 [osc]
[<ffffffffa0729cc0>] ldlm_cancel_callback+0x60/0x100 [ptlrpc]
[<ffffffffa0729db5>] ldlm_lock_cancel+0x55/0x1b0 [ptlrpc]
[<ffffffffa07469d8>] ldlm_cli_cancel_list_local+0x78/0x1f0 [ptlrpc]
[<ffffffffa0747895>] ldlm_cancel_lru_local+0x35/0x40 [ptlrpc]
[<ffffffffa0747a0a>] ldlm_replay_locks+0x16a/0x6e0 [ptlrpc]
[<ffffffffa0790187>] ptlrpc_import_recovery_state_machine+0x8e7/0xc20 [ptlrpc]
[<ffffffffa0793288>] ptlrpc_connect_interpret+0x578/0x1f20 [ptlrpc]
[<ffffffffa0760710>] ? after_reply+0x730/0xe60 [ptlrpc]
[<ffffffffa0764cab>] ptlrpc_check_set+0x29b/0x1b00 [ptlrpc]
[<ffffffff814ead1a>] ? schedule_timeout+0x19a/0x2e0
[<ffffffff8107cb50>] ? process_timeout+0x0/0x10
[<ffffffffa0795f30>] ptlrpcd_check+0x1a0/0x230 [ptlrpc]
[<ffffffffa07962c3>] ptlrpcd+0x303/0x370 [ptlrpc]
[<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
[<ffffffffa0795fc0>] ? ptlrpcd+0x0/0x370 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa0795fc0>] ? ptlrpcd+0x0/0x370 [ptlrpc]
[<ffffffffa0795fc0>] ? ptlrpcd+0x0/0x370 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 00 00 e8 2f c8 16 00 48 89 df 57 9d 0f 1f 44 00 00 5b 41 5c c9 c3 be 16 00 00 00 48 89 df e8 f3 1c 02 00 48 8b 03 e9 71 ff ff ff <0f> 0b eb fe 0f 1f 80 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 
RIP  [<ffffffff81110c85>] __remove_from_page_cache+0xd5/0xe0
RSP <ffff880869ef17e0>
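
In case anyone wants to dig further: the address in RBX above falls in the vmemmap range and looks like the struct page being removed (an assumption on my part, based on the register dump), so it can be inspected directly in crash:

    crash> struct page ffffea0030e5f6b0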
