<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#ffffff" text="#000000">
<span class="Apple-style-span"
 style="border-collapse: separate; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span
 class="Apple-style-span"
 style="font-family: Verdana,Arial,sans-serif; font-size: 11px;">We
experienced an NFS lockup on the Lustre client that exports the
filesystem. NFS is tuned to run 256 nfsd threads in anticipation of heavy
load. The client is connected via bonded 10 Gbps Ethernet to the NFS
export network, and also via bonded 10 Gbps Ethernet to the Lustre network.
At the time of the crash, approximately 800 Mbps of NFS writes to the Lustre
filesystem were taking place. Any input on this issue, and most
importantly on how to prevent it, is appreciated!<br>
<br>
Vitals:<br>
Kernel: 2.6.18-164.11.1.el5_lustre.1.8.2<br>
OS: CentOS release 5.4<br>
RAM: 96 GB<br>
CPU: dual Intel quad-core<br>
<br>
Messages:<br>
<br>
Jul 21 14:25:36 ID6316-Client1 kernel: LustreError:
5464:0:(lov_merge.c:74:lov_merge_lvb())
ASSERTION(spin_is_locked(&lsm->lsm_lock)) failed<br>
Jul 21 14:25:36 ID6316-Client1 kernel: LustreError:
5464:0:(lov_merge.c:74:lov_merge_lvb()) LBUG<br>
Jul 21 14:25:36 ID6316-Client1 kernel: Lustre:
5464:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for
process 5464<br>
Jul 21 14:25:36 ID6316-Client1 kernel: nfsd          R  running
task       0  5464      1          5465  5463 (L-TLB)<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  0000000000000000
0000000000000000 0000000000000001 0000000000000000<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  0000000000000000
0000000000000000 0000000000000086 ffffffff80047152<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  0000000000000001
0000000000000001 0000000000000000 0000000000000001<br>
Jul 21 14:25:36 ID6316-Client1 kernel: Call Trace:<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff80047152>]
try_to_wake_up+0x472/0x484<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8016d94c>]
vgacon_scroll+0x21e/0x23f<br>
Jul 21 14:25:36 ID6316-Client1 last message repeated 2 times<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff80091d2d>]
printk+0x52/0xbd<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8008ac95>]
__wake_up_common+0x3e/0x68<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8008ac95>]
__wake_up_common+0x3e/0x68<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8002e2f3>]
__wake_up+0x38/0x4f<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff800a7b2e>]
kallsyms_lookup+0xe6/0x1ae<br>
Jul 21 14:25:36 ID6316-Client1 last message repeated 3 times<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8006bc3b>]
printk_address+0x9f/0xab<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff80091d2d>]
printk+0x52/0xbd<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff80091d2d>]
printk+0x52/0xbd<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff800a54d2>]
module_text_address+0x33/0x3c<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8009e65b>]
kernel_text_address+0x1a/0x26<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8006b921>]
dump_trace+0x206/0x22f<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88768c8f>]
:osc:osc_enqueue_fini+0x1af/0x240<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8006b97e>]
show_trace+0x34/0x47<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8006ba83>]
_show_stack+0xdb/0xea<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88551b1a>]
:libcfs:lbug_with_loc+0x7a/0xd0<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88559e90>]
:libcfs:tracefile_init+0x0/0x110<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8881ac2e>]
:lov:lov_merge_lvb+0x4e/0x220<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff886b86d1>]
:ptlrpc:lustre_swab_buf+0x81/0x170<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff888747cd>]
:lustre:ll_inode_size_lock+0x5d/0x160<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8885d2ca>]
:lustre:ll_extent_lock+0x8ea/0xaf0<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff885522b8>]
:libcfs:cfs_alloc+0x68/0xc0<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8876cd50>]
:osc:osc_extent_blocking_cb+0x0/0x2b0<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88692e50>]
:ptlrpc:ldlm_completion_ast+0x0/0x880<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88868510>]
:lustre:ll_glimpse_callback+0x0/0x440<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8889ef0c>]
:lustre:ll_tree_lock_iov+0x14c/0x310<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8889e987>]
:lustre:ll_node_from_inode+0xc7/0x210<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8885da6a>]
:lustre:ll_file_get_tree_lock_iov+0x59a/0x740<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff886754f4>]
:ptlrpc:ldlm_lock_add_to_lru+0x74/0xe0<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8867219a>]
:ptlrpc:lock_res_and_lock+0xba/0xd0<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8885e6f4>]
:lustre:ll_file_writev+0xae4/0x1750<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8867736a>]
:ptlrpc:ldlm_lock_decref+0x9a/0xc0<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff887e0d9a>]
:mdc:mdc_set_lock_data+0x1da/0x250<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88692e50>]
:ptlrpc:ldlm_completion_ast+0x0/0x880<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88884ebd>]
:lustre:ras_reset+0x2d/0x110<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8885f360>]
:lustre:ll_file_write+0x0/0x20<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff800df4cd>]
do_readv_writev+0x172/0x291<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8885f360>]
:lustre:ll_file_write+0x0/0x20<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff888698fb>]
:lustre:ll_file_open+0xbab/0xd10<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8008ac95>]
__wake_up_common+0x3e/0x68<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8850b753>]
:nfsd:nfsd_acceptable+0x0/0xd8<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8850c711>]
:nfsd:nfsd_vfs_write+0xf2/0x2e1<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88868d50>]
:lustre:ll_file_open+0x0/0xd10<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8001e8b7>]
__dentry_open+0x101/0x1dc<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8850d027>]
:nfsd:nfsd_write+0xb5/0xd5<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88513ae2>]
:nfsd:nfsd3_proc_write+0xea/0x109<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff885091db>]
:nfsd:nfsd_dispatch+0xd8/0x1d6<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8843d529>]
:sunrpc:svc_process+0x454/0x71b<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff80064644>]
__down_read+0x12/0x92<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff885095a1>]
:nfsd:nfsd+0x0/0x2cb<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff88509746>]
:nfsd:nfsd+0x1a5/0x2cb<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8005dfb1>]
child_rip+0xa/0x11<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff885095a1>]
:nfsd:nfsd+0x0/0x2cb<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff885095a1>]
:nfsd:nfsd+0x0/0x2cb<br>
Jul 21 14:25:36 ID6316-Client1 kernel:  [<ffffffff8005dfa7>]
child_rip+0x0/0x11<br>
Jul 21 14:25:36 ID6316-Client1 kernel:<br>
Jul 21 14:25:36 ID6316-Client1 kernel: LustreError: dumping log to
/tmp/lustre-log.1279736736.5464<br>
<br>
<br>
And subsequently:<br>
Jul 21 15:40:10 ID6316-Client1 kernel: BUG: soft lockup - CPU#6 stuck
for 10s! [nfsd:5324]<br>
Jul 21 15:40:10 ID6316-Client1 kernel: CPU 6:<br>
Jul 21 15:40:10 ID6316-Client1 kernel: Modules linked in:
iptable_filter(U) ip_tables(U) mgc(U) lustre(U) lov(U) mdc(U) lquota(U)
osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U)
nfsd(U) exportfs(U) nfs_acl(U) auth_rpcgss(U) autofs4(U) hidp(U)
l2cap(U) bluetooth(U) lockd(U) sunrpc(U) bonding(U) ipt_REJECT(U)
ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U)
x_tables(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) cpufreq_ondemand(U)
acpi_cpufreq(U) freq_table(U) dm_mirror(U) dm_multipath(U) scsi_dh(U)
video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U)
asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U)
igb(U) ixgbe(U) 8021q(U) joydev(U) sg(U) i2c_i801(U) serio_raw(U)
i2c_core(U) pcspkr(U) dm_raid45(U) dm_message(U) dm_region_hash(U)
dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) ata_piix(U)
libata(U) shpchp(U) 3w_9xxx(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U)
uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)<br>
Jul 21 15:40:10 ID6316-Client1 kernel: Pid: 5324, comm: nfsd Tainted:
G      2.6.18-164.11.1.el5_lustre.1.8.2 #1<br>
Jul 21 15:40:10 ID6316-Client1 kernel: RIP:
0010:[<ffffffff80064bfc>]  [<ffffffff80064bfc>]
.text.lock.spinlock+0x2/0x30<br>
Jul 21 15:40:10 ID6316-Client1 kernel: RSP: 0018:ffff810827163478 
EFLAGS: 00000286<br>
Jul 21 15:40:10 ID6316-Client1 kernel: RAX: ffff81083ceea860 RBX:
ffff8113b7703cc0 RCX: ffff8112e855dd10<br>
Jul 21 15:40:10 ID6316-Client1 kernel: RDX: ffff810826a28040 RSI:
ffff810827163760 RDI: ffff810f59ad4a40<br>
Jul 21 15:40:10 ID6316-Client1 kernel: RBP: ffffffff88677168 R08:
000000000000012d R09: ffff810c8e7e4ae0<br>
Jul 21 15:40:10 ID6316-Client1 kernel: R10: 0000000000000000 R11:
0000000000000000 R12: ffff8112e855dd18<br>
Jul 21 15:40:10 ID6316-Client1 kernel: R13: 0000000000000000 R14:
ffff81131fc73c00 R15: ffffffff8867219a<br>
Jul 21 15:40:10 ID6316-Client1 kernel: FS:  00002b0c882886e0(0000)
GS:ffff81086a2e1140(0000) knlGS:0000000000000000<br>
Jul 21 15:40:10 ID6316-Client1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
000000008005003b<br>
Jul 21 15:40:10 ID6316-Client1 kernel: CR2: 00002aaaae52c000 CR3:
0000000000201000 CR4: 00000000000006e0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:<br>
Jul 21 15:40:10 ID6316-Client1 kernel: Call Trace:<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88805e0d>]
:lov:lov_stripe_lock+0x3d/0x80<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8882260e>]
:lov:lov_update_enqueue_set+0x27e/0x4a0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8860a1b0>]
:obdclass:class_handle2object+0xe0/0x170<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88768c8f>]
:osc:osc_enqueue_fini+0x1af/0x240<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88768dd8>]
:osc:osc_enqueue_interpret+0xb8/0x180<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff886bb240>]
:ptlrpc:lustre_swab_ost_lvb+0x0/0x40<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff886ab181>]
:ptlrpc:ptlrpc_check_set+0x11a1/0x1450<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8004b344>]
try_to_del_timer_sync+0x51/0x5a<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff886ae34a>]
:ptlrpc:ptlrpc_set_wait+0x36a/0x660<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88808442>]
:lov:lov_enqueue+0x612/0x8b0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8008c86b>]
default_wake_function+0x0/0xe<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff886ab9b9>]
:ptlrpc:ptlrpc_prep_set+0x1e9/0x290<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88860eda>]
:lustre:ll_glimpse_size+0x63a/0xc30<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8889b440>]
:lustre:ll_mdc_blocking_ast+0x0/0x520<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8889adc6>]
:lustre:ll_lookup_it+0x776/0x7c0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8889b440>]
:lustre:ll_mdc_blocking_ast+0x0/0x520<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8876cd50>]
:osc:osc_extent_blocking_cb+0x0/0x2b0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88692e50>]
:ptlrpc:ldlm_completion_ast+0x0/0x880<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88868510>]
:lustre:ll_glimpse_callback+0x0/0x440<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8886160a>]
:lustre:ll_inode_revalidate_it+0x13a/0x1d0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff888616c4>]
:lustre:ll_getattr_it+0x24/0x110<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff888617e4>]
:lustre:ll_getattr+0x34/0x40<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff885147fb>]
:nfsd:encode_post_op_attr+0x3f/0x213<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8000d3a4>]
dput+0x2c/0x113<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88514d50>]
:nfsd:compose_entry_fh+0x113/0x121<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88514f8d>]
:nfsd:encode_entry+0x22f/0x53c<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff886754f4>]
:ptlrpc:ldlm_lock_add_to_lru+0x74/0xe0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8867219a>]
:ptlrpc:lock_res_and_lock+0xba/0xd0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88676bb8>]
:ptlrpc:ldlm_lock_decref_internal+0x538/0x7f0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8867219a>]
:ptlrpc:lock_res_and_lock+0xba/0xd0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88677168>]
:ptlrpc:__ldlm_handle2lock+0x2f8/0x360<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8867736a>]
:ptlrpc:ldlm_lock_decref+0x9a/0xc0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8885443d>]
:lustre:ll_get_dir_page+0x63d/0x680<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff887e0d9a>]
:mdc:mdc_set_lock_data+0x1da/0x250<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88692e50>]
:ptlrpc:ldlm_completion_ast+0x0/0x880<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff885152a5>]
:nfsd:nfs3svc_encode_entry_plus+0xb/0x10<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88855cdb>]
:lustre:ll_readdir+0x87b/0x9e0<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8850b753>]
:nfsd:nfsd_acceptable+0x0/0xd8<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8851529a>]
:nfsd:nfs3svc_encode_entry_plus+0x0/0x10<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8001e8b7>]
__dentry_open+0x101/0x1dc<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8851529a>]
:nfsd:nfs3svc_encode_entry_plus+0x0/0x10<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff80035265>]
vfs_readdir+0x77/0xa9<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8851529a>]
:nfsd:nfs3svc_encode_entry_plus+0x0/0x10<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8850cea0>]
:nfsd:nfsd_readdir+0x6d/0xc5<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88514122>]
:nfsd:nfsd3_proc_readdirplus+0xf8/0x220<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff885091db>]
:nfsd:nfsd_dispatch+0xd8/0x1d6<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8843d529>]
:sunrpc:svc_process+0x454/0x71b<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff80064644>]
__down_read+0x12/0x92<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff885095a1>]
:nfsd:nfsd+0x0/0x2cb<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff88509746>]
:nfsd:nfsd+0x1a5/0x2cb<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8005dfb1>]
child_rip+0xa/0x11<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff885095a1>]
:nfsd:nfsd+0x0/0x2cb<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff885095a1>]
:nfsd:nfsd+0x0/0x2cb<br>
Jul 21 15:40:10 ID6316-Client1 kernel:  [<ffffffff8005dfa7>]
child_rip+0x0/0x11<br>
Jul 21 15:40:10 ID6316-Client1 kernel:<br>
<br>
</span></span>
</body>
</html>