[Lustre-discuss] ll_ost_creat_* goes bersek (100% cpu used - OST disabled)

Adrian Ulrich adrian at blinkenlights.ch
Fri Aug 13 11:29:20 PDT 2010


Hi Alexey,

> Llog_reader is tool to read configuration llog, if you want decode debug log, you should use lctl df $file > $output

Oh, sorry for mixing this up.

'lctl df' doesn't show much new stuff:

00010000:00000400:0:1281721514.362102:0:13008:0:(ldlm_lib.c:541:target_handle_reconnect()) lustre1-OST0005:
 9f880a5e-2331-07a8-8611-d6e3102f466e reconnecting
00010000:00000400:0:1281721514.362107:0:13008:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005:
refuse reconnection from 9f880a5e-2331-07a8-8611-d6e3102f466e at 10.201.48.12@o2ib to 0xffff8101c93b6000; still busy with 1 active RPCs
00010000:00020000:
00010000:00000400:7:1281721525.880767:0:11822:0:(ldlm_lib.c:541:target_handle_reconnect()) lustre1-OST0005:
lustre1-mdtlov_UUID reconnecting


> please post soft-lookup report. one of possibility, MDS ask too many objects to create on that OST or OST have too many reconnects.

LustreError: 12972:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 71 previous similar messages
BUG: soft lockup - CPU#4 stuck for 59s! [ll_ost_creat_00:11833]
CPU 4:
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U)
 ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) raid456(U) xor(U) raid1(U) netconsole(U) lockd(U) sunrpc(U) rdma_ucm(U) qlgc_vnic(U) ib_sdp(U) rdma_cm
(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) 
cxgb3(U) ib_ipath(U) ib_mthca(U) mptctl(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) 
asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U) sr_mod(U) cdrom(U) sg(U) tpm_infineon(U) 
tpm(U) tpm_bios(U) mlx4_core(U) i5000_edac(U) edac_mc(U) i2c_i801(U) pcspkr(U) e1000e(U) i2c_core(U) serio_raw(U) dm_raid45(U) dm_message(U) 
dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) ahci(U) ata_piix(U) libata(U) mptsas(U) scsi_transport_sas(U) mptfc(U) 
scsi_transport_fc(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) shpchp(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) 
ohci_hcd(U) ehci_hcd(U)
Pid: 11833, comm: ll_ost_creat_00 Tainted: G      2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
RIP: 0010:[<ffffffff88b49af4>]  [<ffffffff88b49af4>] :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
RSP: 0018:ffff8101e715d500  EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 000000000c91a9f1
RDX: ffff8101e8893800 RSI: ffff8101e715d4e8 RDI: ffff81010773e838
RBP: 0000000000000002 R08: ffff81017bd9cff8 R09: ffff81017bd9c000
R10: ffff810216dfb000 R11: 000000004c6578dc R12: ffff81017d41e6d0
R13: ffffffff80063b4c R14: ffff8101e715d5b8 R15: ffffffff80014fae
FS:  00002ab1d8b97220(0000) GS:ffff81021fc74bc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000009c9c178 CR3: 0000000000201000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff8001a1b8>] vsnprintf+0x559/0x59e
 [<ffffffff8005ba4d>] cache_alloc_refill+0x106/0x186
 [<ffffffff88b4bf63>] :ldiskfs:ldiskfs_lookup+0x53/0x290
 [<ffffffff800366e8>] __lookup_hash+0x10b/0x130
 [<ffffffff800e2c9b>] lookup_one_len+0x53/0x61
 [<ffffffff88bd71ed>] :obdfilter:filter_fid2dentry+0x42d/0x730
 [<ffffffff88bd3383>] :obdfilter:filter_statfs+0x273/0x350
 [<ffffffff8026f08b>] __down_trylock+0x44/0x4e
 [<ffffffff88bd2f00>] :obdfilter:filter_parent_lock+0x20/0x220
 [<ffffffff88bd7d43>] :obdfilter:filter_precreate+0x843/0x19e0
 [<ffffffff887e0923>] :lnet:lnet_ni_send+0x93/0xd0
 [<ffffffff8000d0bd>] dput+0x23/0x10a
 [<ffffffff88be1e19>] :obdfilter:filter_create+0x10b9/0x15e0
 [<ffffffff887e6ad2>] :lnet:LNetPut+0x702/0x800
 [<ffffffff888e9a13>] :ptlrpc:ptl_send_buf+0x3f3/0x5b0
 [<ffffffff888ef394>] :ptlrpc:lustre_msg_add_version+0x34/0x110
 [<ffffffff888ea198>] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
 [<ffffffff888f1f69>] :ptlrpc:lustre_pack_reply+0x29/0xb0
 [<ffffffff88ba161d>] :ost:ost_handle+0x131d/0x5a70
 [<ffffffff8001a1b8>] vsnprintf+0x559/0x59e
 [<ffffffff88797978>] :libcfs:libcfs_debug_vmsg2+0x6e8/0x990
 [<ffffffff80148e8c>] __next_cpu+0x19/0x28
 [<ffffffff80088f36>] find_busiest_group+0x20d/0x621
 [<ffffffff888f3f05>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
 [<ffffffff80089d8d>] enqueue_task+0x41/0x56
 [<ffffffff888f8c1d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
 [<ffffffff888fb357>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1170
 [<ffffffff8003d382>] lock_timer_base+0x1b/0x3c
 [<ffffffff8008881d>] __wake_up_common+0x3e/0x68
 [<ffffffff888fee08>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
 [<ffffffff8008a3f3>] default_wake_function+0x0/0xe
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff888fdbf0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Btw: We are running 1.8.1.1 (with rhel kernel 2.6.18-128.7.1.el5_lustre.1.8.1.1)

Regards,
 Adrian



More information about the lustre-discuss mailing list