[Lustre-discuss] Call trace generated on clients when rebooting the MDS.

Nirmal Seenu nirmal at fnal.gov
Thu May 21 11:45:01 PDT 2009


I see the following call trace on some (10-20%) of the Lustre clients 
whenever I reboot the MDS. The clients eventually recover, and 
everything appears to work normally after that.

Has anyone else seen these errors? Is it safe to ignore them?

I am running the Lustre 1.6.7.1 patched RHEL5 kernel, and the clients run 
the 1.6.7.1 patchless client on RHEL kernel 2.6.18-128.1.10.

BUG: soft lockup - CPU#1 stuck for 10s! [ptlrpcd-recov:10639]
CPU 1:
Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) 
ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) libafs(PU) 
autofs4 hidp nfs lockd fscache nfs_acl rfcomm l2cap bluetooth sunrpc 
ip_conntrack_netbios_ns xt_state ip_conntrack nfnetlink ipt_REJECT 
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter 
ip6_tables x_tables ipv6 xfrm_nalgo crypto_api vfat fat dm_mirror dm_log 
dm_multipath scsi_dh dm_mod video hwmon backlight sbs i2c_ec button 
battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod joydev 
sg usb_storage ide_cd e1000e serio_raw i2c_piix4 cdrom pcspkr i2c_core 
shpchp bnx2 sata_svw libata megaraid_sas sd_mod scsi_mod ext3 jbd 
uhci_hcd ohci_hcd ehci_hcd
Pid: 10639, comm: ptlrpcd-recov Tainted: P      2.6.18-128.1.10.el5 #1
RIP: 0010:[<ffffffff8865dcac>]  [<ffffffff8865dcac>] :lnet:lnet_lookup_cookie+0x3c/0x50
RSP: 0018:ffff81022048bc58  EFLAGS: 00000206
RAX: ffff8103aec504d0 RBX: ffff81029678f000 RCX: ffff8103cb3215d0
RDX: ffff81021c1df250 RSI: 0000000000000001 RDI: 00000000052d3325
RBP: 0000000000000282 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000100000000 R11: 0000000000000000 R12: ffff81034e9de640
R13: ffffc2001049d4d0 R14: 0000000c00000000 R15: ffffffff8002df8f
FS:  00002b4d02c93240(0000) GS:ffff81024711ca40(0000) knlGS:00000000f7f346c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b033ff30000 CR3: 00000001d3b13000 CR4: 00000000000006e0

Call Trace:
  [<ffffffff886647cb>] :lnet:LNetMDUnlink+0x7b/0xf0
  [<ffffffff8878c64c>] :ptlrpc:at_add+0x4c/0x1b0
  [<ffffffff8876d9d0>] :ptlrpc:lustre_msg_get_slv+0x30/0xf0
  [<ffffffff8875c611>] :ptlrpc:ptlrpc_at_adj_net_latency+0xe1/0x200
  [<ffffffff88744ae0>] :ptlrpc:ldlm_cli_update_pool+0x1f0/0x2a0
  [<ffffffff8875e76c>] :ptlrpc:ptlrpc_unregister_reply+0x23c/0x9c0
  [<ffffffff8875e3df>] :ptlrpc:after_reply+0x7df/0x8d0
  [<ffffffff8002df8f>] __wake_up+0x38/0x4f
  [<ffffffff88761c75>] :ptlrpc:ptlrpc_check_set+0x15b5/0x18d0
  [<ffffffff8879375d>] :ptlrpc:ptlrpcd_check+0xdd/0x1f0
  [<ffffffff80094ffc>] process_timeout+0x0/0x5
  [<ffffffff8003b730>] remove_wait_queue+0x1c/0x2c
  [<ffffffff88793ca8>] :ptlrpc:ptlrpcd+0xb8/0x259
  [<ffffffff8008a4b3>] default_wake_function+0x0/0xe
  [<ffffffff800b4a92>] audit_syscall_exit+0x31b/0x336
  [<ffffffff8005dfb1>] child_rip+0xa/0x11
  [<ffffffff88793bf0>] :ptlrpc:ptlrpcd+0x0/0x259
  [<ffffffff8005dfa7>] child_rip+0x0/0x11
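
For comparing traces like the one above across the affected clients, a minimal 
Python sketch along these lines could pull the module and symbol out of each 
frame. The regular expression, the function name parse_frames, and the "kernel" 
label for frames without a module prefix are illustrative assumptions, not part 
of any Lustre tooling; the sample frames are copied from the trace above.

```python
import re

# Matches frames in the RHEL5-style trace format, for example:
#   [<ffffffff886647cb>] :lnet:LNetMDUnlink+0x7b/0xf0
#   [<ffffffff8002df8f>] __wake_up+0x38/0x4f
# The ":module:" prefix is optional; core kernel symbols have none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_frames(text):
    """Return a list of (module, symbol, offset) tuples found in text.

    Frames without a module prefix are labelled "kernel" (an assumed label).
    """
    frames = []
    for match in FRAME_RE.finditer(text):
        module = match.group("module") or "kernel"
        frames.append((module, match.group("symbol"), match.group("offset")))
    return frames


# Three frames copied from the call trace in this message.
sample = """
  [<ffffffff886647cb>] :lnet:LNetMDUnlink+0x7b/0xf0
  [<ffffffff8878c64c>] :ptlrpc:at_add+0x4c/0x1b0
  [<ffffffff8002df8f>] __wake_up+0x38/0x4f
"""

for module, symbol, offset in parse_frames(sample):
    print(f"{module:8s} {symbol}+{offset}")
```

Feeding each client's log through parse_frames and comparing the resulting 
lists would show whether every affected client is stuck in the same lnet and 
ptlrpc path.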

TIA
Nirmal


