[Lustre-discuss] SCSI driver problem on 2.6.22.19 + lustre 1.8.1?
Nirmal Seenu
nirmal at fnal.gov
Mon Aug 31 12:19:19 PDT 2009
I have been having a lot of trouble with my SCSI devices on kernel 2.6.22.19 +
Lustre 1.8.1. My servers will not boot while the fibre is connected, so I
have to physically unplug the fibre every time I need to reboot a server.
I am using a SATABeast for the OSTs, connected to the servers over Fibre
Channel through QLE2460 cards.
I was running 1.8.0.1 without any problems on the same kernel (2.6.22.19);
the problems started when I upgraded to 1.8.1 on that kernel.
The following error messages and call traces are generated when the
machine is booted with the fibre connected, at which point the machine
becomes pretty much unresponsive:
TIA
Nirmal
qla2xxx 0000:03:00.0: LIP reset occured (f7ef).
qla2xxx 0000:03:00.0: LIP occured (f7ef).
qla2xxx 0000:03:00.0: LOOP UP detected (4 Gbps).
scsi 4:0:0:0: Direct-Access NEXSAN SATABeast Gl66 PQ: 1 ANSI: 5
scsi 4:0:0:101: Direct-Access NEXSAN SATABeast Gl66 PQ: 0 ANSI: 5
kobject_add failed for 4:0:0:101 with -EEXIST, don't try to register things with the same name in the same directory.
Call Trace:
[<ffffffff80311c3c>] kobject_put+0x19/0x1b
[<ffffffff80311fb8>] kobject_shadow_add+0x177/0x1ac
[<ffffffff80311ff8>] kobject_add+0xb/0xd
[<ffffffff80388c22>] device_add+0xb5/0x5ea
[<ffffffff80250102>] trace_hardirqs_on+0x11c/0x147
[<ffffffff880680ca>] :scsi_mod:scsi_sysfs_add_sdev+0x39/0x22d
[<ffffffff88065e97>] :scsi_mod:scsi_probe_and_add_lun+0x9de/0xb0e
[<ffffffff88066917>] :scsi_mod:__scsi_scan_target+0x44a/0x638
[<ffffffff80468a19>] __mutex_lock_slowpath+0x26c/0x279
[<ffffffff88067101>] :scsi_mod:scsi_scan_target+0x9a/0xaf
[<ffffffff880bf86e>] :scsi_transport_fc:fc_scsi_scan_rport+0x0/0x89
[<ffffffff880bf8d1>] :scsi_transport_fc:fc_scsi_scan_rport+0x63/0x89
[<ffffffff80244a92>] run_workqueue+0x97/0x16a
[<ffffffff80245465>] worker_thread+0x0/0xea
[<ffffffff80245544>] worker_thread+0xdf/0xea
[<ffffffff80248603>] autoremove_wake_function+0x0/0x38
[<ffffffff802484bb>] kthread+0x49/0x76
[<ffffffff8020abb8>] child_rip+0xa/0x12
[<ffffffff8020a2cc>] restore_args+0x0/0x30
[<ffffffff80293fe5>] ____cache_alloc_node+0xff/0x144
[<ffffffff80248472>] kthread+0x0/0x76
[<ffffffff8020abae>] child_rip+0x0/0x12
error 1
scsi 4:0:0:0: Unexpected response from lun 101 while scanning, scan aborted
list_add corruption. prev->next should be next (ffffffff805bc9e8), but was ffff81041f5d2410. (prev=ffff81041f5d2410).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [1] SMP
CPU 4
Modules linked in: usb_storage qla2xxx scsi_transport_fc sata_nv libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 632, comm: scsi_scan_5 Not tainted 2.6.22.19_lustre.1.8.1 #1
RIP: 0010:[<ffffffff80318cca>] [<ffffffff80318cca>] __list_add+0x47/0x5b
RSP: 0000:ffff81021f2afc40 EFLAGS: 00010286
RAX: 0000000000000079 RBX: ffff81041f5c5930 RCX: 0000000000000079
RDX: ffff81021a1b8080 RSI: 0000000000000001 RDI: 0000000000000002
RBP: ffff81021f2afc40 R08: 0000000000000002 R09: ffffffff8023516f
R10: ffff81021f2afa30 R11: 0000000000000000 R12: 0000000000000000
R13: ffff81041f5c57a0 R14: ffff81041f5c57a0 R15: 00000000fffffffe
FS: 00000000008578f0(0000) GS:ffff81041f4e90a0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000056d080 CR3: 000000021a58a000 CR4: 00000000000006e0
Process scsi_scan_5 (pid: 632, threadinfo ffff81021f2ae000, task ffff81021a1b8080)
Stack: ffff81021f2afc80 ffffffff80311f06 ffff81041a8443e0 ffff81041a844008
ffff81041f5c5780 ffff81041f5c57a0 ffff81041f5c57a0 0000000000000000
ffff81021f2afc90 ffffffff80311ff8 ffff81021f2afd10 ffffffff80388c22
Call Trace:
[<ffffffff80311f06>] kobject_shadow_add+0xc5/0x1ac
[<ffffffff80311ff8>] kobject_add+0xb/0xd
[<ffffffff80388c22>] device_add+0xb5/0x5ea
[<ffffffff8038e480>] transport_setup_classdev+0x0/0x1a
[<ffffffff880663bf>] :scsi_mod:scsi_alloc_target+0x2e4/0x357
[<ffffffff8806653a>] :scsi_mod:__scsi_scan_target+0x6d/0x638
[<ffffffff8024ff18>] mark_held_locks+0x4a/0x6a
[<ffffffff80468a0d>] __mutex_lock_slowpath+0x260/0x279
[<ffffffff8024dba2>] debug_mutex_free_waiter+0x5b/0x5f
[<ffffffff88066b55>] :scsi_mod:scsi_scan_channel+0x50/0x79
[<ffffffff88066c5d>] :scsi_mod:scsi_scan_host_selected+0xdf/0x11f
[<ffffffff88066e70>] :scsi_mod:do_scan_async+0x0/0x114
[<ffffffff88066d08>] :scsi_mod:do_scsi_scan_host+0x6b/0x70
[<ffffffff88066e70>] :scsi_mod:do_scan_async+0x0/0x114
[<ffffffff88066e87>] :scsi_mod:do_scan_async+0x17/0x114
[<ffffffff88066e70>] :scsi_mod:do_scan_async+0x0/0x114
[<ffffffff802484bb>] kthread+0x49/0x76
[<ffffffff8020abb8>] child_rip+0xa/0x12
[<ffffffff8020a2cc>] restore_args+0x0/0x30
[<ffffffff80248472>] kthread+0x0/0x76
[<ffffffff8020abae>] child_rip+0x0/0x12
Code: 0f 0b eb fe 48 89 7e 08 48 89 37 48 89 57 08 48 89 3a c9 c3
RIP [<ffffffff80318cca>] __list_add+0x47/0x5b
RSP <ffff81021f2afc40>