[Lustre-discuss] Lustre 1.6.5.1 client with kernel 2.6.22.14
David Levi Hevroni
dlhevroni at gmail.com
Mon Aug 4 10:59:43 PDT 2008
A few weeks ago I saw some discussion about using kernel 2.6.22.14. We
tried this kernel with Lustre 1.6.5.1 and OFED-1.3. When we mount
Lustre via TCP it looks fine, but when we mount via IB we get a
"general protection fault". Some details:
CentOS-5.2 with a patchless vanilla kernel 2.6.22.14 (we also tried the
same kernel with the Lustre 1.6.5.1 patch, but the result was not much
different).
We installed OFED-1.3; after a reboot it looked O.K. We tested
ib_send_bw / lat and they looked fine.
Next we installed Lustre 1.6.5.1 with "./configure
--with-linux=/usr/src/linux-2.6.22.14" and rebooted the system.
We modified /etc/modprobe.conf:
#lustre setting
options lnet networks=tcp
then: mount -t lustre 192.168.1.20@tcp:/spfs /mnt/lustrefs and it looked fine.
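A quick way to confirm which networks LNet will try to start is to read
the value back from the modprobe configuration. The sketch below is only
an illustration: the function name lnet_networks is made up here, and it
assumes the setting sits on a single uncommented "options lnet" line, as
in the file above.

```shell
# Hypothetical helper: print the LNet "networks=" value from a modprobe
# config file. Defaults to /etc/modprobe.conf; pass another path to override.
lnet_networks() {
    conf="${1:-/etc/modprobe.conf}"
    # Keep only uncommented "options lnet ..." lines, print the value after
    # "networks=", use the last such line, and strip optional double quotes.
    sed -n 's/^[[:space:]]*options[[:space:]]\{1,\}lnet[[:space:]].*networks=\([^[:space:]]*\).*/\1/p' "$conf" |
        tail -n 1 |
        tr -d '"'
}
```

Called as "lnet_networks /etc/modprobe.conf", it prints just the network
list, which makes it easy to see whether a node is set up for tcp only
or for o2ib as well.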
When we added IB by changing /etc/modprobe.conf to "options lnet
networks=o2ib,tcp" and rebooted the system, we got the following error:
$ modprobe lnet     (O.K.)
$ lctl
lctl > network up
Message from syslogd@ at Tue Aug 5 00:35:30 2008 ...
grid06 kernel: general protection fault: 0000 [1] SMP
Segmentation fault
When we look at /var/log/messages we see:
Aug 5 00:35:30 grid06 kernel: general protection fault: 0000 [1] SMP
Aug 5 00:35:30 grid06 kernel: CPU 1
Aug 5 00:35:30 grid06 kernel: Modules linked in: ko2iblnd rdma_cm
iw_cm ib_addr lnet libcfs ib_uverbs ib_umad cxgb3 ib_ipath mlx4_ib
mlx4_core ib_ipoib ib_cm ib_sa ib_mthca ib_mad ib_core
Aug 5 00:35:30 grid06 kernel: Pid: 7723, comm: lctl Not tainted 2.6.22.14 #1
Aug 5 00:35:30 grid06 kernel: RIP: 0010:[<ffffffff88161eea>]
[<ffffffff88161eea>] :ko2iblnd:kiblnd_map_tx_descs+0x4a/0x160
Aug 5 00:35:30 grid06 kernel: RSP: 0000:ffff81010514f818 EFLAGS: 00010286
Aug 5 00:35:30 grid06 kernel: RAX: ffffffff8802a695 RBX:
ffffc200014ae000 RCX: 0000000000000001
Aug 5 00:35:30 grid06 kernel: RDX: 0000000000001000 RSI:
ffff8100f1d3a000 RDI: ffff81007d632000
Aug 5 00:35:30 grid06 kernel: RBP: ffff8101056b37c0 R08:
0000000000000000 R09: ffff81010514f798
Aug 5 00:35:30 grid06 kernel: R10: ffff81010514f7df R11:
0000000000003a98 R12: 0000000000000001
Aug 5 00:35:30 grid06 kernel: R13: 0000000000000001 R14:
0000000000000000 R15: ffff8101056b37e8
Aug 5 00:35:30 grid06 kernel: FS: 00002b3ae29f86e0(0000)
GS:ffff810105d4b740(0000) knlGS:00000000f7d576c0
Aug 5 00:35:30 grid06 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 5 00:35:30 grid06 kernel: CR2: 00002ab6525776d0 CR3:
00000000f2e93000 CR4: 00000000000006e0
Aug 5 00:35:30 grid06 kernel: Process lctl (pid: 7723, threadinfo
ffff81010514e000, task ffff8100f5db2040)
Aug 5 00:35:30 grid06 kernel: Stack: ffff8100f178e7c0
ffff810105d69600 ffffffff8817b788 ffff8101056b37c0
Aug 5 00:35:30 grid06 kernel: ffff8101056b3e40 ffffffff88175558
ffff81010512f980 ffffffff88166eb3
Aug 5 00:35:30 grid06 kernel: ffffffff8810641f 0000000000000002
0000000000000000 0000000000000001
Aug 5 00:35:30 grid06 kernel: Call Trace:
Aug 5 00:35:30 grid06 kernel: [<ffffffff88166eb3>]
:ko2iblnd:kiblnd_startup+0x2d3/0xa20
Aug 5 00:35:30 grid06 kernel: [<ffffffff8811a9f9>]
:lnet:lnet_startup_lndnis+0xc9/0x6a0
Aug 5 00:35:30 grid06 kernel: [<ffffffff880fc0c8>] :libcfs:cfs_alloc+0x28/0x60
Aug 5 00:35:30 grid06 kernel: [<ffffffff8811b785>]
:lnet:LNetNIInit+0x145/0x210
Aug 5 00:35:30 grid06 kernel: [<ffffffff8064a9c4>] __down_read+0x12/0x9a
Aug 5 00:35:30 grid06 kernel: [<ffffffff8812a15a>]
:lnet:lnet_configure+0x4a/0x60
Aug 5 00:35:30 grid06 kernel: [<ffffffff8810145a>]
:libcfs:libcfs_ioctl+0xba/0x5b0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8063a056>] xs_send_kvec+0x80/0x89
Aug 5 00:35:30 grid06 kernel: [<ffffffff80638e55>] xprt_timer+0x0/0x7f
Aug 5 00:35:30 grid06 kernel: [<ffffffff8063cbd0>]
rpc_wake_up_next+0x15c/0x163
Aug 5 00:35:30 grid06 kernel: [<ffffffff80638779>]
__xprt_lock_write_next_cong+0x48/0x90
Aug 5 00:35:30 grid06 kernel: [<ffffffff802299b0>]
find_busiest_group+0x252/0x684
Aug 5 00:35:30 grid06 kernel: [<ffffffff8064b08a>]
__reacquire_kernel_lock+0x26/0x44
Aug 5 00:35:30 grid06 kernel: [<ffffffff80649456>] thread_return+0xac/0xe4
Aug 5 00:35:30 grid06 kernel: [<ffffffff8028a129>] __d_lookup+0xb0/0x100
Aug 5 00:35:30 grid06 kernel: [<ffffffff80281464>] do_lookup+0x63/0x1ae
Aug 5 00:35:30 grid06 kernel: [<ffffffff8028a5e5>] dput+0x26/0x115
Aug 5 00:35:30 grid06 kernel: [<ffffffff802836cf>]
__link_path_walk+0xb9b/0xcf0
Aug 5 00:35:30 grid06 kernel: [<ffffffff80390975>]
n_tty_chars_in_buffer+0x68/0x70
Aug 5 00:35:30 grid06 kernel: [<ffffffff80242810>] remove_wait_queue+0x12/0x45
Aug 5 00:35:30 grid06 kernel: [<ffffffff8028e42a>] mntput_no_expire+0x1c/0x79
Aug 5 00:35:30 grid06 kernel: [<ffffffff802838f2>] link_path_walk+0xce/0xe0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8023aefb>]
recalc_sigpending_and_wake+0x9/0x1a
Aug 5 00:35:30 grid06 kernel: [<ffffffff80351863>]
__strncpy_from_user+0x17/0x41
Aug 5 00:35:30 grid06 kernel: [<ffffffff880fc0c8>] :libcfs:cfs_alloc+0x28/0x60
Aug 5 00:35:30 grid06 kernel: [<ffffffff88100acd>]
:libcfs:libcfs_psdev_open+0x6d/0x2c0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8027cf71>] exact_lock+0xc/0x14
Aug 5 00:35:30 grid06 kernel: [<ffffffff80649f1c>] mutex_lock+0xd/0x1e
Aug 5 00:35:30 grid06 kernel: [<ffffffff80393915>] misc_open+0x1b5/0x1c0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8027d435>] chrdev_open+0x167/0x196
Aug 5 00:35:30 grid06 kernel: [<ffffffff880fec0f>]
:libcfs:libcfs_ioctl+0xaf/0x160
Aug 5 00:35:30 grid06 kernel: [<ffffffff8022b7f9>]
default_wake_function+0x0/0xe
Aug 5 00:35:30 grid06 kernel: [<ffffffff8064b032>] lock_kernel+0x1b/0x37
Aug 5 00:35:30 grid06 kernel: [<ffffffff880feb60>]
:libcfs:libcfs_ioctl+0x0/0x160
Aug 5 00:35:30 grid06 kernel: [<ffffffff802858cd>] do_ioctl+0x9d/0xb6
Aug 5 00:35:30 grid06 kernel: [<ffffffff80285b29>] vfs_ioctl+0x243/0x25c
Aug 5 00:35:30 grid06 kernel: [<ffffffff80285b7e>] sys_ioctl+0x3c/0x5e
Aug 5 00:35:30 grid06 kernel: [<ffffffff8020935e>] system_call+0x7e/0x83
Aug 5 00:35:30 grid06 kernel:
Aug 5 00:35:30 grid06 kernel:
Aug 5 00:35:30 grid06 kernel: Code: ff 50 08 48 89 43 50 48 8b 45 28
48 89 58 08 48 89 03 4c 89
Aug 5 00:35:30 grid06 kernel: RIP [<ffffffff88161eea>]
:ko2iblnd:kiblnd_map_tx_descs+0x4a/0x160
Aug 5 00:35:30 grid06 kernel: RSP <ffff81010514f818>
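For anyone skimming a trace like this one, the faulting function is the
symbol on the RIP lines. The following is a minimal sketch for pulling
it out of a log file; the function name oops_rip_symbol is hypothetical,
and it assumes each RIP entry is on one unwrapped line, as it would be
in /var/log/messages (the lines above are wrapped by the mail client).
It also relies on "grep -o", which GNU and BSD grep provide.

```shell
# Hypothetical helper: list the symbol+offset found on the RIP lines of a
# kernel oops log. Defaults to /var/log/messages; pass another path to override.
oops_rip_symbol() {
    # Select the RIP lines, keep only "function+0xOFFSET/0xLENGTH",
    # and collapse repeated entries.
    grep 'RIP' "${1:-/var/log/messages}" |
        grep -o '[A-Za-z_][A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]*' |
        sort -u
}
```

On the unwrapped form of the log above this would report
kiblnd_map_tx_descs in the ko2iblnd module, which is where the general
protection fault was raised.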
Is there something wrong in our configuration?
Thanks
David Levi-Hevroni
Papp Tamas tompos at martos.bme.hu
Tue Jun 17 12:36:29 PDT 2008
Bernd Schubert wrote:
>
> Yeah, this is what I immediately thought when I saw your trace. The kernel
> developers somehow manage to change the interface to the cache functions
> in each kernel version (though not in the last-digit subversions).
> The trace makes me think these functions have been called with the wrong
> arguments. However, Lustre already has wrapper functions for this, and
> I guess the configure script did something wrong this time.
> Unless the lustre developers step in, I will try to find some time
> tomorrow or on Thursday to check what's wrong.
Well, thank you very much.
Has anybody else tried 2.6.22 with Lustre?
Bye,
tamas