[Lustre-discuss] Lustre 1.6.5.1 client with kernel 2.6.22.14

David Levi Hevroni dlhevroni at gmail.com
Mon Aug 4 10:59:43 PDT 2008


A few weeks ago I saw some discussion about using kernel 2.6.22.14. We
tried this kernel with Lustre 1.6.5.1 and OFED-1.3. When we
mount Lustre via tcp it looks fine, but when we mount via IB we get a
"general protection fault". Some details:

CentOS-5.2 with a patchless vanilla kernel 2.6.22.14 (we also tried the
same kernel with the Lustre 1.6.5.1 patch, but the result was not much different).
We installed OFED-1.3; after reboot it looked O.K. We tested ib_send_bw and
ib_send_lat and they looked fine.
Next we installed Lustre 1.6.5.1 with "./configure
--with-linux=/usr/src/linux-2.6.22.14" and rebooted the system.
We modified /etc/modprobe.conf:
#lustre setting
options lnet networks=tcp
then: mount -t lustre 192.168.1.20@tcp:/spfs /mnt/lustrefs and it looked fine.

When we added IB by changing /etc/modprobe.conf to "options lnet
networks=o2ib,tcp" and rebooted the system, we got the following error:
$ modprobe lnet    (O.K)
$ lctl
lctl > network up
Message from syslogd@ at Tue Aug  5 00:35:30 2008 ...
grid06 kernel: general protection fault: 0000 [1] SMP Segmentation fault

and when we look at /var/log/messages:
Aug  5 00:35:30 grid06 kernel: general protection fault: 0000 [1] SMP
Aug  5 00:35:30 grid06 kernel: CPU 1
Aug  5 00:35:30 grid06 kernel: Modules linked in: ko2iblnd rdma_cm
iw_cm ib_addr lnet libcfs ib_uverbs ib_umad cxgb3 ib_ipath mlx4_ib
mlx4_core ib_ipoib ib_cm ib_sa ib_mthca ib_mad ib_core
Aug  5 00:35:30 grid06 kernel: Pid: 7723, comm: lctl Not tainted 2.6.22.14 #1
Aug  5 00:35:30 grid06 kernel: RIP: 0010:[<ffffffff88161eea>]
[<ffffffff88161eea>] :ko2iblnd:kiblnd_map_tx_descs+0x4a/0x160
Aug  5 00:35:30 grid06 kernel: RSP: 0000:ffff81010514f818  EFLAGS: 00010286
Aug  5 00:35:30 grid06 kernel: RAX: ffffffff8802a695 RBX:
ffffc200014ae000 RCX: 0000000000000001
Aug  5 00:35:30 grid06 kernel: RDX: 0000000000001000 RSI:
ffff8100f1d3a000 RDI: ffff81007d632000
Aug  5 00:35:30 grid06 kernel: RBP: ffff8101056b37c0 R08:
0000000000000000 R09: ffff81010514f798
Aug  5 00:35:30 grid06 kernel: R10: ffff81010514f7df R11:
0000000000003a98 R12: 0000000000000001
Aug  5 00:35:30 grid06 kernel: R13: 0000000000000001 R14:
0000000000000000 R15: ffff8101056b37e8
Aug  5 00:35:30 grid06 kernel: FS:  00002b3ae29f86e0(0000)
GS:ffff810105d4b740(0000) knlGS:00000000f7d576c0
Aug  5 00:35:30 grid06 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug  5 00:35:30 grid06 kernel: CR2: 00002ab6525776d0 CR3:
00000000f2e93000 CR4: 00000000000006e0
Aug  5 00:35:30 grid06 kernel: Process lctl (pid: 7723, threadinfo
ffff81010514e000, task ffff8100f5db2040)
Aug  5 00:35:30 grid06 kernel: Stack:  ffff8100f178e7c0
ffff810105d69600 ffffffff8817b788 ffff8101056b37c0
Aug  5 00:35:30 grid06 kernel:  ffff8101056b3e40 ffffffff88175558
ffff81010512f980 ffffffff88166eb3
Aug  5 00:35:30 grid06 kernel:  ffffffff8810641f 0000000000000002
0000000000000000 0000000000000001
Aug  5 00:35:30 grid06 kernel: Call Trace:
Aug  5 00:35:30 grid06 kernel:  [<ffffffff88166eb3>]
:ko2iblnd:kiblnd_startup+0x2d3/0xa20
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8811a9f9>]
:lnet:lnet_startup_lndnis+0xc9/0x6a0
Aug  5 00:35:30 grid06 kernel:  [<ffffffff880fc0c8>] :libcfs:cfs_alloc+0x28/0x60
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8811b785>]
:lnet:LNetNIInit+0x145/0x210
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8064a9c4>] __down_read+0x12/0x9a
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8812a15a>]
:lnet:lnet_configure+0x4a/0x60
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8810145a>]
:libcfs:libcfs_ioctl+0xba/0x5b0
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8063a056>] xs_send_kvec+0x80/0x89
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80638e55>] xprt_timer+0x0/0x7f
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8063cbd0>]
rpc_wake_up_next+0x15c/0x163
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80638779>]
__xprt_lock_write_next_cong+0x48/0x90
Aug  5 00:35:30 grid06 kernel:  [<ffffffff802299b0>]
find_busiest_group+0x252/0x684
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8064b08a>]
__reacquire_kernel_lock+0x26/0x44
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80649456>] thread_return+0xac/0xe4
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8028a129>] __d_lookup+0xb0/0x100
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80281464>] do_lookup+0x63/0x1ae
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8028a5e5>] dput+0x26/0x115
Aug  5 00:35:30 grid06 kernel:  [<ffffffff802836cf>]
__link_path_walk+0xb9b/0xcf0
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80390975>]
n_tty_chars_in_buffer+0x68/0x70
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80242810>] remove_wait_queue+0x12/0x45
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8028e42a>] mntput_no_expire+0x1c/0x79
Aug  5 00:35:30 grid06 kernel:  [<ffffffff802838f2>] link_path_walk+0xce/0xe0
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8023aefb>]
recalc_sigpending_and_wake+0x9/0x1a
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80351863>]
__strncpy_from_user+0x17/0x41
Aug  5 00:35:30 grid06 kernel:  [<ffffffff880fc0c8>] :libcfs:cfs_alloc+0x28/0x60
Aug  5 00:35:30 grid06 kernel:  [<ffffffff88100acd>]
:libcfs:libcfs_psdev_open+0x6d/0x2c0
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8027cf71>] exact_lock+0xc/0x14
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80649f1c>] mutex_lock+0xd/0x1e
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80393915>] misc_open+0x1b5/0x1c0
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8027d435>] chrdev_open+0x167/0x196
Aug  5 00:35:30 grid06 kernel:  [<ffffffff880fec0f>]
:libcfs:libcfs_ioctl+0xaf/0x160
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8022b7f9>]
default_wake_function+0x0/0xe
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8064b032>] lock_kernel+0x1b/0x37
Aug  5 00:35:30 grid06 kernel:  [<ffffffff880feb60>]
:libcfs:libcfs_ioctl+0x0/0x160
Aug  5 00:35:30 grid06 kernel:  [<ffffffff802858cd>] do_ioctl+0x9d/0xb6
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80285b29>] vfs_ioctl+0x243/0x25c
Aug  5 00:35:30 grid06 kernel:  [<ffffffff80285b7e>] sys_ioctl+0x3c/0x5e
Aug  5 00:35:30 grid06 kernel:  [<ffffffff8020935e>] system_call+0x7e/0x83
Aug  5 00:35:30 grid06 kernel:
Aug  5 00:35:30 grid06 kernel:
Aug  5 00:35:30 grid06 kernel: Code: ff 50 08 48 89 43 50 48 8b 45 28
48 89 58 08 48 89 03 4c 89
Aug  5 00:35:30 grid06 kernel: RIP  [<ffffffff88161eea>]
:ko2iblnd:kiblnd_map_tx_descs+0x4a/0x160
Aug  5 00:35:30 grid06 kernel:  RSP <ffff81010514f818>
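
To pull the faulting function out of a trace like the one above without
reading every line, something like the following could be used. It is only
a sketch: oops_rip is a made-up helper name, and it assumes the log lines
are unwrapped, as they appear in /var/log/messages (address and symbol on
the same line), and that the fault is inside a module, so the symbol is
printed as :module:function+offset.

```shell
#!/bin/sh
# oops_rip (hypothetical helper): read syslog text on stdin and print
# "module function" taken from the first RIP line of a kernel oops.
# Assumes the 2.6-era x86_64 format ":module:function+0xOFF/0xLEN" and
# unwrapped lines; symbols built into the kernel itself are not matched.
oops_rip() {
    sed -n 's/.*RIP[: ].*\[<[0-9a-f]*>\] *:\([^:]*\):\([A-Za-z0-9_]*\)+.*/\1 \2/p' |
        head -n 1
}

# Example (not run here): oops_rip < /var/log/messages
```

Given the unwrapped RIP line from the trace above, this prints
"ko2iblnd kiblnd_map_tx_descs", i.e. the fault is in the o2ib LND module.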



Is there something wrong in our configuration?

Thanks
David Levi-Hevroni


Papp Tamas tompos at martos.bme.hu
Tue Jun 17 12:36:29 PDT 2008

Bernd Schubert wrote:
>
> Yeah, this is what I immediately thought when I saw your trace. The kernel
> developers somehow manage to change the interface to the cache functions
> in each kernel version (though not in the last-digit subversions).
> The trace makes me think these functions have been called with the wrong
> arguments. However, Lustre already has wrapper functions for this, and
> I guess the configure script did something wrong this time.
> Unless the Lustre developers step in, I will try to find some time
> tomorrow or on Thursday to check what's wrong.

Well, thank you very much.

Has anybody else tried 2.6.22 with Lustre?

Bye,

tamas


