[Lustre-discuss] SLES 11 SP1 Client rpms built but not working

peter.chiu at stfc.ac.uk peter.chiu at stfc.ac.uk
Wed May 11 00:33:54 PDT 2011


Dear Andreas,

Many thanks for your response.

Further details are below, and I would be grateful for your advice.

Regards,

Peter
====================================================================================================

The system is:

cmip-proc8:/etc # uname -a
Linux cmip-proc8.badc.rl.ac.uk 2.6.32.29-0.3-xen #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux

/usr/src/linux is a symlink pointing to the source corresponding to linux-2.6.32.29-0.3-obj:

cmip-proc8:/etc # ls -l /usr/src
total 24
drwxr-xr-x  3 root root 4096 2011-05-09 08:31 debug
lrwxrwxrwx  1 root root   19 2011-03-20 15:54 linux -> linux-2.6.32.29-0.3
drwxr-xr-x 25 root root 4096 2011-05-09 08:49 linux-2.6.32.29-0.3
drwxr-xr-x  3 root root 4096 2011-03-20 15:54 linux-2.6.32.29-0.3-obj
drwxr-xr-x  3 root root 4096 2011-03-20 15:54 linux-obj
drwxr-xr-x 10 root root 4096 2011-05-09 08:31 lustre-1.8.5
drwxr-xr-x  7 root root 4096 2011-03-20 14:58 packages
cmip-proc8:/etc #
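
As a rough cross-check of the listing above, the source and -obj paths one would expect for a given kernel release can be derived from the release string itself. The sketch below assumes the SLES layout seen here (/usr/src/linux-VERSION and /usr/src/linux-VERSION-obj/ARCH/FLAVOR). The helper names kernel_base, kernel_flavor and expected_trees are made up for illustration and are not part of Lustre or SLES.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only): split a kernel release such as
# "2.6.32.29-0.3-xen" into its base version and its flavor.
kernel_base()   { printf '%s\n' "${1%-*}"; }    # drop the last "-suffix"
kernel_flavor() { printf '%s\n' "${1##*-}"; }   # keep only the last "-suffix"

# Print the source tree and object tree one would expect for a release ($1)
# and architecture ($2), ASSUMING the SLES directory layout shown above.
expected_trees() {
    base=$(kernel_base "$1")
    flavor=$(kernel_flavor "$1")
    printf '%s\n' "/usr/src/linux-$base" "/usr/src/linux-$base-obj/$2/$flavor"
}

# Show the expected paths for the running kernel.
expected_trees "$(uname -r)" "$(uname -m)"
```

The two printed paths can then be compared by eye with the target of the /usr/src/linux symlink and with the --with-linux-obj argument passed to configure.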

cmip-proc8:~ # ls /usr/local/kits/lustre-1.8.5

aclocal.m4       config.h.in    install-sh           Makefile
autoMakefile     config.log     ldiskfs              Makefile.in
autoMakefile.am  config.status  libsysio             missing
autoMakefile.in  config.sub     lnet                 mkinstalldirs
build            configure      lustre               README
ChangeLog        configure.ac   lustre-1.8.5.tar.gz  Rules
compile          COPYING        lustre-iokit         snmp
config.guess     debian         lustre.spec          stamp-h1
config.h         depcomp        lustre.spec.in       tree_status
cmip-proc8:~ #

The build with ./configure and "make rpms" produced RPMs that install cleanly:

cmip-proc8:/etc # ls -ls /usr/src/packages/RPMS/x86_64/*1.8.5*
 4024 -rw-r--r-- 1 root root  4112883 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
15532 -rw-r--r-- 1 root root 15881360 2011-05-09 08:54 /usr/src/packages/RPMS/x86_64/lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
 1332 -rw-r--r-- 1 root root  1358924 2011-05-09 08:54 /usr/src/packages/RPMS/x86_64/lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
 1416 -rw-r--r-- 1 root root  1441937 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
 3524 -rw-r--r-- 1 root root  3602163 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
 2600 -rw-r--r-- 1 root root  2656393 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm


cmip-proc8:/etc # rpm -e lustre-tests
cmip-proc8:/etc # rpm -e lustre
cmip-proc8:/etc # rpm -e lustre-modules
cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
Preparing...                ########################################### [100%]
   1:lustre-modules         ########################################### [100%]
Congratulations on finishing your Lustre installation!  To register
your copy of Lustre and find out more about Lustre Support, Service,
and Training offerings please visit

http://www.sun.com/software/products/lustre/lustre_reg.jsp
cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
Preparing...                ########################################### [100%]
   1:lustre                 ########################################### [100%]
cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
Preparing...                ########################################### [100%]
   1:lustre-tests           ########################################### [100%]
cmip-proc8:/etc #

...

cmip-proc8:/etc # rpm -qa | grep lustre
lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815

The problem reproduces:

cmip-proc8:~ # cp /var/log/messages /tmp/m0
cmip-proc8:~ # dmesg > /tmp/d0
cmip-proc8:~ # lsmod | grep lustre
cmip-proc8:~ # modprobe lustre
Killed
cmip-proc8:~ # dmesg > /tmp/d1
cmip-proc8:~ # cp /var/log/messages /tmp/m1
cmip-proc8:~ # diff /tmp/d0 /tmp/d1
193a194,235
> [   84.786822] SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:30:1e:5d:54:80:08:00 SRC=130.246.188.226 DST=224.0.0.1 LEN=28 TOS=0x00 PREC=0x00 TTL=1 ID=34816 PROTO=2 
> [  104.171306] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [  104.171317] IP: [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [  104.171328] PGD 7d9d0067 PUD 7d94c067 PMD 0 
> [  104.171333] Oops: 0000 [#1] SMP 
> [  104.171336] last sysfs file: /sys/module/ip_tables/initstate
> [  104.171339] CPU 0
> [  104.171341] Modules linked in: lnet(N+) lvfs(N) libcfs(N) iptable_nat nf_nat xt_tcpudp xt_pkttype ipt_LOG xt_limit autofs4 binfmt_misc microcode xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6_tables x_tables fuse loop dm_mod joydev rtc_core rtc_lib xennet ext3 mbcache jbd processor thermal_sys hwmon xenblk cdrom
> [  104.171373] Supported: Yes
> [  104.171376] Pid: 3441, comm: modprobe Tainted: G          N  2.6.32.29-0.3-xen #1 
> [  104.171379] RIP: e030:[<ffffffff8002c3d2>]  [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [  104.171384] RSP: e02b:ffff88007edade38  EFLAGS: 00010082
> [  104.171387] RAX: 0000000000000001 RBX: 0000000000009700 RCX: dead000000100100
> [  104.171390] RDX: 0000000000000000 RSI: ffff88007edade88 RDI: 0000000000000000
> [  104.171393] RBP: ffff88007edade58 R08: ffffffffa0252fb6 R09: 0000000000000000
> [  104.171396] R10: 0000000000000001 R11: ffffffff805f4200 R12: 0000000000009700
> [  104.171399] R13: 0000000000000000 R14: ffff88007edade88 R15: 000000000000000f
> [  104.171406] FS:  00007f541715a700(0000) GS:ffff8800013c1000(0000) knlGS:0000000000000000
> [  104.171409] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  104.171412] CR2: 0000000000000008 CR3: 000000007d905000 CR4: 0000000000002660
> [  104.171415] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  104.171418] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  104.171421] Process modprobe (pid: 3441, threadinfo ffff88007edac000, task ffff88007df8a400)
> [  104.171424] Stack:
> [  104.171426]  ffffffffa02579f8 0000000000000000 0000000000623da0 0000000000623d30
> [  104.171430] <0> ffff88007edadeb8 ffffffff80038588 000000007fc11fa0 00000000a02579f8
> [  104.171435] <0> 00000000a0243060 0000000000000000 0000000000000001 ffffffffa02579f8
> [  104.171441] Call Trace:
> [  104.171449]  [<ffffffff80038588>] try_to_wake_up+0x48/0x420
> [  104.171455]  [<ffffffff8005b2e8>] up+0x48/0x50
> [  104.171464]  [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
> [  104.171478]  [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
> [  104.171489]  [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
> [  104.171495]  [<ffffffff8006d154>] sys_init_module+0xe4/0x270
> [  104.171500]  [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
> [  104.171506]  [<00007f5416cf3f7a>] 0x7f5416cf3f7a
> [  104.171508] Code: 1c 24 49 89 f6 4c 89 64 24 08 49 c7 c4 00 97 00 00 65 8a 04 25 c1 67 00 00 65 c6 04 25 c1 67 00 00 01 0f b6 c0 4c 89 e3 49 89 06 <49> 8b 45 08 8b 40 18 48 03 1c c5 80 ae 62 80 48 89 df e8 f7 87 
> [  104.171544] RIP  [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [  104.171548]  RSP <ffff88007edade38>
> [  104.171550] CR2: 0000000000000008
> [  104.171553] ---[ end trace 34c6e019e0aea7d2 ]---
> [  106.380129] SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:17:f2:0e:c4:a1:08:00 SRC=130.246.188.58 DST=224.0.0.1 LEN=44 TOS=0x00 PREC=0x00 TTL=1 ID=27534 PROTO=UDP SPT=54228 DPT=8612 LEN=24 
cmip-proc8:~ #
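
To make a trace like the one above easier to scan, the faulting symbol and the frames that belong to loadable modules can be pulled out with sed. This is only a minimal sketch: the function name oops_summary is invented here, and the patterns assume the 2.6.32-style oops layout shown in this message (an "IP: [<addr>] symbol" line, and module frames ending in "[module]").

```shell
#!/bin/sh
# Hypothetical helper (illustration only): read dmesg or diff output on stdin
# and print the faulting symbol plus every call-trace frame that ends in
# "[module]", i.e. frames that live in a loadable module.
oops_summary() {
    sed -n \
        -e 's|.*\] IP: \[<[0-9a-f]*>\] \(.*\)$|fault: \1|p' \
        -e 's|.*\[<[0-9a-f]*>\] \([^ ]*\) \[\([^]]*\)\]$|module frame: \1 [\2]|p'
}

# Example usage, assuming the saved snapshots from the session above exist:
#   diff /tmp/d0 /tmp/d1 | oops_summary
```

For the trace in this message, that would narrow attention to task_rq_lock as the faulting function and to the LNetInit and init_lnet frames in lnet.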


-----Original Message-----
From: Andreas Dilger [mailto:adilger at whamcloud.com] 
Sent: 10 May 2011 21:48
To: Chiu, Peter (STFC,RAL,RALSP)
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working

On May 9, 2011, at 11:38, <peter.chiu at stfc.ac.uk> <peter.chiu at stfc.ac.uk> wrote:
> The rpms lustre-modules, lustre and lustre-tests were then installed smoothly without any complaints.
>  
> But the subsequent "modprobe lustre" will return a "Killed" message, with no lustre module loaded.
>  
> dmesg also reveals  "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
>  
> A second modprobe lustre command will then hang, again with no module loaded.
> Subsequently the client is not able to mount the lustre storage.
>  
> Can anyone shed some light as to what has gone wrong here please?
>  
> ./configure --with-linux=/usr/src/linux --with-linux-obj=/usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen

Are you sure that "/usr/src/linux" points to the same source as "/usr/src/linux-2.6.32.29-0.3-obj"?  Is that a symlink?  Normally the source and -obj files have a very similar pathname (i.e. just with "-obj" suffix difference).

> > [  168.647996] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > [  168.648066] Pid: 3445, comm: modprobe Tainted: G          N  2.6.32.29-0.3-xen #1
> 0000000000000400
> > [  168.648110] Process modprobe (pid: 3445, threadinfo ffff88007efa4000, task ffff88007e9100c0)
> > [  168.648129] Call Trace:
> > [  168.648138]  [<ffffffff80038588>] try_to_wake_up+0x48/0x420
> > [  168.648143]  [<ffffffff8005b2e8>] up+0x48/0x50
> > [  168.648153]  [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
> > [  168.648167]  [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
> > [  168.648178]  [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
> > [  168.648184]  [<ffffffff8006d154>] sys_init_module+0xe4/0x270
> > [  168.648189]  [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
> > [  168.648194]  [<00007f3f40bc9f7a>] 0x7f3f40bc9f7a
>  
> I have tried Lustre-1.8.4, but got the same result.
> I have also tried to follow the 1.8 Operations Manual to locate the diagnostic tools, but the link wiki.lustre.org is no longer valid.

This looks like a pretty serious error to oops during module insertion, and I'd suspect the build environment before any particular Lustre code.
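
One quick sanity check on the build environment is to compare the kernel release recorded in a built module's vermagic string with the running kernel. The sketch below is an assumption-laden illustration: vermagic_matches is a made-up helper, and it only compares the first word of the vermagic string. A match does not prove that the source and -obj trees used for the build were consistent; a mismatch would point strongly at the build setup.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): succeed if the first word of a
# module's vermagic string ($1) equals the given kernel release ($2).
vermagic_matches() {
    magic_release=${1%% *}      # keep everything before the first space
    [ "$magic_release" = "$2" ]
}

# Example usage on the client (modinfo -F prints a single field):
#   if vermagic_matches "$(modinfo -F vermagic lnet)" "$(uname -r)"; then
#       echo "lnet was built for the running kernel release"
#   else
#       echo "lnet vermagic does not match the running kernel"
#   fi
```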

Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.





