[Lustre-discuss] SLES 11 SP1 Client rpms built but not working

peter.chiu at stfc.ac.uk peter.chiu at stfc.ac.uk
Fri May 13 03:02:29 PDT 2011


Dear Andreas,

Is there any further advice you could offer on how to troubleshoot the failure to load the lustre module?

Many thanks.

Peter

-----Original Message-----
From: Chiu, Peter (STFC,RAL,RALSP) 
Sent: 11 May 2011 11:50
To: Andreas Dilger
Cc: <lustre-discuss at lists.lustre.org>; Chiu, Peter (STFC,RAL,RALSP)
Subject: RE: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working

Understood, Andreas,

Just to add: the same approach works for SLES 11 with a xen kernel (2.6.27.54-0.2-xen).
The Lustre client rpms work fine there:

cmip-proc1:~ # cat /etc/issue

Welcome to SUSE Linux Enterprise Server 11 (x86_64) - Kernel \r (\l).

cmip-proc1:~ # uname -a
Linux cmip-proc1 2.6.27.54-0.2-xen #1 SMP 2010-10-19 18:40:07 +0200 x86_64 x86_64 x86_64 GNU/Linux
cmip-proc1:~ # df -h /disks/ceda1
Filesystem            Size  Used Avail Use% Mounted on
130.246.191.64:130.246.191.65 at tcp0:/ceda1
                       51T  130G   48T   1% /disks/ceda1


SLES 11 SP1 is a service pack update to SLES 11 (now on 2.6.32.29-0.3-xen).

Is it possible to find out what the problem is? 
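
One way to narrow this down is to pull out the frames of the oops that belong to a loadable module, since those show which module's code was running when the kernel faulted. The following is only a minimal sketch: the function name oops_module_frames is made up for illustration, and it assumes the dmesg format seen in this thread, where module frames end in a "[module]" tag.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre or any distribution tool).
# Reads dmesg-style text on stdin and prints "module symbol+offset" for
# every call-trace frame that ends in a "[module]" tag, e.g. "[lnet]".
oops_module_frames() {
    awk '/\[<[0-9a-f]+>\]/ && $NF ~ /^\[[A-Za-z0-9_]+\]$/ {
        # Strip the surrounding brackets from the module tag.
        mod = substr($NF, 2, length($NF) - 2)
        print mod, $(NF - 1)
    }'
}
```

It could be fed the same diff as above, for example: diff /tmp/d0 /tmp/d1 | oops_module_frames. Frames without a module tag (core kernel symbols such as try_to_wake_up) are skipped on purpose.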

Regards,
Peter


-----Original Message-----
From: Andreas Dilger [mailto:adilger at whamcloud.com] 
Sent: 11 May 2011 10:11
To: Chiu, Peter (STFC,RAL,RALSP)
Cc: <lustre-discuss at lists.lustre.org>; Chiu, Peter (STFC,RAL,RALSP)
Subject: Re: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working

The only other potential problem I see is that you are using a xen kernel, and this is somehow causing problems.
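
A quick sanity check along these lines is to compare the kernel release recorded in the built module against the running kernel. This is a rough sketch, not a definitive test: the function name vermagic_matches is invented here, and it assumes the release is the first whitespace-separated field of the vermagic string reported by modinfo.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel release at the start of a
# module's vermagic string equals the given running kernel release.
#   $1 = vermagic string, e.g. from: modinfo -F vermagic lnet
#   $2 = running release, e.g. from: uname -r
vermagic_matches() {
    vermagic=$1
    running=$2
    # Keep only the first field (everything before the first space).
    [ "${vermagic%% *}" = "$running" ]
}
```

Typical use would be: vermagic_matches "$(modinfo -F vermagic lnet)" "$(uname -r)" || echo "module built for a different kernel". A match only shows that the release strings agree; it does not prove that the module was built against the xen flavour's configuration.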

Cheers, Andreas

On 2011-05-11, at 1:33 AM, <peter.chiu at stfc.ac.uk> wrote:

> Dear Andreas,
> 
> Many thanks for your response.
> 
> Below are further details on this.
> 
> I shall be grateful for your advice on this.
> 
> Regards,
> 
> Peter
> ====================================================================================================
> 
> The system is:
> 
> cmip-proc8:/etc # uname -a
> Linux cmip-proc8.badc.rl.ac.uk 2.6.32.29-0.3-xen #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux
> 
> /usr/src/linux is a symlink pointing to the source corresponding to linux-2.6.32.29-0.3-obj:
> 
> cmip-proc8:/etc # ls -l /usr/src
> total 24
> drwxr-xr-x  3 root root 4096 2011-05-09 08:31 debug
> lrwxrwxrwx  1 root root   19 2011-03-20 15:54 linux -> linux-2.6.32.29-0.3
> drwxr-xr-x 25 root root 4096 2011-05-09 08:49 linux-2.6.32.29-0.3
> drwxr-xr-x  3 root root 4096 2011-03-20 15:54 linux-2.6.32.29-0.3-obj
> drwxr-xr-x  3 root root 4096 2011-03-20 15:54 linux-obj
> drwxr-xr-x 10 root root 4096 2011-05-09 08:31 lustre-1.8.5
> drwxr-xr-x  7 root root 4096 2011-03-20 14:58 packages
> cmip-proc8:/etc #
> 
> cmip-proc8:~ # ls /usr/local/kits/lustre-1.8.5
> 
> aclocal.m4       config.h.in    install-sh           Makefile
> autoMakefile     config.log     ldiskfs              Makefile.in
> autoMakefile.am  config.status  libsysio             missing
> autoMakefile.in  config.sub     lnet                 mkinstalldirs
> build            configure      lustre               README
> ChangeLog        configure.ac   lustre-1.8.5.tar.gz  Rules
> compile          COPYING        lustre-iokit         snmp
> config.guess     debian         lustre.spec          stamp-h1
> config.h         depcomp        lustre.spec.in       tree_status
> cmip-proc8:~ #
> 
> The build with ./configure and make rpms produced rpms that are installable:
> 
> cmip-proc8:/etc # ls -ls /usr/src/packages/RPMS/x86_64/*1.8.5*
> 4024 -rw-r--r-- 1 root root  4112883 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 15532 -rw-r--r-- 1 root root 15881360 2011-05-09 08:54 /usr/src/packages/RPMS/x86_64/lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 1332 -rw-r--r-- 1 root root  1358924 2011-05-09 08:54 /usr/src/packages/RPMS/x86_64/lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 1416 -rw-r--r-- 1 root root  1441937 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 3524 -rw-r--r-- 1 root root  3602163 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 2600 -rw-r--r-- 1 root root  2656393 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 
> 
> cmip-proc8:/etc # rpm -e lustre-tests
> cmip-proc8:/etc # rpm -e lustre
> cmip-proc8:/etc # rpm -e lustre-modules
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing...                ########################################### [100%]
>   1:lustre-modules         ########################################### [100%]
> Congratulations on finishing your Lustre installation!  To register
> your copy of Lustre and find out more about Lustre Support, Service,
> and Training offerings please visit
> 
> http://www.sun.com/software/products/lustre/lustre_reg.jsp
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing...                ########################################### [100%]
>   1:lustre                 ########################################### [100%]
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing...                ########################################### [100%]
>   1:lustre-tests           ########################################### [100%]
> cmip-proc8:/etc #
> 
> ...
> 
> cmip-proc8:/etc # rpm -qa | grep lustre
> lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815
> 
> The problem reproduces:
> 
> cmip-proc8:~ # cp /var/log/messages /tmp/m0
> cmip-proc8:~ # dmesg > /tmp/d0
> cmip-proc8:~ # lsmod | grep lustre
> cmip-proc8:~ # modprobe lustre
> Killed
> cmip-proc8:~ # dmesg > /tmp/d1
> cmip-proc8:~ # cp /var/log/messages /tmp/m1
> cmip-proc8:~ # diff /tmp/d0 /tmp/d1
> 193a194,235
>> [   84.786822] SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:30:1e:5d:54:80:08:00 SRC=130.246.188.226 DST=224.0.0.1 LEN=28 TOS=0x00 PREC=0x00 TTL=1 ID=34816 PROTO=2 
>> [  104.171306] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [  104.171317] IP: [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [  104.171328] PGD 7d9d0067 PUD 7d94c067 PMD 0 
>> [  104.171333] Oops: 0000 [#1] SMP 
>> [  104.171336] last sysfs file: /sys/module/ip_tables/initstate
>> [  104.171339] CPU 0
>> [  104.171341] Modules linked in: lnet(N+) lvfs(N) libcfs(N) iptable_nat nf_nat xt_tcpudp xt_pkttype ipt_LOG xt_limit autofs4 binfmt_misc microcode xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6_tables x_tables fuse loop dm_mod joydev rtc_core rtc_lib xennet ext3 mbcache jbd processor thermal_sys hwmon xenblk cdrom
>> [  104.171373] Supported: Yes
>> [  104.171376] Pid: 3441, comm: modprobe Tainted: G          N  2.6.32.29-0.3-xen #1 
>> [  104.171379] RIP: e030:[<ffffffff8002c3d2>]  [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [  104.171384] RSP: e02b:ffff88007edade38  EFLAGS: 00010082
>> [  104.171387] RAX: 0000000000000001 RBX: 0000000000009700 RCX: dead000000100100
>> [  104.171390] RDX: 0000000000000000 RSI: ffff88007edade88 RDI: 0000000000000000
>> [  104.171393] RBP: ffff88007edade58 R08: ffffffffa0252fb6 R09: 0000000000000000
>> [  104.171396] R10: 0000000000000001 R11: ffffffff805f4200 R12: 0000000000009700
>> [  104.171399] R13: 0000000000000000 R14: ffff88007edade88 R15: 000000000000000f
>> [  104.171406] FS:  00007f541715a700(0000) GS:ffff8800013c1000(0000) knlGS:0000000000000000
>> [  104.171409] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [  104.171412] CR2: 0000000000000008 CR3: 000000007d905000 CR4: 0000000000002660
>> [  104.171415] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  104.171418] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [  104.171421] Process modprobe (pid: 3441, threadinfo ffff88007edac000, task ffff88007df8a400)
>> [  104.171424] Stack:
>> [  104.171426]  ffffffffa02579f8 0000000000000000 0000000000623da0 0000000000623d30
>> [  104.171430] <0> ffff88007edadeb8 ffffffff80038588 000000007fc11fa0 00000000a02579f8
>> [  104.171435] <0> 00000000a0243060 0000000000000000 0000000000000001 ffffffffa02579f8
>> [  104.171441] Call Trace:
>> [  104.171449]  [<ffffffff80038588>] try_to_wake_up+0x48/0x420
>> [  104.171455]  [<ffffffff8005b2e8>] up+0x48/0x50
>> [  104.171464]  [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
>> [  104.171478]  [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
>> [  104.171489]  [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
>> [  104.171495]  [<ffffffff8006d154>] sys_init_module+0xe4/0x270
>> [  104.171500]  [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
>> [  104.171506]  [<00007f5416cf3f7a>] 0x7f5416cf3f7a
>> [  104.171508] Code: 1c 24 49 89 f6 4c 89 64 24 08 49 c7 c4 00 97 00 00 65 8a 04 25 c1 67 00 00 65 c6 04 25 c1 67 00 00 01 0f b6 c0 4c 89 e3 49 89 06 <49> 8b 45 08 8b 40 18 48 03 1c c5 80 ae 62 80 48 89 df e8 f7 87 
>> [  104.171544] RIP  [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [  104.171548]  RSP <ffff88007edade38>
>> [  104.171550] CR2: 0000000000000008
>> [  104.171553] ---[ end trace 34c6e019e0aea7d2 ]---
>> [  106.380129] SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:17:f2:0e:c4:a1:08:00 SRC=130.246.188.58 DST=224.0.0.1 LEN=44 TOS=0x00 PREC=0x00 TTL=1 ID=27534 PROTO=UDP SPT=54228 DPT=8612 LEN=24 
> cmip-proc8:~ #
> 
> 
> -----Original Message-----
> From: Andreas Dilger [mailto:adilger at whamcloud.com] 
> Sent: 10 May 2011 21:48
> To: Chiu, Peter (STFC,RAL,RALSP)
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working
> 
> On May 9, 2011, at 11:38, <peter.chiu at stfc.ac.uk> <peter.chiu at stfc.ac.uk> wrote:
>> The rpms lustre-modules, lustre and lustre-tests were then installed smoothly without any complaints.
>> 
>> But the subsequent "modprobe lustre" will return a "Killed" message, with no lustre module loaded.
>> 
>> dmesg also reveals  "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
>> 
>> A second modprobe lustre command will then hang, again with no module loaded.
>> Subsequently the client is not able to mount the lustre storage.
>> 
>> Can anyone shed some light as to what has gone wrong here please?
>> 
>> ./configure --with-linux=/usr/src/linux --with-linux-obj=/usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen
> 
> Are you sure that "/usr/src/linux" points to the same source as "/usr/src/linux-2.6.32.29-0.3-obj"?  Is that a symlink?  Normally the source and -obj files have a very similar pathname (i.e. just with "-obj" suffix difference).
> 
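The source/-obj pairing asked about above can be checked mechanically. The sketch below assumes the naming pattern visible in this thread (the -obj tree sits next to the source tree and carries the same name plus "-obj") and GNU readlink; the function name src_matches_obj is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the object directory lives under
# "<resolved source dir>-obj", following symlinks on both paths.
#   $1 = kernel source dir (may be a symlink, e.g. /usr/src/linux)
#   $2 = kernel object dir passed to --with-linux-obj
src_matches_obj() {
    src=$(readlink -f "$1") || return 2
    obj=$(readlink -f "$2") || return 2
    case "$obj" in
        "${src}-obj" | "${src}-obj"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the paths from the configure line this would be: src_matches_obj /usr/src/linux /usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen && echo "source and obj trees pair up".
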
>>> [  168.647996] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>>> [  168.648066] Pid: 3445, comm: modprobe Tainted: G          N  2.6.32.29-0.3-xen #1
>> 0000000000000400
>>> [  168.648110] Process modprobe (pid: 3445, threadinfo ffff88007efa4000, task ffff88007e9100c0)
>>> [  168.648129] Call Trace:
>>> [  168.648138]  [<ffffffff80038588>] try_to_wake_up+0x48/0x420
>>> [  168.648143]  [<ffffffff8005b2e8>] up+0x48/0x50
>>> [  168.648153]  [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
>>> [  168.648167]  [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
>>> [  168.648178]  [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
>>> [  168.648184]  [<ffffffff8006d154>] sys_init_module+0xe4/0x270
>>> [  168.648189]  [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
>>> [  168.648194]  [<00007f3f40bc9f7a>] 0x7f3f40bc9f7a
>> 
>> I have tried Lustre-1.8.4, but got the same result.
>> I have also tried to follow the 1.8 Operations Manual to locate the diagnostic tools, but the link wiki.lustre.org is no longer valid.
> 
> An oops during module insertion is a pretty serious error, and I'd suspect the build environment before any particular Lustre code.
> 
> Cheers, Andreas
> --
> Andreas Dilger 
> Principal Engineer
> Whamcloud, Inc.
> 
> 
> 
> -- 
> Scanned by iCritical.