[Lustre-discuss] SLES 11 SP1 Client rpms built but not working
peter.chiu at stfc.ac.uk
Wed May 11 03:50:16 PDT 2011
Understood, Andreas,
Just to add that the same approach works for SLES 11 using a xen kernel (2.6.27.54-0.2-xen).
The Lustre client rpms work okay there:
cmip-proc1:~ # cat /etc/issue
Welcome to SUSE Linux Enterprise Server 11 (x86_64) - Kernel \r (\l).
cmip-proc1:~ # uname -a
Linux cmip-proc1 2.6.27.54-0.2-xen #1 SMP 2010-10-19 18:40:07 +0200 x86_64 x86_64 x86_64 GNU/Linux
cmip-proc1:~ # df -h /disks/ceda1
Filesystem Size Used Avail Use% Mounted on
130.246.191.64:130.246.191.65 at tcp0:/ceda1
51T 130G 48T 1% /disks/ceda1
SLES 11 SP1 is a service pack update to SLES 11 (now on 2.6.32.29-0.3-xen).
Is it possible to find out what the problem is?
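One basic sanity check, sketched here under assumptions rather than taken from anything tried in this thread, would be to compare the kernel release stamped into the built module (its vermagic string) with the running kernel. The helper name vermagic_release is made up for illustration; modinfo -F vermagic and uname -r are the standard tools. A match only shows the module was stamped for this kernel release; it would not rule out a flavor or config mismatch in the build tree.

```shell
#!/bin/sh
# Sketch: compare a module's vermagic kernel release with the running kernel.
# vermagic_release is a hypothetical helper, not part of Lustre or modutils.

# Print the first space-separated field of a vermagic string,
# which is the kernel release the module was built for.
vermagic_release() {
    printf '%s\n' "${1%% *}"
}

# Only run the live check when modinfo can actually find the lnet module.
if command -v modinfo >/dev/null 2>&1 && modinfo lnet >/dev/null 2>&1; then
    built_for=$(vermagic_release "$(modinfo -F vermagic lnet)")
    running=$(uname -r)
    if [ "$built_for" = "$running" ]; then
        echo "lnet vermagic matches running kernel: $running"
    else
        echo "MISMATCH: lnet built for $built_for, running $running"
    fi
fi
```

If the two strings differ, the rpms were built against a different kernel than the one that is booted, and the build tree would be the first thing to look at.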
Regards,
Peter
-----Original Message-----
From: Andreas Dilger [mailto:adilger at whamcloud.com]
Sent: 11 May 2011 10:11
To: Chiu, Peter (STFC,RAL,RALSP)
Cc: <lustre-discuss at lists.lustre.org>; Chiu, Peter (STFC,RAL,RALSP)
Subject: Re: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working
The only other potential problem I see is that you are using a xen kernel and this is somehow causing problems.
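As a rough way to rule out a flavor mix-up in the build tree, the sketch below checks that a kernel object directory agrees with a kernel release string in both version and flavor. It assumes the SLES layout that appears in the configure line quoted further down in this thread (/usr/src/linux-<version>-obj/<arch>/<flavor>); the function name obj_dir_matches_release is hypothetical, and this is only a path-name check, not a comparison of the actual kernel configs.

```shell
#!/bin/sh
# Sketch: check that a kernel object directory matches a kernel release string.
# Assumed layout: /usr/src/linux-<version>-obj/<arch>/<flavor>
# obj_dir_matches_release is a hypothetical helper name.

obj_dir_matches_release() {
    release=$1            # e.g. the output of `uname -r`
    obj_dir=${2%/}        # strip one trailing slash, if any

    flavor=${release##*-}             # text after the last '-', e.g. "xen"
    version=${release%-"$flavor"}     # release without the flavor suffix

    # The last path component should be the flavor ...
    [ "${obj_dir##*/}" = "$flavor" ] || return 1
    # ... and the path should contain linux-<version>-obj as a component.
    case "$obj_dir" in
        */linux-"$version"-obj/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: compare the running kernel with the directory given to --with-linux-obj.
if obj_dir_matches_release "$(uname -r)" "/usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen"; then
    echo "object directory matches running kernel"
else
    echo "object directory does NOT match running kernel"
fi
```

A pass here says nothing about whether the source tree under /usr/src/linux is itself prepared for the xen flavor, so it narrows the search rather than settling it.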
Cheers, Andreas
On 2011-05-11, at 1:33 AM, <peter.chiu at stfc.ac.uk> wrote:
> Dear Andreas,
>
> Many thanks for your response.
>
> Below are further details on this.
>
> I shall be grateful for your advice on this.
>
> Regards,
>
> Peter
> ====================================================================================================
>
> The system is:
>
> cmip-proc8:/etc # uname -a
> Linux cmip-proc8.badc.rl.ac.uk 2.6.32.29-0.3-xen #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux
>
> /usr/src/linux is a symlink pointing to the source corresponding to linux-2.6.32.29-0.3-obj:
>
> cmip-proc8:/etc # ls -l /usr/src
> total 24
> drwxr-xr-x 3 root root 4096 2011-05-09 08:31 debug
> lrwxrwxrwx 1 root root 19 2011-03-20 15:54 linux -> linux-2.6.32.29-0.3
> drwxr-xr-x 25 root root 4096 2011-05-09 08:49 linux-2.6.32.29-0.3
> drwxr-xr-x 3 root root 4096 2011-03-20 15:54 linux-2.6.32.29-0.3-obj
> drwxr-xr-x 3 root root 4096 2011-03-20 15:54 linux-obj
> drwxr-xr-x 10 root root 4096 2011-05-09 08:31 lustre-1.8.5
> drwxr-xr-x 7 root root 4096 2011-03-20 14:58 packages
> cmip-proc8:/etc #
>
> cmip-proc8:~ # ls /usr/local/kits/lustre-1.8.5
>
> aclocal.m4 config.h.in install-sh Makefile
> autoMakefile config.log ldiskfs Makefile.in
> autoMakefile.am config.status libsysio missing
> autoMakefile.in config.sub lnet mkinstalldirs
> build configure lustre README
> ChangeLog configure.ac lustre-1.8.5.tar.gz Rules
> compile COPYING lustre-iokit snmp
> config.guess debian lustre.spec stamp-h1
> config.h depcomp lustre.spec.in tree_status
> cmip-proc8:~ #
>
> The build with ./configure and "make rpms" produced rpms that are installable:
>
> cmip-proc8:/etc # ls -ls /usr/src/packages/RPMS/x86_64/*1.8.5*
> 4024 -rw-r--r-- 1 root root 4112883 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 15532 -rw-r--r-- 1 root root 15881360 2011-05-09 08:54 /usr/src/packages/RPMS/x86_64/lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 1332 -rw-r--r-- 1 root root 1358924 2011-05-09 08:54 /usr/src/packages/RPMS/x86_64/lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 1416 -rw-r--r-- 1 root root 1441937 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 3524 -rw-r--r-- 1 root root 3602163 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 2600 -rw-r--r-- 1 root root 2656393 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
>
>
> cmip-proc8:/etc # rpm -e lustre-tests
> cmip-proc8:/etc # rpm -e lustre
> cmip-proc8:/etc # rpm -e lustre-modules
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing... ########################################### [100%]
> 1:lustre-modules ########################################### [100%]
> Congratulations on finishing your Lustre installation! To register
> your copy of Lustre and find out more about Lustre Support, Service,
> and Training offerings please visit
>
> http://www.sun.com/software/products/lustre/lustre_reg.jsp
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing... ########################################### [100%]
> 1:lustre ########################################### [100%]
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing... ########################################### [100%]
> 1:lustre-tests ########################################### [100%]
> cmip-proc8:/etc #
>
> ...
>
> cmip-proc8:/etc # rpm -qa | grep lustre
> lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815
>
> The problem reproduces:
>
> cmip-proc8:~ # cp /var/log/messages /tmp/m0
> cmip-proc8:~ # dmesg > /tmp/d0
> cmip-proc8:~ # lsmod | grep lustre
> cmip-proc8:~ # modprobe lustre
> Killed
> cmip-proc8:~ # dmesg > /tmp/d1
> cmip-proc8:~ # cp /var/log/messages /tmp/m1
> cmip-proc8:~ # diff /tmp/d0 /tmp/d1
> 193a194,235
>> [ 84.786822] SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:30:1e:5d:54:80:08:00 SRC=130.246.188.226 DST=224.0.0.1 LEN=28 TOS=0x00 PREC=0x00 TTL=1 ID=34816 PROTO=2
>> [ 104.171306] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [ 104.171317] IP: [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [ 104.171328] PGD 7d9d0067 PUD 7d94c067 PMD 0
>> [ 104.171333] Oops: 0000 [#1] SMP
>> [ 104.171336] last sysfs file: /sys/module/ip_tables/initstate
>> [ 104.171339] CPU 0
>> [ 104.171341] Modules linked in: lnet(N+) lvfs(N) libcfs(N) iptable_nat nf_nat xt_tcpudp xt_pkttype ipt_LOG xt_limit autofs4 binfmt_misc microcode xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6_tables x_tables fuse loop dm_mod joydev rtc_core rtc_lib xennet ext3 mbcache jbd processor thermal_sys hwmon xenblk cdrom
>> [ 104.171373] Supported: Yes
>> [ 104.171376] Pid: 3441, comm: modprobe Tainted: G N 2.6.32.29-0.3-xen #1
>> [ 104.171379] RIP: e030:[<ffffffff8002c3d2>] [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [ 104.171384] RSP: e02b:ffff88007edade38 EFLAGS: 00010082
>> [ 104.171387] RAX: 0000000000000001 RBX: 0000000000009700 RCX: dead000000100100
>> [ 104.171390] RDX: 0000000000000000 RSI: ffff88007edade88 RDI: 0000000000000000
>> [ 104.171393] RBP: ffff88007edade58 R08: ffffffffa0252fb6 R09: 0000000000000000
>> [ 104.171396] R10: 0000000000000001 R11: ffffffff805f4200 R12: 0000000000009700
>> [ 104.171399] R13: 0000000000000000 R14: ffff88007edade88 R15: 000000000000000f
>> [ 104.171406] FS: 00007f541715a700(0000) GS:ffff8800013c1000(0000) knlGS:0000000000000000
>> [ 104.171409] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 104.171412] CR2: 0000000000000008 CR3: 000000007d905000 CR4: 0000000000002660
>> [ 104.171415] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 104.171418] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [ 104.171421] Process modprobe (pid: 3441, threadinfo ffff88007edac000, task ffff88007df8a400)
>> [ 104.171424] Stack:
>> [ 104.171426] ffffffffa02579f8 0000000000000000 0000000000623da0 0000000000623d30
>> [ 104.171430] <0> ffff88007edadeb8 ffffffff80038588 000000007fc11fa0 00000000a02579f8
>> [ 104.171435] <0> 00000000a0243060 0000000000000000 0000000000000001 ffffffffa02579f8
>> [ 104.171441] Call Trace:
>> [ 104.171449] [<ffffffff80038588>] try_to_wake_up+0x48/0x420
>> [ 104.171455] [<ffffffff8005b2e8>] up+0x48/0x50
>> [ 104.171464] [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
>> [ 104.171478] [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
>> [ 104.171489] [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
>> [ 104.171495] [<ffffffff8006d154>] sys_init_module+0xe4/0x270
>> [ 104.171500] [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
>> [ 104.171506] [<00007f5416cf3f7a>] 0x7f5416cf3f7a
>> [ 104.171508] Code: 1c 24 49 89 f6 4c 89 64 24 08 49 c7 c4 00 97 00 00 65 8a 04 25 c1 67 00 00 65 c6 04 25 c1 67 00 00 01 0f b6 c0 4c 89 e3 49 89 06 <49> 8b 45 08 8b 40 18 48 03 1c c5 80 ae 62 80 48 89 df e8 f7 87
>> [ 104.171544] RIP [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [ 104.171548] RSP <ffff88007edade38>
>> [ 104.171550] CR2: 0000000000000008
>> [ 104.171553] ---[ end trace 34c6e019e0aea7d2 ]---
>> [ 106.380129] SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:17:f2:0e:c4:a1:08:00 SRC=130.246.188.58 DST=224.0.0.1 LEN=44 TOS=0x00 PREC=0x00 TTL=1 ID=27534 PROTO=UDP SPT=54228 DPT=8612 LEN=24
> cmip-proc8:~ #
>
>
> -----Original Message-----
> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> Sent: 10 May 2011 21:48
> To: Chiu, Peter (STFC,RAL,RALSP)
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working
>
> On May 9, 2011, at 11:38, <peter.chiu at stfc.ac.uk> <peter.chiu at stfc.ac.uk> wrote:
>> The rpms lustre-modules, lustre and lustre-tests were then installed smoothly without any complaints.
>>
>> But the subsequent "modprobe lustre" will return a "Killed" message, with no lustre module loaded.
>>
>> dmesg also reveals "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
>>
>> A second modprobe lustre command will then hang, again with no module loaded.
>> Subsequently the client is not able to mount the lustre storage.
>>
>> Can anyone shed some light as to what has gone wrong here please?
>>
>> ./configure --with-linux=/usr/src/linux --with-linux-obj=/usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen
>
> Are you sure that "/usr/src/linux" points to the same source as "/usr/src/linux-2.6.32.29-0.3-obj"? Is that a symlink? Normally the source and -obj files have a very similar pathname (i.e. just with "-obj" suffix difference).
>
>>> [ 168.647996] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>>> [ 168.648066] Pid: 3445, comm: modprobe Tainted: G N 2.6.32.29-0.3-xen #1
>>> [ 168.648110] Process modprobe (pid: 3445, threadinfo ffff88007efa4000, task ffff88007e9100c0)
>>> [ 168.648129] Call Trace:
>>> [ 168.648138] [<ffffffff80038588>] try_to_wake_up+0x48/0x420
>>> [ 168.648143] [<ffffffff8005b2e8>] up+0x48/0x50
>>> [ 168.648153] [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
>>> [ 168.648167] [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
>>> [ 168.648178] [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
>>> [ 168.648184] [<ffffffff8006d154>] sys_init_module+0xe4/0x270
>>> [ 168.648189] [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
>>> [ 168.648194] [<00007f3f40bc9f7a>] 0x7f3f40bc9f7a
>>
>> I have tried Lustre-1.8.4, but got the same result.
>> I have also tried to follow the 1.8 Operations Manual to locate the diagnostic tools, but the link wiki.lustre.org is no longer valid.
>
> This looks like a pretty serious error to oops during module insertion, and I'd suspect the build environment before any particular Lustre code.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.
>
>
>
> --
> Scanned by iCritical.