[Lustre-discuss] SLES 11 SP1 Client rpms built but not working

peter.chiu at stfc.ac.uk
Mon May 9 10:38:49 PDT 2011


Hi all,

I used the method described below to build client rpms from the source kit lustre-1.8.5.tar.gz.

Only one error was reported during "make rpms", relating to lustre-iokit-1.2-root,
but the rpms were still built under /usr/src/packages/RPMS/x86_64.

The rpms lustre-modules, lustre and lustre-tests were then installed smoothly without any complaints.

However, a subsequent "modprobe lustre" returns a "Killed" message, and no lustre module is loaded.

dmesg also shows "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008".

A second "modprobe lustre" then hangs, again with no module loaded.
As a result, the client cannot mount the Lustre storage.

Can anyone shed some light as to what has gone wrong here please?

Many thanks.

Regards,

Peter Chiu
STFC Rutherford Appleton Laboratory
Space Science & Technology Department
Building R25, Room 2.02
Chilton
Didcot
OXON
OX11 0QX
UK

Phone:  01235-446699
Fax:      01235-445848
Email:   peter.chiu at stfc.ac.uk

Details:
===========================================================

Client host cmip-proc8:  cat /etc/issue:

Welcome to SUSE Linux Enterprise Server 11 SP1  (x86_64) - Kernel \r (\l).

cmip-proc8:~ # uname -a
Linux cmip-proc8.badc.rl.ac.uk 2.6.32.29-0.3-xen #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux

Install kit from:
cd /usr/local/kits/lustre-1.8.5

ls -ls /usr/src/
4 drwxr-xr-x  3 root root 4096 2011-05-09 08:31 debug
0 lrwxrwxrwx  1 root root   19 2011-03-20 15:54 linux -> linux-2.6.32.29-0.3
4 drwxr-xr-x 25 root root 4096 2011-05-09 08:49 linux-2.6.32.29-0.3
4 drwxr-xr-x  3 root root 4096 2011-03-20 15:54 linux-2.6.32.29-0.3-obj
4 drwxr-xr-x  3 root root 4096 2011-03-20 15:54 linux-obj
4 drwxr-xr-x 10 root root 4096 2011-05-09 08:31 lustre-1.8.5
4 drwxr-xr-x  7 root root 4096 2011-03-20 14:58 packages

Build commands:

./configure --with-linux=/usr/src/linux --with-linux-obj=/usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen
make rpms
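
The --with-linux-obj path above follows the SLES layout visible in the /usr/src listing, /usr/src/linux-<version>-obj/<arch>/<flavor>. A minimal sketch, assuming that layout holds, that derives the path from the running kernel's release string so the build targets the same flavor (here xen) as the booted kernel. The helper name sles_obj_dir is hypothetical and not part of Lustre or SLES.

```shell
#!/bin/sh
# Hypothetical helper: map a SLES kernel release such as "2.6.32.29-0.3-xen"
# to its object directory, ASSUMING the layout
#   /usr/src/linux-<version>-obj/<arch>/<flavor>
# seen in the /usr/src listing above.
sles_obj_dir() {
    release=$1                  # e.g. output of `uname -r`
    arch=$2                     # e.g. output of `uname -m`
    version=${release%-*}       # drop the trailing "-<flavor>": 2.6.32.29-0.3
    flavor=${release##*-}       # keep only the flavor: xen
    printf '/usr/src/linux-%s-obj/%s/%s\n' "$version" "$arch" "$flavor"
}

# Usage (not run here):
#   ./configure --with-linux=/usr/src/linux \
#       --with-linux-obj="$(sles_obj_dir "$(uname -r)" "$(uname -m)")"
```

For this host the sketch yields the same path that was passed to configure by hand, so it mainly serves as a cross-check that the object tree matches the booted flavor.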

One error recorded:
+ ./configure --prefix=/usr
configure: error: cannot find install-sh or install.sh in . ./.. ./../..
error: Bad exit status from /var/tmp/rpm-tmp.51316 (%build)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.51316 (%build)
make[1]: *** [rpms] Error 1
make[1]: Leaving directory `/usr/local/kits/lustre-1.8.5/lustre-iokit'

By trial and error, I found that this error can be avoided if I rsync /usr/local/kits/lustre-1.8.5/lustre-iokit to /usr/src/packages/BUILD/lustre-iokit-1.2.

In any case, the rpms are built under:

cmip-proc8:/usr/local/kits/lustre-1.8.5 # ls /usr/src/packages/RPMS//x86_64/*1.8.5*
/usr/src/packages/RPMS//x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm

No errors were reported when installing these rpms:

cmip-proc8:/usr/local/kits/lustre-1.8.5 # rpm -qa | grep lustre
lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815
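
Judging by the names above, the package release encodes the kernel release it was built for, with "-" replaced by "_" (2.6.32.29_0.3_xen for 2.6.32.29-0.3-xen). A minimal sketch, assuming that naming convention, that checks an installed package's release string against the running kernel. The helper kernel_tag_matches is hypothetical, and the check is naive: it would report a false mismatch if the kernel release itself contained an underscore.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the package release string PKG_RELEASE
# (e.g. "2.6.32.29_0.3_xen_201105090815") was built for KERNEL_RELEASE
# (e.g. "2.6.32.29-0.3-xen"), ASSUMING the rpm names swap "-" for "_".
kernel_tag_matches() {
    kernel_release=$1
    pkg_release=$2
    tag=$(printf '%s\n' "$kernel_release" | sed 's/-/_/g')
    case "$pkg_release" in
        "$tag"_*|"$tag") return 0 ;;   # tag, optionally followed by a build stamp
        *) return 1 ;;
    esac
}

# Usage (not run here):
#   kernel_tag_matches "$(uname -r)" "$(rpm -q --qf '%{RELEASE}' lustre-modules)" \
#       && echo "lustre-modules was built for the running kernel"
```

For the packages listed here the check would pass against the running 2.6.32.29-0.3-xen kernel; it mainly guards against installing modules built for a different flavor such as default.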


Checking for, and then loading, the lustre module (lsmod finds none):

cmip-proc8:~ # lsmod | grep lustre
cmip-proc8:~ # modprobe lustre
Killed
cmip-proc8:~ # lsmod | grep lustre
cmip-proc8:~ # modprobe lustre &
[1] 3454
cmip-proc8:~ #
cmip-proc8:~ # ps auxw | grep lustre
root      3454  0.0  0.0   3940   624 pts/1    S    18:04   0:00 modprobe lustre

dmesg records this error after the first "modprobe lustre" command:

cmip-proc8:/usr/local/kits/lustre-1.8.5 # diff /tmp/d1 /tmp/d2
195a196,250
> [  168.647996] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [  168.648006] IP: [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [  168.648018] PGD 7fac4067 PUD 7ef4c067 PMD 0
> [  168.648023] Oops: 0000 [#1] SMP
> [  168.648026] last sysfs file: /sys/module/ip_tables/initstate
> [  168.648028] CPU 0
> [  168.648030] Modules linked in: lnet(N+) lvfs(N) libcfs(N) iptable_nat nf_nat xt_tcpudp xt_pkttype ipt_LOG xt_limit autofs4 binfmt_misc microcode xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6_tables x_tables fuse loop dm_mod joydev rtc_core rtc_lib xennet ext3 mbcache jbd processor thermal_sys hwmon xenblk cdrom
> [  168.648063] Supported: Yes
> [  168.648066] Pid: 3445, comm: modprobe Tainted: G          N  2.6.32.29-0.3-xen #1
> [  168.648069] RIP: e030:[<ffffffff8002c3d2>]  [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [  168.648074] RSP: e02b:ffff88007efa5e38  EFLAGS: 00010082
> [  168.648077] RAX: 0000000000000001 RBX: 0000000000009700 RCX: dead000000100100
> [  168.648080] RDX: 0000000000000000 RSI: ffff88007efa5e88 RDI: 0000000000000000
> [  168.648083] RBP: ffff88007efa5e58 R08: ffffffffa0252fb6 R09: 0000000000000000
> [  168.648086] R10: 0000000000000001 R11: 0000000000000061 R12: 0000000000009700
> [  168.648089] R13: 0000000000000000 R14: ffff88007efa5e88 R15: 000000000000000f
> [  168.648095] FS:  00007f3f41030700(0000) GS:ffff8800013c1000(0000) knlGS:0000000000000000
> [  168.648098] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  168.648101] CR2: 0000000000000008 CR3: 000000007ef7d000 CR4: 0000000000002660
> [  168.648104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  168.648107] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  168.648110] Process modprobe (pid: 3445, threadinfo ffff88007efa4000, task ffff88007e9100c0)
> [  168.648113] Stack:
> [  168.648115]  ffffffffa02579f8 0000000000000000 0000000000623da0 0000000000623d30
> [  168.648118] <0> ffff88007efa5eb8 ffffffff80038588 000000007ef8ef00 00000000a02579f8
> [  168.648123] <0> 00000000a0243060 0000000000000000 0000000000000001 ffffffffa02579f8
> [  168.648129] Call Trace:
> [  168.648138]  [<ffffffff80038588>] try_to_wake_up+0x48/0x420
> [  168.648143]  [<ffffffff8005b2e8>] up+0x48/0x50
> [  168.648153]  [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
> [  168.648167]  [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
> [  168.648178]  [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
> [  168.648184]  [<ffffffff8006d154>] sys_init_module+0xe4/0x270
> [  168.648189]  [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
> [  168.648194]  [<00007f3f40bc9f7a>] 0x7f3f40bc9f7a
> [  168.648196] Code: 1c 24 49 89 f6 4c 89 64 24 08 49 c7 c4 00 97 00 00 65 8a 04 25 c1 67 00 00 65 c6 04 25 c1 67 00 00 01 0f b6 c0 4c 89 e3 49 89 06 <49> 8b 45 08 8b 40 18 48 03 1c c5 80 ae 62 80 48 89 df e8 f7 87
> [  168.648230] RIP  [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [  168.648234]  RSP <ffff88007efa5e38>
> [  168.648236] CR2: 0000000000000008
> [  168.648239] ---[ end trace 57429513f7001015 ]---
cmip-proc8:/usr/local/kits/lustre-1.8.5 #
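
The diff of /tmp/d1 and /tmp/d2 above presumably compares dmesg snapshots taken before and after the first "modprobe lustre". Because the kernel log only grows until its ring buffer wraps, the new messages can also be isolated by skipping the lines already present in the earlier snapshot. A minimal sketch; new_log_lines is a hypothetical helper, and its output is wrong if the ring buffer wrapped between the two snapshots.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of NEW that come after the first
# N lines, where N is the line count of OLD (i.e. messages appended
# after OLD was captured). Assumes the log did not wrap in between.
new_log_lines() {
    old=$1
    new=$2
    start=$(( $(wc -l < "$old") + 1 ))
    tail -n "+$start" "$new"
}

# Usage (not run here; needs root and attempts to load the module):
#   dmesg > /tmp/d1
#   modprobe lustre
#   dmesg > /tmp/d2
#   new_log_lines /tmp/d1 /tmp/d2
```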

I have also tried Lustre 1.8.4 and got the same result.
I have also tried to follow the 1.8 Operations Manual to locate the diagnostic tools, but its link to wiki.lustre.org no longer works.
