[Lustre-discuss] Kernel panic on mounting an OST

Ludovic Francois lfrancois at gmail.com
Thu Dec 13 02:55:29 PST 2007


On Dec 12, 17:51, Oleg Drokin <Oleg.Dro... at Sun.COM> wrote:
> Hello!
>
> On Dec 12, 2007, at 11:39 AM, Franck Martinaux wrote:
>
> > After a power outage, I am having difficulty mounting an OST.
> > I am running Lustre 1.6.3 and I get a panic on the OSS when I try to
> > mount an OST.
>
> It would greatly help us if you showed us the panic message and, if
> possible, the stack trace.


Hi,

Please find below all the information we gathered this morning.

Environment
===========

,----
| [root@oss01 ~]# uname -a
| Linux oss01.data.cluster 2.6.9-55.0.9.EL_lustre.1.6.3smp #1 SMP Sun Oct 7 20:08:31 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
| [root@oss01 ~]#
`----

Mount of this specific OST
==========================

,----
| [root@oss01 ~]# mount -t lustre /dev/mpath/mpath0 /mnt/lustre/ost1
| Read from remote host oss01: Connection reset by peer
| Connection to oss01 closed.
| [ddn@admin01 ~]$
`----

/var/log/messages during the operation
======================================

--8<---------------cut here---------------start------------->8---
Dec 13 08:36:04 oss01 sshd(pam_unix)[13469]: session opened for user root by root(uid=0)
Dec 13 08:36:20 oss01 kernel: kjournald starting.  Commit interval 5 seconds
Dec 13 08:36:20 oss01 kernel: LDISKFS FS on dm-1, internal journal
Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: recovery complete.
Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Dec 13 08:36:20 oss01 kernel: kjournald starting.  Commit interval 5 seconds
Dec 13 08:36:20 oss01 kernel: LDISKFS FS on dm-1, internal journal
Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: file extents enabled
Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: mballoc enabled
Dec 13 08:36:20 oss01 kernel: Lustre: OST lustre-OST0002 now serving dev (lustre-OST0002/0258906d-8eca-ba98-4e3d-19adfa472914) with recovery enabled
Dec 13 08:36:20 oss01 kernel: Lustre: Server lustre-OST0002 on device /dev/mpath/mpath1 has started
Dec 13 08:36:21 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:36:21 oss01 kernel: LustreError: Skipped 4 previous similar messages
Dec 13 08:36:21 oss01 kernel: LustreError: 13664:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) req@00000102244efe00 x146203/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
Dec 13 08:36:21 oss01 kernel: LustreError: 13664:0:(ldlm_lib.c:1437:target_send_reply_msg()) Skipped 4 previous similar messages
Dec 13 08:36:41 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:36:41 oss01 kernel: LustreError: 13665:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) req@000001021d1d6800 x146233/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
Dec 13 08:37:01 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:37:01 oss01 kernel: LustreError: 13666:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) req@0000010006b95c00 x146264/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
Dec 13 08:37:21 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:37:21 oss01 kernel: LustreError: 13667:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) req@00000100cfe9ba00 x146300/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
Dec 13 08:37:41 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:37:41 oss01 kernel: LustreError: Skipped 5 previous similar messages
Dec 13 08:37:41 oss01 kernel: LustreError: 13668:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) req@0000010037e88e00 x146373/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
Dec 13 08:37:41 oss01 kernel: LustreError: 13668:0:(ldlm_lib.c:1437:target_send_reply_msg()) Skipped 5 previous similar messages
Dec 13 08:37:47 oss01 kernel: Lustre: Failing over lustre-OST0002
Dec 13 08:37:47 oss01 kernel: Lustre: *** setting obd lustre-OST0002 device 'unknown-block(253,1)' read-only ***
Dec 13 08:37:47 oss01 kernel: Turning device dm-1 (0xfd00001) read-only
Dec 13 08:37:47 oss01 kernel: Lustre: lustre-OST0002: shutting down for failover; client state will be preserved.
Dec 13 08:37:47 oss01 kernel: Lustre: OST lustre-OST0002 has stopped.
Dec 13 08:37:47 oss01 kernel: LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
Dec 13 08:37:47 oss01 kernel: LDISKFS-fs: mballoc: 1 extents scanned, 1 goal hits, 0 2^N hits, 0 breaks, 0 lost
Dec 13 08:37:47 oss01 kernel: LDISKFS-fs: mballoc: 1 generated and it took 12560
Dec 13 08:37:47 oss01 kernel: LDISKFS-fs: mballoc: 256 preallocated, 0 discarded
Dec 13 08:37:47 oss01 kernel: Removing read-only on dm-1 (0xfd00001)
Dec 13 08:37:47 oss01 kernel: Lustre: server umount lustre-OST0002 complete
Dec 13 08:37:57 oss01 sshd(pam_unix)[13946]: session opened for user root by root(uid=0)
Dec 13 08:38:18 oss01 kernel: kjournald starting.  Commit interval 5 seconds
Dec 13 08:38:18 oss01 kernel: LDISKFS FS on dm-0, internal journal
Dec 13 08:38:18 oss01 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Dec 13 08:38:18 oss01 kernel: kjournald starting.  Commit interval 5 seconds
Dec 13 08:38:18 oss01 kernel: LDISKFS FS on dm-0, internal journal
Dec 13 08:38:18 oss01 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Dec 13 08:38:18 oss01 kernel: LDISKFS-fs: file extents enabled
Dec 13 08:38:18 oss01 kernel: LDISKFS-fs: mballoc enabled
Dec 13 08:43:52 oss01 syslogd 1.4.1: restart.
Dec 13 08:43:52 oss01 syslog: syslogd startup succeeded
Dec 13 08:43:52 oss01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
--8<---------------cut here---------------end--------------->8---
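
To make the pattern in this log easier to see, here is a minimal Python
sketch (not part of any Lustre tooling) that picks out the 137-5 connect
errors and measures the gap between them. The sample lines are copied from
the log above with the line wraps removed. The function name
connect_error_gaps is only an illustrative choice, and the year 2007 is
assumed from the date of this mail because syslog does not record it.

```python
import re
from datetime import datetime

# A few entries from /var/log/messages, one log record per line.
LOG = """\
Dec 13 08:36:21 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:36:21 oss01 kernel: LustreError: Skipped 4 previous similar messages
Dec 13 08:36:41 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:37:01 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:37:21 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
Dec 13 08:37:41 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available  for connect (no target)
"""

# Timestamp, then any host name, then the 137-5 error and the quoted UUID.
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: LustreError: 137-5: UUID '([^']+)'"
)


def connect_error_gaps(text, year=2007):
    """Return (sorted UUIDs, seconds between consecutive 137-5 errors)."""
    stamps, uuids = [], set()
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            # syslog omits the year, so the assumed one is prepended.
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            stamps.append(stamp)
            uuids.add(m.group(2))
    gaps = [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
    return sorted(uuids), gaps


uuids, gaps = connect_error_gaps(LOG)
print(uuids, gaps)
```

On these lines the five errors all name lustre-OST0030_UUID and arrive 20
seconds apart, which looks like something retrying the connection to the
OST that is not mounted.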

We had to power-cycle the node to reconnect
===========================================

,----
| # ipmitool -I lan -H 192.168.99.101 -U $login -P $password power cycle
`----


The OST fsck comes back clean
=============================

,----
| [root@oss01 log]# fsck.ext2 /dev/mpath/mpath0
| e2fsck 1.40.2.cfs1 (12-Jul-2007)
| lustre-OST0030: recovering journal
| lustre-OST0030: clean, 227/244195328 files, 15614685/976760320 blocks
| [root@oss01 log]#
`----

tunefs.lustre reads the mpath0 information correctly
====================================================

,----
| [root@oss01 log]# tunefs.lustre /dev/mpath/mpath0
| checking for existing Lustre data: found CONFIGS/mountdata
| Reading CONFIGS/mountdata
|
|    Read previous values:
| Target:     lustre-OST0030
| Index:      48
| Lustre FS:  lustre
| Mount type: ldiskfs
| Flags:      0x142
|               (OST update writeconf )
| Persistent mount opts: errors=remount-ro,extents,mballoc
| Parameters: mgsnode=10.143.0.5@tcp mgsnode=10.143.0.6@tcp failover.node=10.143.0.2@tcp sys.timeout=80 mgsnode=10.143.0.5@tcp mgsnode=10.143.0.6@tcp failover.node=10.143.0.2@tcp sys.timeout=80
|
|
|    Permanent disk data:
| Target:     lustre-OST0030
| Index:      48
| Lustre FS:  lustre
| Mount type: ldiskfs
| Flags:      0x142
|               (OST update writeconf )
| Persistent mount opts: errors=remount-ro,extents,mballoc
| Parameters: mgsnode=10.143.0.5@tcp mgsnode=10.143.0.6@tcp failover.node=10.143.0.2@tcp sys.timeout=80 mgsnode=10.143.0.5@tcp mgsnode=10.143.0.6@tcp failover.node=10.143.0.2@tcp sys.timeout=80
|
| Writing CONFIGS/mountdata
| [root@oss01 log]#
`----
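
One detail in this output is easy to miss: every entry on the Parameters
line appears twice. Whether that has anything to do with the panic is not
established here. The following is only a small Python sketch for spotting
such repeats, with the line pasted in by hand; repeated_parameters is a
made-up helper name, not a Lustre tool.

```python
from collections import Counter

# The "Parameters:" value printed by tunefs.lustre, as one string.
# NIDs are written as address@network.
PARAMETERS = (
    "mgsnode=10.143.0.5@tcp mgsnode=10.143.0.6@tcp "
    "failover.node=10.143.0.2@tcp sys.timeout=80 "
    "mgsnode=10.143.0.5@tcp mgsnode=10.143.0.6@tcp "
    "failover.node=10.143.0.2@tcp sys.timeout=80"
)


def repeated_parameters(line):
    """Return {token: count} for every key=value token seen more than once."""
    counts = Counter(line.split())
    return {token: n for token, n in counts.items() if n > 1}


for token, n in repeated_parameters(PARAMETERS).items():
    print(f"{token} appears {n} times")
```

Two different mgsnode values are expected on a setup with MGS failover, so
the sketch compares whole key=value tokens rather than keys alone.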


DDN LUN is ready and working correctly
======================================

,----[ OSS view ]
| [root@oss01 log]# multipath -l | grep mpath0
| mpath0 (360001ff00fd4922302000800001d1c17)
| [root@oss01 log]#
`----

,----[ S2A9550 view ]
| [ddn@admin01 ~]$ s2a -h 10.141.0.92 -e "lun list" | grep -i 0fd492230200
|   2                     1    Ready          3815470 0FD492230200
| [ddn@admin01 ~]$
`----

Stack trace (captured from OSS02 over the serial line during a mount attempt)
=============================================================================

--8<---------------cut here---------------start------------->8---
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 134-6: Trying to start OBD lustre-OST0030_UUID using the
wrong disk lustre-OST0000_UUID. Were the /dev/ assignments rearranged
?
LustreError: 10203:0:(filter.c:1022:filter_prep()) cannot read
last_rcvd: rc = -22
LustreEr<ro4>re:i p10:2 f0f3f:0ff:f(fobfad_0c3aon2f12ig1.
c:325:class_setup()) set--up- -l-u--s-t-re--- OS[Tcu00t 30h efreai ]le
d- -(----22--)-
 [please bite here ] -L-u-s--tr--eE--
r                                                -
or: 10203:0:(obd_config.Kcer:n1e06l2 B:cUlG asast
_csponifnilogc_lkl:o1g19_h
ndler()) Err -22 on cfign cvaolmimdan
odp:                                  a
and: 0000 [1] SLMuPs tre:    cmd=cf003 0:lu<s4tr>
OST0030  1:dev  2:type CP 3U: 3f
ustreError: 15b-f: MGC1M0o.d1ul4e3s. 0l.i5 at nktcepd : inT:he
configuration from l ogo bd'lfuisltterer-OST0030' failed (-22).
( U)Make sure th
is client a ndf stfihlet _MlGSdi askrfes running compatible
ver(siUo)ns of
Lustre.
 oLussttreError: 15c-8: MGC10.<144>(3.U0).5 at tcp: The configurati omng
cfrom log 'lustre-OST003(0U') failed (-22). This may l dbies tkhfes r
esult of communicatio(nU) errors between this nod el usantdr ethe MGS,
a bad configur(aU)tion, or other errors.  Seloev the syslog for more
 
inf(oU)rmation.
 LlqusutotraeError: 10203:0:(obd_mo<u4nt>.(cU):1082:server_start_targe
tmds(c)) failed to start serv(eUr) lustre-OST0030: -22
 ksocklnd(LU)ustreError: 10203:0:(obd<4_>m oupnttl.rpcc:
1573:server_fill_super((U))) Unable to start target so:bd -c2l2a
(ULu)streError: 10203:0:(obd<_4c>o nlfneitg.c:392:class_cleanup())
( U)Device 2 not setup                                ss
 lvfs(U) libcfs(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U)
autofs4(U) i2c_dev(U) i2c_core(U) sunrpc(U) ds(U) yenta_socket(U)
pcmcia_c
ore(U) dm_mirror(U) dm_round_robin(U) dm_multipath(U) joydev(U)
button(U) battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) hw_random(U)
myri10ge(U)
 bnx2(U) ext3(U) jbd(U) dm_mod(U) qla2400(U) ata_piix(U)
megaraid_sas(U) qla2xxx(U) scsi_transport_fc(U) sd_mod(U)
multipath(U)
Pid: 10286, comm: ptlrpcd Tainted: GF     2.6.9-55.0.9.EL_lustre.1.6.3smp
RIP: 0010:[<ffffffff80321465>] <ffffffff80321465>{__lock_text_start+32}
RSP: 0018:0000010218cd9bc8  EFLAGS: 00010216
RAX: 0000000000000016 RBX: 000001021654e4bc RCX: 0000000000020000
RDX: 000000000000baa7 RSI: 0000000000000246 RDI: ffffffff80396fc0
RBP: 000001021654e4a0 R08: 00000000fffffffe R09: 000001021654e4bc
R10: 0000000000000000 R11: 0000000000000000 R12: 00000102196e6058
R13: 00000102196e6000 R14: 0000010218cd9eb8 R15: 0000010218cd9e58
FS:  0000002a9557ab00(0000) GS:ffffffff804a6880(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95557000 CR3: 0000000228514000 CR4: 00000000000006e0
Process ptlrpcd (pid: 10286, threadinfo 0000010218cd8000, task 00000102170b4030)
Stack: 000001021654e4bc ffffffffa03a2121 000001021a99304e ffffffffa03b32a0
       000001021654e0b0 ffffffffa04d6510 0000008000000000 0000000000000000
       0000000000000000 00000102203920c0
Call Trace:<ffffffffa03a2121>{:lquota:filter_quota_clearinfo+49}
       <ffffffffa04d6510>{:obdfilter:filter_destroy_export+560}
       <ffffffff80131923>{recalc_task_prio+337}
<ffffffffa02586fd>{:obdclass:class_export_destroy+381}
       <ffffffffa025c336>{:obdclass:obd_zombie_impexp_cull+150}
       <ffffffffa0318345>{:ptlrpc:ptlrpcd_check+229}
<ffffffffa031883a>{:ptlrpc:ptlrpcd+874}
       <ffffffff80133566>{default_wake_function+0}
<ffffffffa02eb450>{:ptlrpc:ptlrpc_expired_set+0}
       <ffffffffa02eb450>{:ptlrpc:ptlrpc_expired_set+0}
<ffffffff80133566>{default_wake_function+0}
       <ffffffff80110de3>{child_rip+8}
<ffffffffa03184d0>{:ptlrpc:ptlrpcd+0}
       <ffffffff80110ddb>{child_rip+0}

Code: 0f 0b 04 c2 33 80 ff ff ff ff 77 00 f0 ff 0b 0f 88 8b 03 00
RIP <ffffffff80321465>{__lock_text_start+32} RSP <0000010218cd9bc8>
 <0>Kernel pani<4c> -L DnIoStKF sSy-nfcs:i nmg:b aOlolopsc
1 blocks 1 reqs (0 succ ess)                              :
--8<---------------cut here---------------end--------------->8---
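
The last fully readable message before the console output becomes
interleaved is error 134-6, which names two different targets. Read
together with the "cannot read last_rcvd" line that follows it, this
suggests (as an interpretation, not a confirmed diagnosis) that the target
being started is lustre-OST0030 while the disk identifies itself as
lustre-OST0000. A minimal Python sketch for pulling the two names out of
such a message, with wrong_disk_pair as a hypothetical helper name:

```python
import re

# The 134-6 message from the serial console, joined onto one line.
MESSAGE = (
    "LustreError: 134-6: Trying to start OBD lustre-OST0030_UUID using the "
    "wrong disk lustre-OST0000_UUID. Were the /dev/ assignments rearranged ?"
)

# First group: the target being started. Second group: the name on disk.
WRONG_DISK = re.compile(
    r"134-6: Trying to start OBD (\S+) using the wrong disk (\S+?)\."
)


def wrong_disk_pair(message):
    """Return (target_started, name_on_disk), or None if no 134-6 error."""
    m = WRONG_DISK.search(message)
    return (m.group(1), m.group(2)) if m else None


pair = wrong_disk_pair(MESSAGE)
if pair is not None:
    started, on_disk = pair
    print(f"started as {started}, disk reports {on_disk}")
```

For comparison, both fsck and tunefs.lustre above report lustre-OST0030 for
/dev/mpath/mpath0, so the OST0000 name is coming from somewhere other than
the label and CONFIGS/mountdata.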

If you need more information or debug output, feel free to ask. The
problem occurs only with this OST.

Thanks, Ludo

--
Ludovic Francois                 +33 (0)6 14 77 26 93
System Engineer                  DataDirect Networks



