[Lustre-discuss] kernel freeze
Papp Tamás
tompos at martos.bme.hu
Thu Mar 20 05:48:23 PDT 2008
Dear All,
What could cause this error?
Kernel: 2.6.9-42.0.10.EL_lustre-1.6.0.1custom-drbd and
2.6.9-55.0.9.EL_lustre.1.6.4.1smp (CentOS 4.4)
After the node froze, its failover partner took over the resource, but
then it froze too.
Looking back through the logs, I see that these "header is corrupted"
messages appeared several more times over the last few days.
After I turned it on again, it froze within 10 minutes.
Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!
Mar 20 10:57:19 node2 kernel: LDISKFS-fs: invalid magic = 0x281e
Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!
Mar 20 10:58:43 node2 kernel: Lustre: hallmark-OST0002: haven't heard
from client 078bd69d-b701-7dc9-3360-da43cd285d06 (at 192.168.0.150@tcp)
in 227 seconds. I think it's dead, and I am evicting it.
Mar 20 11:03:25 node2 kernel: ------------[ cut here ]------------
Mar 20 11:03:25 node2 kernel: kernel BUG at
/usr/src/redhat/BUILD/lustre-1.6.0.1/lustre/ldiskfs/extents.c:1751!
Mar 20 11:03:25 node2 kernel: invalid operand: 0000 [#1]
Mar 20 11:03:25 node2 kernel: SMP
Mar 20 11:03:25 node2 kernel: Modules linked in: obdfilter(U)
fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) lustre(U) lov(U) lquota(U)
mdc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U)
mptctl(U) mptbase(U) drbd(U) nfsd(U) exportfs(U) md5(U) ipv6(U)
parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U) nfs(U)
lockd(U) nfs_acl(U) sunrpc(U) dm_mirror(U) dm_mod(U) button(U)
battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) hw_random(U) e1000(U)
sk98lin(U) floppy(U) ext3(U) jbd(U) aacraid(U) ata_piix(U) libata(U)
sd_mod(U) scsi_mod(U)
Mar 20 11:03:25 node2 kernel: CPU: 1
Mar 20 11:03:25 node2 kernel: EIP: 0060:[<fb8ff40a>] Tainted: GF VLI
Mar 20 11:03:25 node2 kernel: EFLAGS: 00010213
(2.6.9-42.0.10.EL_lustre-1.6.0.1custom-drbd)
Mar 20 11:03:25 node2 kernel: EIP is at
ldiskfs_ext_remove_space+0x13f/0x2cf [ldiskfs]
Mar 20 11:03:25 node2 kernel: eax: 00007067 ebx: 00000018 ecx:
e5658000 edx: 00001001
Mar 20 11:03:25 node2 kernel: esi: f6095e00 edi: 00000002 ebp:
f6095e00 esp: e6b4bb60
Mar 20 11:03:25 node2 kernel: ds: 007b es: 007b ss: 0068
Mar 20 11:03:25 node2 kernel: Process ll_ost_io_38 (pid: 25495,
threadinfo=e6b4b000 task=e6e77330)
Mar 20 11:03:25 node2 kernel: Stack: 00000000 00000001 f5664304 00000002
f7cede00 ffffffff 00000000 e6b4bb9c
Mar 20 11:03:25 node2 kernel: f7cede00 f5664304 e8f250fc e8f25028
fb8ffd3c 00000246 f7cede00 e8f250fc
Mar 20 11:03:25 node2 kernel: e8f25028 e8f250fc 0000003c d190459c
e8f25258 fb913b44 00000000 00080000
Mar 20 11:03:25 node2 kernel: Call Trace:
Mar 20 11:03:25 node2 kernel: [<fb8ffd3c>]
ldiskfs_ext_truncate+0x12d/0x176 [ldiskfs]
Mar 20 11:03:25 node2 kernel: [<fb8f1213>] ldiskfs_truncate+0x112/0x486
[ldiskfs]
Mar 20 11:03:25 node2 kernel: [<c02d4fd6>] __cond_resched+0x14/0x39
Mar 20 11:03:25 node2 kernel: [<fb8f1f4a>]
ldiskfs_do_update_inode+0x320/0x347 [ldiskfs]
Mar 20 11:03:25 node2 kernel: [<f8897d43>]
journal_get_write_access+0x25/0x2c [jbd]
Mar 20 11:03:25 node2 kernel: [<c014e3cc>] vmtruncate+0xcb/0xee
Mar 20 11:03:25 node2 kernel: [<c0173247>] inode_setattr+0x64/0x1b3
Mar 20 11:03:25 node2 kernel: [<fb8f2129>] ldiskfs_setattr+0x179/0x1c9
[ldiskfs]
Mar 20 11:03:25 node2 kernel: [<fb93ffb7>]
fsfilt_ldiskfs_setattr+0x129/0x212 [fsfilt_ldiskfs]
Mar 20 11:03:25 node2 kernel: [<fbbab7d2>]
filter_setattr_internal+0x65f/0x177a [obdfilter]
Mar 20 11:03:25 node2 kernel: [<fbba45c0>]
filter_fid2dentry+0x654/0x8df [obdfilter]
Mar 20 11:03:25 node2 kernel: [<fbb9e7ca>] filter_fmd_get+0x263/0x391
[obdfilter]
Mar 20 11:03:25 node2 kernel: [<fbb9e8ee>] filter_fmd_get+0x387/0x391
[obdfilter]
Mar 20 11:03:25 node2 kernel: [<fbbad2f1>] filter_setattr+0x260/0x48e
[obdfilter]
Mar 20 11:03:25 node2 kernel: [<fbbb339f>] filter_truncate+0x281/0x316
[obdfilter]
Mar 20 11:03:25 node2 kernel: [<fb928bd1>] obd_punch+0x3f8/0x48b [ost]
Mar 20 11:03:25 node2 kernel: [<fb92871f>] ost_punch+0x351/0x40b [ost]
Mar 20 11:03:25 node2 kernel: [<fb93340a>] ost_handle+0x1e38/0x344c [ost]
Mar 20 11:03:25 node2 kernel: [<fbef0389>]
ptlrpc_server_handle_request+0xb76/0x136f [ptlrpc]
Mar 20 11:03:25 node2 kernel: [<fbef1acc>] ptlrpc_main+0x7ee/0x9b5 [ptlrpc]
Mar 20 11:03:25 node2 kernel: [<c011e7f5>] default_wake_function+0x0/0xc
Mar 20 11:03:25 node2 kernel: [<fbef12d1>] ptlrpc_retry_rqbds+0x0/0xd
[ptlrpc]
Mar 20 11:03:25 node2 kernel: [<c02d693e>] ret_from_fork+0x6/0x14
Mar 20 11:03:25 node2 kernel: [<fbef12d1>] ptlrpc_retry_rqbds+0x0/0xd
[ptlrpc]
Mar 20 11:03:25 node2 kernel: [<fbef12de>] ptlrpc_main+0x0/0x9b5 [ptlrpc]
Mar 20 11:03:25 node2 kernel: [<c01041f5>] kernel_thread_helper+0x5/0xb
Mar 20 11:03:25 node2 kernel: Code: 00 75 0b 8b 44 33 14 8b 40 1c 89 44
33 10 8b 4c 33 10 0f b7 41 04 66 39 41 02 76 08 0f 0b d6 06 af 9b 90 fb
66 81 39 0a f3 74 08 <0f> 0b d7 06 af 9b 90 fb 8b 44 33 0c 85 c0 75 1d
8b 54 24 14 89
Mar 20 11:03:25 node2 kernel: <0>Fatal exception: panic in 5 seconds
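
For anyone who wants to check how often the "header is corrupted" message
shows up in their own logs, here is a minimal sketch in Python. It is not
part of Lustre or any of its tools. The function name and the assumed
syslog line layout (taken from the excerpt above) are my own assumptions,
so adjust the pattern if your syslog format differs.

```python
import re
from collections import Counter

# Assumed line layout, copied from the log excerpt in this post:
#   Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!
PATTERN = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+kernel:\s+LDISKFS-fs: header is corrupted!"
)


def count_corruption_messages(lines):
    """Count 'header is corrupted' messages per (host, month, day).

    `lines` can be any iterable of strings, e.g. an open log file.
    Lines that do not match the assumed layout are ignored.
    """
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            key = (m.group("host"), m.group("month"), int(m.group("day")))
            counts[key] += 1
    return counts
```

To use it, pass an open log file (for example the system messages file,
wherever your distribution keeps it) to count_corruption_messages() and
print the resulting counter. A count that grows over several days would
match what I saw before the freeze.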