[Lustre-discuss] kernel freeze

Papp Tamás tompos at martos.bme.hu
Thu Mar 20 05:48:23 PDT 2008


Dear All,

What could cause this error?
Kernel: 2.6.9-42.0.10.EL_lustre-1.6.0.1custom-drbd and 
2.6.9-55.0.9.EL_lustre.1.6.4.1smp (CentOS 4.4)

After the node froze, its failover partner took over the resource, but 
then it froze too.

I've just looked back through the logs, and I see that these "header is 
corrupted" messages appeared several more times in the last few days.
After I turned the node on again, it froze within 10 minutes.
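
A minimal sketch of how the log check for earlier "header is corrupted" 
messages could be scripted, assuming syslog-format lines like the ones 
quoted below. The function name count_corrupt_headers and the sample list 
are hypothetical and only for illustration; they are not part of Lustre.

```python
import re
from collections import Counter

# Matches syslog lines of the form:
#   Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!
PATTERN = re.compile(
    r"^(?P<day>[A-Z][a-z]{2}\s+\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+kernel:\s+LDISKFS-fs: header is corrupted!"
)


def count_corrupt_headers(lines):
    """Count matching messages, keyed by (day, host)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            # Normalize padded days such as "Mar  5" to "Mar 5".
            day = " ".join(m.group("day").split())
            counts[(day, m.group("host"))] += 1
    return counts


# Sample input taken from the log excerpt in this message.
sample = [
    "Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!",
    "Mar 20 10:57:19 node2 kernel: LDISKFS-fs: invalid magic = 0x281e",
    "Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!",
]

for (day, host), n in sorted(count_corrupt_headers(sample).items()):
    print(f"{day} {host}: {n} corrupted-header message(s)")
```

To run this over a real log, the lines of the syslog file (its location 
varies by system) could be passed in place of the sample list.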


Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!
Mar 20 10:57:19 node2 kernel: LDISKFS-fs: invalid magic = 0x281e
Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!
Mar 20 10:58:43 node2 kernel: Lustre: hallmark-OST0002: haven't heard from client 078bd69d-b701-7dc9-3360-da43cd285d06 (at 192.168.0.150@tcp) in 227 seconds. I think it's dead, and I am evicting it.
Mar 20 11:03:25 node2 kernel: ------------[ cut here ]------------
Mar 20 11:03:25 node2 kernel: kernel BUG at 
/usr/src/redhat/BUILD/lustre-1.6.0.1/lustre/ldiskfs/extents.c:1751!
Mar 20 11:03:25 node2 kernel: invalid operand: 0000 [#1]
Mar 20 11:03:25 node2 kernel: SMP
Mar 20 11:03:25 node2 kernel: Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) lustre(U) lov(U) lquota(U) mdc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) mptctl(U) mptbase(U) drbd(U) nfsd(U) exportfs(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U) nfs(U) lockd(U) nfs_acl(U) sunrpc(U) dm_mirror(U) dm_mod(U) button(U) battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) hw_random(U) e1000(U) sk98lin(U) floppy(U) ext3(U) jbd(U) aacraid(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U)
Mar 20 11:03:25 node2 kernel: CPU:    1
Mar 20 11:03:25 node2 kernel: EIP:    0060:[<fb8ff40a>]    Tainted: 
GF     VLI
Mar 20 11:03:25 node2 kernel: EFLAGS: 00010213   
(2.6.9-42.0.10.EL_lustre-1.6.0.1custom-drbd)
Mar 20 11:03:25 node2 kernel: EIP is at 
ldiskfs_ext_remove_space+0x13f/0x2cf [ldiskfs]
Mar 20 11:03:25 node2 kernel: eax: 00007067   ebx: 00000018   ecx: 
e5658000   edx: 00001001
Mar 20 11:03:25 node2 kernel: esi: f6095e00   edi: 00000002   ebp: 
f6095e00   esp: e6b4bb60
Mar 20 11:03:25 node2 kernel: ds: 007b   es: 007b   ss: 0068
Mar 20 11:03:25 node2 kernel: Process ll_ost_io_38 (pid: 25495, 
threadinfo=e6b4b000 task=e6e77330)
Mar 20 11:03:25 node2 kernel: Stack: 00000000 00000001 f5664304 00000002 
f7cede00 ffffffff 00000000 e6b4bb9c
Mar 20 11:03:25 node2 kernel:        f7cede00 f5664304 e8f250fc e8f25028 
fb8ffd3c 00000246 f7cede00 e8f250fc
Mar 20 11:03:25 node2 kernel:        e8f25028 e8f250fc 0000003c d190459c 
e8f25258 fb913b44 00000000 00080000
Mar 20 11:03:25 node2 kernel: Call Trace:
Mar 20 11:03:25 node2 kernel:  [<fb8ffd3c>] 
ldiskfs_ext_truncate+0x12d/0x176 [ldiskfs]
Mar 20 11:03:25 node2 kernel:  [<fb8f1213>] ldiskfs_truncate+0x112/0x486 
[ldiskfs]
Mar 20 11:03:25 node2 kernel:  [<c02d4fd6>] __cond_resched+0x14/0x39
Mar 20 11:03:25 node2 kernel:  [<fb8f1f4a>] 
ldiskfs_do_update_inode+0x320/0x347 [ldiskfs]
Mar 20 11:03:25 node2 kernel:  [<f8897d43>] 
journal_get_write_access+0x25/0x2c [jbd]
Mar 20 11:03:25 node2 kernel:  [<c014e3cc>] vmtruncate+0xcb/0xee
Mar 20 11:03:25 node2 kernel:  [<c0173247>] inode_setattr+0x64/0x1b3
Mar 20 11:03:25 node2 kernel:  [<fb8f2129>] ldiskfs_setattr+0x179/0x1c9 
[ldiskfs]
Mar 20 11:03:25 node2 kernel:  [<fb93ffb7>] 
fsfilt_ldiskfs_setattr+0x129/0x212 [fsfilt_ldiskfs]
Mar 20 11:03:25 node2 kernel:  [<fbbab7d2>] 
filter_setattr_internal+0x65f/0x177a [obdfilter]
Mar 20 11:03:25 node2 kernel:  [<fbba45c0>] 
filter_fid2dentry+0x654/0x8df [obdfilter]
Mar 20 11:03:25 node2 kernel:  [<fbb9e7ca>] filter_fmd_get+0x263/0x391 
[obdfilter]
Mar 20 11:03:25 node2 kernel:  [<fbb9e8ee>] filter_fmd_get+0x387/0x391 
[obdfilter]
Mar 20 11:03:25 node2 kernel:  [<fbbad2f1>] filter_setattr+0x260/0x48e 
[obdfilter]
Mar 20 11:03:25 node2 kernel:  [<fbbb339f>] filter_truncate+0x281/0x316 
[obdfilter]
Mar 20 11:03:25 node2 kernel:  [<fb928bd1>] obd_punch+0x3f8/0x48b [ost]
Mar 20 11:03:25 node2 kernel:  [<fb92871f>] ost_punch+0x351/0x40b [ost]
Mar 20 11:03:25 node2 kernel:  [<fb93340a>] ost_handle+0x1e38/0x344c [ost]
Mar 20 11:03:25 node2 kernel:  [<fbef0389>] 
ptlrpc_server_handle_request+0xb76/0x136f [ptlrpc]
Mar 20 11:03:25 node2 kernel:  [<fbef1acc>] ptlrpc_main+0x7ee/0x9b5 [ptlrpc]
Mar 20 11:03:25 node2 kernel:  [<c011e7f5>] default_wake_function+0x0/0xc
Mar 20 11:03:25 node2 kernel:  [<fbef12d1>] ptlrpc_retry_rqbds+0x0/0xd 
[ptlrpc]
Mar 20 11:03:25 node2 kernel:  [<c02d693e>] ret_from_fork+0x6/0x14
Mar 20 11:03:25 node2 kernel:  [<fbef12d1>] ptlrpc_retry_rqbds+0x0/0xd 
[ptlrpc]
Mar 20 11:03:25 node2 kernel:  [<fbef12de>] ptlrpc_main+0x0/0x9b5 [ptlrpc]
Mar 20 11:03:25 node2 kernel:  [<c01041f5>] kernel_thread_helper+0x5/0xb
Mar 20 11:03:25 node2 kernel: Code: 00 75 0b 8b 44 33 14 8b 40 1c 89 44 33 10 8b 4c 33 10 0f b7 41 04 66 39 41 02 76 08 0f 0b d6 06 af 9b 90 fb 66 81 39 0a f3 74 08 <0f> 0b d7 06 af 9b 90 fb 8b 44 33 0c 85 c0 75 1d 8b 54 24 14 89
Mar 20 11:03:25 node2 kernel:  <0>Fatal exception: panic in 5 seconds