[Lustre-discuss] odd kernel crash after a heartbeat failover
John White
jwhite at lbl.gov
Thu Apr 15 13:10:32 PDT 2010
Hello Folks,
We just had a very odd crash after a heartbeat failover; the two may or may not be related. I'm not sure whether this was an I/O error on the disk (I see no actual EIO, just the crash in the journal commit). Any ideas? The FS went through recovery just fine and doesn't appear to have any corruption:
[...]
Apr 15 11:49:32 n0007.lustre heartbeat: [10940]: info: mach_down takeover complete.
Apr 15 12:09:55 n0007.lustre kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Apr 15 12:09:55 n0007.lustre kernel: [<ffffffff88abc375>] :jbd:journal_commit_transaction+0xc33/0x132e
Apr 15 12:09:55 n0007.lustre kernel: Oops: 0002 [1] SMP
Apr 15 12:09:55 n0007.lustre kernel: Oops: 0002 [1] SMP
Apr 15 12:09:55 n0007.lustre kernel: last sysfs file: /block/dm-3/dev
Apr 15 12:09:55 n0007.lustre kernel: RIP [<ffffffff88abc375>] :jbd:journal_commit_transaction+0xc33/0x132e
Apr 15 12:09:55 n0007.lustre kernel: CR2: 0000000000000000
Apr 15 12:09:55 n0007.lustre kernel: CR2: 0000000000000000
Apr 15 12:13:25 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358805.4890
Apr 15 12:13:25 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358805.3719
Apr 15 12:13:25 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358805.3719
Apr 15 12:13:25 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358805.4807
Apr 15 12:13:26 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358806.3725
Apr 15 12:13:30 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358810.3714
Apr 15 12:13:30 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358810.4796
Apr 15 12:13:30 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358810.3740
Apr 15 12:13:30 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358810.4991
Apr 15 12:13:39 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358819.3727
Apr 15 12:13:39 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358819.5109
Apr 15 12:13:39 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358819.5072
Apr 15 12:13:39 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358819.4812
Apr 15 12:13:40 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358820.3720
Apr 15 12:13:41 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358821.3732
Apr 15 12:13:41 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358821.5047
Apr 15 12:13:41 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271358821.4862
Apr 15 12:24:09 n0006.lustre kernel: LustreError: dumping log to /tmp/lustre-log.1271359449.3725
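For anyone lining these dumps up against the syslog timestamps: the first numeric field in each dump file name looks like a Unix epoch time that matches the syslog entry to the second. A minimal sketch for decoding it follows, with some assumptions: the helper name parse_dump_name is made up for illustration, the second numeric field is assumed (not confirmed here) to be the PID of the dumping thread, and the fixed UTC-7 offset is just PDT on this date.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout: lustre-log.<epoch seconds>.<pid>
# The meaning of the second field is an assumption, not confirmed by the logs.
DUMP_RE = re.compile(r"lustre-log\.(\d+)\.(\d+)$")

# Fixed offset for PDT (UTC-7), matching the syslog timestamps in this post.
PDT = timezone(timedelta(hours=-7), "PDT")


def parse_dump_name(path):
    """Return (local datetime, assumed pid) parsed from a dump file path."""
    match = DUMP_RE.search(path)
    if match is None:
        raise ValueError(f"not a lustre-log dump name: {path!r}")
    epoch = int(match.group(1))
    pid = int(match.group(2))
    return datetime.fromtimestamp(epoch, tz=PDT), pid


when, pid = parse_dump_name("/tmp/lustre-log.1271358805.4890")
print(when.isoformat(), pid)  # 2010-04-15T12:13:25-07:00 4890
```

The decoded time agrees with the 12:13:25 syslog line for that dump, so the file names alone are enough to order the dumps if the syslog is not at hand.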
Full logs are available upon request.
----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720