[Lustre-discuss] odd kernel crash after a heartbeat failover

Andreas Dilger andreas.dilger at oracle.com
Fri Apr 16 16:44:11 PDT 2010


On 2010-04-16, at 11:29, John White wrote:
> Just to follow up: after enabling netconsole to get some meaningful
> logging out of these OSSs, it is clear that there's a problem with
> the backend storage communication and that this certainly isn't a
> Lustre issue.  Thanks, folks.
>
> On Apr 15, 2010, at 9:45 PM, Cliff White wrote:
>> John White wrote:
>>> This is actually happening repeatedly. Any idea if this is a
>>> Lustre-side error?
>>> kernel: Unable to handle NULL pointer dereference at 0000000000000000
>>> kernel: LDISKFS-fs error (device dm-7) in ldiskfs_reserve_inode_write: Journal has aborted
>>> kernel: Oops: 0002 [1] SMP
>>> kernel: RIP jbd:journal_commit_transaction+0xc33/0x132e

Could you please decode journal_commit_transaction+0xc33 to see which
source line it corresponds to?  This Oops shouldn't be happening, even
if the journal has aborted.
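
As a rough sketch of how to start on that decode, the Python below pulls
the module, symbol, offset and function size out of the RIP line quoted
above and prints the matching gdb "list" command.  The helper names
(parse_rip, gdb_list_command) are made up for illustration, and it
assumes a copy of the jbd module built with debug info for exactly the
running kernel is available to load into gdb; without that, gdb cannot
map the address back to a source line.

```python
import re

# Matches the "symbol+offset/size" form found in an Oops RIP line,
# optionally prefixed by a module name, for example
# "jbd:journal_commit_transaction+0xc33/0x132e".
RIP_RE = re.compile(
    r"(?:(?P<module>\w+):)?"
    r"(?P<symbol>\w+)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_rip(line):
    """Split an Oops RIP line into module, symbol, offset and size.

    Hypothetical helper, not part of any kernel or Lustre tooling.
    """
    match = RIP_RE.search(line)
    if match is None:
        raise ValueError("no symbol+offset/size found in: %r" % line)
    return {
        "module": match.group("module"),  # None when there is no prefix
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
    }


def gdb_list_command(rip):
    """Build the gdb command that lists source around symbol+offset."""
    return "list *(%s+0x%x)" % (rip["symbol"], rip["offset"])


if __name__ == "__main__":
    rip = parse_rip("kernel: RIP jbd:journal_commit_transaction+0xc33/0x132e")
    # Paste the printed command into a gdb session started on the
    # debug-info build of the module named in rip["module"].
    print(rip["module"], gdb_list_command(rip))
```

The offset is only meaningful against the same build that produced the
Oops, so the module handed to gdb has to match the kernel on the OSS.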

Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.



