[Lustre-discuss] odd kernel crash after a heartbeat failover

Kit Westneat kwestneat at ddn.com
Fri Apr 16 23:30:10 PDT 2010


This looks like bug 18235/19025. Since it only occurs with flaky
hardware, the fix only landed in 2.0.
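
For the decode Andreas asks for in the quoted message below, here is a
rough sketch of pulling the symbol and offset out of the RIP line and
turning them into a gdb "list" request. The helper names (parse_rip,
gdb_list_command) are made up for illustration, and it assumes you have
a jbd.ko with debug info matching the running kernel to load into gdb;
nothing in this thread confirms where that module lives on the OSS.

```python
import re

# Matches the RIP line format from the Oops in this thread, e.g.
# "RIP jbd:journal_commit_transaction+0xc33/0x132e".
# The "module:" prefix is treated as optional.
RIP_RE = re.compile(
    r"RIP\s+(?:(?P<module>\w+):)?(?P<symbol>[\w.]+)"
    r"\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
)


def parse_rip(line):
    """Return module, symbol, offset and function size from a RIP line.

    Hypothetical helper: returns None when the line has no RIP entry.
    """
    m = RIP_RE.search(line)
    if m is None:
        return None
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
    }


def gdb_list_command(rip):
    """Build the gdb command that maps symbol+offset to a source line."""
    return "list *({symbol}+{offset:#x})".format(**rip)


rip = parse_rip("kernel: RIP jbd:journal_commit_transaction+0xc33/0x132e")
print(rip["module"], rip["symbol"], hex(rip["offset"]), hex(rip["size"]))
print(gdb_list_command(rip))
```

The printed command is meant to be typed at the gdb prompt after
loading the debug-info build of the module named in the RIP line (jbd
here). Without matching debug info gdb cannot resolve the offset to a
source line.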

- Kit

On 4/16/2010 7:44 PM, Andreas Dilger wrote:
> On 2010-04-16, at 11:29, John White wrote:
>    
>> Just to follow up: after enabling netconsole to get some meaningful
>> logging out of these OSSs, it is clear that there's a problem with
>> the backend storage communication and that this certainly isn't a
>> Lustre issue.  Thanks folks.
>>
>> On Apr 15, 2010, at 9:45 PM, Cliff White wrote:
>>      
>>> John White wrote:
>>>        
>>>> This is actually happening repeatedly. Any idea whether this is a
>>>> Lustre-side error?
>>>> kernel: Unable to handle NULL pointer dereference at 0000000000000000
>>>> kernel: LDISKFS-fs error (device dm-7) in ldiskfs_reserve_inode_write: Journal has aborted
>>>> kernel: Oops: 0002 [1] SMP
>>>> kernel: RIP jbd:journal_commit_transaction+0xc33/0x132e
>>>>          
> Could you please decode journal_commit_transaction+0xc33 to see which
> source line it corresponds to?  This Oops shouldn't be happening, even
> if the journal has aborted.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer, Lustre Group
> Oracle Corporation Canada Inc.
>


-- 
---
Kit Westneat
kwestneat at datadirectnet.com
812-484-8485



