[Lustre-discuss] aacraid kernel panic caused failover

David Noriega tsk133 at my.utsa.edu
Fri Mar 25 07:37:44 PDT 2011


Had some craziness happen to our Lustre system. We have two OSSs, both
identical Sun x4140 servers, and on only one of them have I seen
this pop up in the kernel messages, followed by a kernel panic. The panic
then seemed to spread: the network went down and the second
OSS tried to fail over (or fail back?). Anyway, a split-brain occurred,
and I was able to get in and set them straight. I researched these
aacraid module messages, and so far all I can find says to increase the
timeout, but those are old posts and the timeouts are currently set to 60.
Does anyone else have any ideas?

aacraid: Host adapter abort request (0,0,0,0)
aacraid: Host adapter reset request. SCSI hang ?
AAC: Host adapter BLINK LED 0xef
AAC0: adapter kernel panic'd ef.
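
For anyone wanting to compare timeout values across both OSSs, here is a
minimal sketch. It assumes the timeout in question is the per-device SCSI
command timeout that Linux exposes in sysfs at /sys/block/<dev>/device/timeout
(in seconds); that is an assumption, not something confirmed above. The
function name read_scsi_timeouts is made up for illustration.

```python
import glob
import os


def read_scsi_timeouts(sys_block="/sys/block"):
    """Return {device_name: timeout} for block devices that expose
    a device/timeout file under sys_block (assumed sysfs layout)."""
    timeouts = {}
    pattern = os.path.join(sys_block, "*", "device", "timeout")
    for path in sorted(glob.glob(pattern)):
        # Path looks like <sys_block>/<dev>/device/timeout, so the
        # device name is the third component from the end.
        dev = path.split(os.sep)[-3]
        try:
            with open(path) as f:
                timeouts[dev] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip entries that are unreadable or not a plain integer.
            continue
    return timeouts


if __name__ == "__main__":
    for dev, value in read_scsi_timeouts().items():
        print(f"{dev}: {value}")
```

Running it on each OSS and comparing the output should show whether the two
servers really have the same value on every device behind the adapter.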

-- 
Personally, I liked the university. They gave us money and facilities,
we didn't have to produce anything! You've never been out of college!
You don't know what it's like out there! I've worked in the private
sector. They expect results. -Ray Ghostbusters


