[Lustre-discuss] aacraid kernel panic caused failover

David Noriega tsk133 at my.utsa.edu
Tue Apr 5 22:03:30 PDT 2011


OK, I updated the aacraid driver and the RAID firmware, but the problem
still happened, so I did more research and applied the following
tweaks:

1) Rebuilt the initrd with mkinitrd, using the following options:
a) edit /etc/sysconfig/mkinitrd/multipath to contain MULTIPATH=yes
b) mkinitrd initrd-2.6.18-194.3.1.el5_lustre.1.8.4.img 2.6.18-194.3.1.el5_lustre.1.8.4 --preload=scsi_dh_rdac
2) Added the local hard disk to the multipath black list
3) Edited modprobe.conf to have the following aacraid options:
options aacraid firmware_debug=2 startup_timeout=60
(the debug option doesn't seem to print anything to dmesg)
4) Added pcie_aspm=off to the kernel boot options
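
A rough sketch of how the tweaks above could be checked after a reboot.
The helper names has_boot_option and show_module_param are made up for
this example, and it assumes the running aacraid driver exposes
startup_timeout under /sys/module/aacraid/parameters, which may not be
true for every driver version:

```shell
#!/bin/bash
# Sketch only: confirm that the boot option and module option are active.

# has_boot_option OPTION [CMDLINE]
# Succeeds if OPTION appears as a whole word on the kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline.
has_boot_option() {
    local opt="$1"
    local cmdline="${2:-$(cat /proc/cmdline 2>/dev/null)}"
    case " $cmdline " in
        *" $opt "*) return 0 ;;
        *)          return 1 ;;
    esac
}

# show_module_param MODULE PARAM
# Prints the current value of a module parameter if sysfs exposes it.
show_module_param() {
    local f="/sys/module/$1/parameters/$2"
    if [ -r "$f" ]; then
        echo "$1 $2 = $(cat "$f")"
    else
        echo "$1 $2: not readable (module not loaded or parameter not exposed)"
    fi
}

if has_boot_option pcie_aspm=off; then
    echo "pcie_aspm=off is on the kernel command line"
else
    echo "pcie_aspm=off is NOT on the kernel command line"
fi
show_module_param aacraid startup_timeout
```

Running the same script on both OSS nodes would show whether the two
machines really booted with identical settings.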

So things looked good for a while. I did have a problem mounting the
Lustre partitions, but that was my fault for misconfiguring some lnet
options I was experimenting with. I fixed that and, just as a test,
ran 'modprobe lustre', since I wasn't ready to fail back the partitions
just yet (I wanted to wait until activity was at its lowest). That was
earlier today. I was about to fail back tonight, but when I checked
the server again I saw the same aacraid problems in dmesg as before.
Is it possible Lustre is interfering with aacraid? It's weird, since I
do have a duplicate machine and it's not having any of these problems.
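
To compare the two machines, something like the following could count
the aacraid error lines in the kernel log. This is only a sketch: the
function name count_aacraid_errors is made up, and the pattern is based
solely on the four messages quoted further down in this thread, so other
wordings from the driver would be missed:

```shell
#!/bin/bash
# Sketch only: count kernel log lines that look like the aacraid
# abort/reset/BLINK LED/panic messages seen on the failing OSS.

# count_aacraid_errors
# Reads kernel log text on stdin and prints the number of matching lines.
count_aacraid_errors() {
    # grep -c exits non-zero when nothing matches, so mask that with "|| true"
    grep -c -E 'aacraid: Host adapter (abort|reset) request|AAC[0-9]*: .*(BLINK LED|kernel panic)' || true
}

# Typical use on each OSS:
#   dmesg | count_aacraid_errors
```

A non-zero count on one node and zero on its twin would at least confirm
that the problem is confined to the one machine.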

On Fri, Mar 25, 2011 at 9:55 AM, Temple  Jason <jtemple at cscs.ch> wrote:
> Adaptec should have the firmware and drivers on their site for your card.  If not Adaptec, then Oracle will have it available somewhere.
>
> The firmware and system drivers usually have a utility that will check the current version and upgrade it for you.
>
> Hope this helps (I use different cards, so I can't tell you exactly).
>
> -Jason
>
> -----Original Message-----
> From: David Noriega [mailto:tsk133 at my.utsa.edu]
> Sent: Friday, 25 March 2011 15:47
> To: Temple Jason
> Subject: Re: [Lustre-discuss] aacraid kernel panic caused failover
>
> Hmm, not sure. What's the best way to find out?
>
> On Fri, Mar 25, 2011 at 9:46 AM, Temple  Jason <jtemple at cscs.ch> wrote:
>> Hi,
>>
>> Are you using the latest firmware?  This sort of thing used to happen to me, but with different raid cards.
>>
>> -Jason
>>
>> -----Original Message-----
>> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of David Noriega
>> Sent: Friday, 25 March 2011 15:38
>> To: lustre-discuss at lists.lustre.org
>> Subject: [Lustre-discuss] aacraid kernel panic caused failover
>>
>> Had some craziness happen to our Lustre system. We have two OSSs, both
>> identical Sun X4140 servers, and on only one of them have I seen
>> this pop up in the kernel messages, followed by a kernel panic. The panic
>> then seemed to spread, causing the network to go down and the second
>> OSS to try to fail over (or fail back?). Anyway, 'split brain' occurred
>> and I was able to get in and set them straight. I researched these
>> aacraid module messages and so far all I can find says to increase the
>> timeout, but those are old posts and the timeouts are currently set to 60.
>> Anyone else have any ideas?
>>
>> aacraid: Host adapter abort request (0,0,0,0)
>> aacraid: Host adapter reset request. SCSI hang ?
>> AAC: Host adapter BLINK LED 0xef
>> AAC0: adapter kernel panic'd ef.
>>
>> --
>> Personally, I liked the university. They gave us money and facilities,
>> we didn't have to produce anything! You've never been out of college!
>> You don't know what it's like out there! I've worked in the private
>> sector. They expect results. -Ray Ghostbusters
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>



-- 
Personally, I liked the university. They gave us money and facilities,
we didn't have to produce anything! You've never been out of college!
You don't know what it's like out there! I've worked in the private
sector. They expect results. -Ray Ghostbusters


