[Lustre-discuss] lustre 1.6.5.1 panic on failover

Brock Palen brockp at umich.edu
Fri Aug 1 08:39:43 PDT 2008


Yes, it is consistent.  I looked up how to induce a panic using SysRq:

echo c > /proc/sysrq-trigger

That works as expected: the machine cycles, the second node takes over, and  
all is well.
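(For anyone reproducing this test: triggering a crash this way only works if the magic SysRq key is enabled. A quick sketch of checking and enabling it, assuming a root shell on standard Linux paths -- the last command immediately crashes the node, so don't run it casually:

```shell
# Check whether magic SysRq is enabled (0 means disabled)
cat /proc/sys/kernel/sysrq

# Enable all SysRq functions for this boot
echo 1 > /proc/sys/kernel/sysrq

# Trigger the kernel panic/crash -- this takes the node down immediately
echo c > /proc/sysrq-trigger
```
)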

If instead of crashing the node I run 'killall -9 heartbeat',
I can get the panic every time.  I even edited the external/ipmi  
script from 'power reset' to 'power cycle'; that didn't help.

It's kind of an unstable spot to be in: if heartbeat dies, the whole MDS/MGS  
server setup locks up, but if the server panics, failover is fine.  I don't  
like that tradeoff.

I am looking at grabbing a crash dump.  I think it's a race: heartbeat  
is mounting the filesystems before the first node is totally dead.

Does it hurt to run MMP on the MGS filesystem also?
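(For reference, MMP -- multiple-mount protection -- is just an ldiskfs/ext4 feature flag, so it can be checked and enabled with standard e2fsprogs tools. A sketch; the device path below is a placeholder, and the target must be unmounted when you change it:

```shell
# Check whether the mmp feature is already set
# (device path is hypothetical -- substitute your MGS target)
dumpe2fs -h /dev/mapper/mgs | grep -i features

# Enable multiple-mount protection on the unmounted target
tune2fs -O mmp /dev/mapper/mgs
```

With MMP set, a second mount attempt while the first node still holds the device should fail instead of corrupting the filesystem, which is exactly the race described above.)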

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Jul 31, 2008, at 5:28 PM, Klaus Steden wrote:
>
> Hi Brock,
>
> I've been using Sun X2200s with Lustre in a similar configuration  
> (IPMI,
> STONITH, Linux-HA, FC storage) and haven't had any issues like this
> (although I would typically panic the primary node during testing  
> using
> Sysrq) ... is the behaviour consistent?
>
> Klaus
>
> On 7/31/08 1:57 PM, "Brock Palen" <brockp at umich.edu> did etch on stone
> tablets:
>
>> I have two machines I am setting up as my first mds failover pair.
>>
>> The two sun x4100's  are connected to a FC disk array.  I have set up
>> heartbeat with IPMI for STONITH.
>>
>> Problem is when I run a test on the host that currently has the mds/
>> mgs mounted  'killall -9 heartbeat'  I see the IPMI shutdown and when
>> the second 4100 tries to mount the filesystem it does a kernel panic.
>>
>> Has anyone else seen this behavior?  Is there something I am running
>> into?  If I do a 'hb_takeover' or shut down heartbeat cleanly, all is
>> well.  Only if I simulate heartbeat failing does this happen.  Note I
>> have not tried yanking power yet, but I wanted to simulate an MDS in a
>> semi-dead state and ran into this.
>>
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985
>>
>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
>



