[Lustre-discuss] Lustre HA Experiences

Wed May 4 13:39:38 PDT 2011

We're investigating Pacemaker HA setup here too, so I'm interested in
your findings, and I hope I can help a little here.

1. It sounds like the totem exchange stalls on some nodes while it keeps
running on others, and the nodes that are still running take the
initiative to stonith.

I would investigate raising or adding some totem parameters in
corosync.conf.

Check out token, token_retransmit, and
token_retransmits_before_loss_const (among others; I'm not sure what the
complete answer is). They may help get you past spikes in load.
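
As a rough sketch of what that could look like, here is a totem section
with a longer token timeout. The numbers are illustrative assumptions,
not values tested on this kind of load, so treat them as a starting
point. As far as I know, token is in milliseconds, consensus defaults to
1.2 * token, and token_retransmit is normally derived from the other two
values, so it is left unset here:

```
totem {
    version: 2

    # Assumed value: wait 10 s for the token before declaring it lost.
    token: 10000

    # Assumed value: retransmit attempts before a new configuration forms.
    token_retransmits_before_loss_const: 10
}
```

The trade-off is that a longer token timeout also delays detection of a
node that really has failed.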

2. This sounds like normal OST recovery. It is taking that time to
return the OSTs to a consistent state.
Check out:
http://wiki.lustre.org/manual/LustreManual18_HTML/LustreRecovery.html

The Pacemaker page on wiki.lustre.org shows you how to deal with this:
http://wiki.lustre.org/index.php/Using_Pacemaker_with_Lustre

The op start and op stop timeouts should be set to 300 seconds to allow
the recovery process to complete.
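
For illustration, a crm shell primitive with those timeouts might look
roughly like the following. The resource name, device, and mount point
are hypothetical placeholders, and the monitor values are assumptions;
only the 300 second start and stop timeouts come from the advice above:

```
# Hypothetical OST resource; adjust the name, device, and directory.
primitive resOST0000 ocf:heartbeat:Filesystem \
    params device="/dev/mapper/ost0000" directory="/mnt/ost0000" fstype="lustre" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300"
```

If a mount regularly takes longer than the start timeout, Pacemaker will
treat the start as failed, so the timeout needs to cover the worst-case
recovery window you actually observe.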

It would be helpful to see your resource configuration file, as well as
your corosync.conf.

Justin Miller         (812) 855-2719        jupmille at iu.edu
Indiana University - Research Technologies - Data Capacitor

On 5/4/11 1:05 PM, Charles Taylor wrote:
> 
> We are dipping our toes into the waters of Lustre HA using  
> pacemaker.     We have 16 7.2 TB OSTs across 4 OSSs (4 OSTs each).    
> The four OSSs are broken out into two dual-active pairs running Lustre  
> 1.8.5.    Mostly, the water is fine but we've encountered a few  
> surprises.
> 
> 1. An 8-client  iozone write test in which we write 64 files of 1.7  
> TB  each seems to go well - until the end at which point iozone seems  
> to finish successfully and begins its "cleanup".   That is to say it  
> starts to remove all 64 large files.    At this point, the ll_ost   
> threads go bananas - consuming all available cpu cycles on all 8 cores  
> of each server.   This seems to block the corosync "totem" exchange  
> long enough to initiate a "stonith" request.
> 
> 2. We have found that re-mounting the OSTs, either via the HA agent or  
> manually, often can take a *very* long time - on the order of four or  
> five minutes.   We have not figured out why yet.   An strace of the  
> mount process has not yielded much.    The mount seems to just be  
> waiting for something but we can't tell what.
> 
> We are starting to adjust our HA parameters to compensate for these  
> observations but we hate to do this in a vacuum and wonder if others  
> have also observed these behaviors and what, if anything, was done to  
> compensate/correct?
> 
> Regards,
> 
> Charlie Taylor
> UF HPC Center