[Lustre-discuss] Lustre HA Experiences

Wed May 4 10:05:41 PDT 2011

We are dipping our toes into the waters of Lustre HA using
Pacemaker. We have sixteen 7.2 TB OSTs across 4 OSSs (4 OSTs each).
The four OSSs are split into two dual-active pairs running Lustre
1.8.5. Mostly, the water is fine, but we've encountered a few
surprises.

1. An 8-client iozone write test in which we write 64 files of 1.7
TB each seems to go well until the end, at which point iozone appears
to finish successfully and begins its "cleanup", that is, it starts
to remove all 64 large files. At this point, the ll_ost threads go
bananas, consuming all available CPU cycles on all 8 cores of each
server. This seems to block the corosync "totem" exchange long
enough to initiate a "stonith" request.
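
For a sense of scale, here is a quick back-of-the-envelope check of the
figures above, taken as stated. It is only a sketch: it assumes decimal
terabytes, no filesystem overhead, and files spread evenly across all
OSTs, none of which is confirmed here. Under those assumptions the
cleanup unlinks roughly 94% of the raw capacity in one go.

```python
# Back-of-the-envelope sizing for the iozone run described above.
# Assumptions (not confirmed): decimal TB, no filesystem overhead,
# and files spread evenly across all OSTs.

OST_COUNT = 16
OST_SIZE_TB = 7.2
FILE_COUNT = 64
FILE_SIZE_TB = 1.7


def sizing(ost_count, ost_size_tb, file_count, file_size_tb):
    """Return (raw capacity TB, data written TB, fraction of capacity used)."""
    capacity = ost_count * ost_size_tb
    written = file_count * file_size_tb
    return capacity, written, written / capacity


capacity_tb, written_tb, fraction = sizing(
    OST_COUNT, OST_SIZE_TB, FILE_COUNT, FILE_SIZE_TB
)
print(f"raw capacity: {capacity_tb:.1f} TB")
print(f"data written: {written_tb:.1f} TB ({fraction:.0%} of raw capacity)")
print(f"per OST:      {written_tb / OST_COUNT:.1f} TB to unlink during cleanup")
```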

2. We have found that re-mounting the OSTs, either via the HA agent or
manually, can often take a *very* long time, on the order of four or
five minutes. We have not yet figured out why. An strace of the
mount process has not yielded much. The mount seems to just be
waiting for something, but we can't tell what.
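
To put numbers on the mount delay across repeated attempts, a small
timing wrapper like the following might help. It is a minimal sketch,
not what we actually ran: the helper name timed_run and the 600-second
default timeout are made up for illustration, and the device and mount
point in the usage comment are placeholders. If the timeout is
exceeded, subprocess.run raises subprocess.TimeoutExpired.

```python
import subprocess
import time


def timed_run(cmd, timeout_s=600):
    """Run cmd (a list of arguments); return (exit code, elapsed seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd, timeout=timeout_s)
    return result.returncode, time.monotonic() - start


# Example (hypothetical device and mount point; run as root on the OSS):
#   rc, secs = timed_run(["mount", "-t", "lustre", "/dev/sdX", "/mnt/ost0"])
#   print(f"mount exited {rc} after {secs:.1f} s")
```

Comparing the measured times against the resource agent's start timeout
would show how much headroom the HA configuration has.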

We are starting to adjust our HA parameters to compensate for these
observations, but we hate to do this in a vacuum. Have others
observed these behaviors and, if so, what was done to
compensate or correct for them?

Regards,

Charlie Taylor
UF HPC Center