[Lustre-discuss] Large Corosync/Pacemaker clusters

Wed Oct 24 13:32:40 PDT 2012

FWIW, we are running HA Lustre using corosync/pacemaker. We broke our OSSs and MDSs out into individual HA *pairs*. We thought about other configurations, but it was our first step into corosync/pacemaker, so we decided to keep it as simple as possible. It seems to work well. I'm not sure I would attempt what you are doing, though it may be perfectly fine. When HA is a requirement, it probably makes sense to avoid pushing the limits of what works.
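
To illustrate the pair layout, here is a minimal Python sketch of how a node list could be split into small groups, each of which would then be configured as its own corosync/pacemaker cluster. The node names and the function split_into_ha_groups are hypothetical, and pairing by position in the list is an assumption; in practice each pair would be dictated by which servers share storage.

```python
def split_into_ha_groups(nodes, group_size=2):
    """Partition nodes into consecutive groups of at most group_size.

    Each returned group stands for one independent HA cluster.
    A trailing group of one node is rejected because a lone node
    has no failover partner.
    """
    if group_size < 2:
        raise ValueError("an HA group needs at least two nodes")
    groups = [nodes[i:i + group_size] for i in range(0, len(nodes), group_size)]
    if groups and len(groups[-1]) < 2:
        raise ValueError("last group would contain a single node")
    return groups


# Hypothetical names for an 18-node filesystem (not a real site's hosts).
nodes = [f"node{n:02d}" for n in range(1, 19)]

pairs = split_into_ha_groups(nodes)  # default group_size=2
for idx, members in enumerate(pairs, start=1):
    print(f"cluster{idx}: {', '.join(members)}")
```

Calling split_into_ha_groups(nodes, group_size=4) on the same list gives four groups of four plus one leftover pair, since 18 is not a multiple of 4.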

This doesn't really help you much, other than to provide a data point on what other sites are doing.

Good luck and report back.   

Charlie Taylor
UF HPC Center

On Oct 19, 2012, at 12:52 PM, Hall, Shawn wrote:

> Hi,
>  
> We’re setting up fairly large Lustre 2.1.2 filesystems, each with 18 nodes and 159 resources all in one Corosync/Pacemaker cluster, as suggested by our vendor.  We’re getting mixed messages from our vendor and others on how large a Corosync/Pacemaker cluster can be and still work well.
>  
> 1. Are there Lustre Corosync/Pacemaker clusters out there of this size or larger?
> 2. If so, what tuning needed to be done to get it to work well?
> 3. Should we be looking more seriously into splitting this Corosync/Pacemaker cluster into pairs or sets of 4 nodes?
>  
> Right now, our configuration takes a long time to start/stop all resources (~30-45 mins), and failing back OSTs puts a heavy load on the cib process on every node in the cluster.  Under heavy I/O load, many of the nodes will show as “unclean/offline” and many OST resources will show as inactive in crm status, despite the fact that every single MDT and OST is still mounted in the appropriate place.  We are running 2 corosync rings, each on a private 1 GbE network.  We have a bonded 10 GbE network for LNET.
>  
> Thanks,
> Shawn