[Lustre-discuss] Lustre HA Experiences

Mag Gam magawake at gmail.com
Tue May 24 16:44:01 PDT 2011


What was your conclusion? What is a good HA solution with Lustre? I am
hoping SNS will be a big push over the next year.


On Wed, May 4, 2011 at 5:16 PM, Jason Rappleye <jason.rappleye at nasa.gov> wrote:
>
> On May 4, 2011, at 10:05 AM, Charles Taylor wrote:
>
>>
>> We are dipping our toes into the waters of Lustre HA using
>> Pacemaker. We have 16 7.2 TB OSTs across 4 OSSs (4 OSTs each).
>> The four OSSs are broken out into two dual-active pairs running
>> Lustre 1.8.5. Mostly the water is fine, but we've encountered a few
>> surprises.
>>
>> 1. An 8-client iozone write test in which we write 64 files of 1.7
>> TB each seems to go well - until the end, at which point iozone
>> finishes successfully and begins its "cleanup", i.e. it starts to
>> remove all 64 large files. At this point the ll_ost threads go
>> bananas, consuming all available CPU cycles on all 8 cores of each
>> server. This appears to block the corosync "totem" exchange long
>> enough to initiate a "stonith" request.
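A common mitigation for this failure mode is to give corosync more headroom before it declares a busy node dead. A sketch of the relevant corosync.conf fragment follows; the specific values are illustrative assumptions, not what the poster ran:

```
# /etc/corosync/corosync.conf (fragment) - illustrative values only
totem {
    version: 2
    # Milliseconds to wait for the token before declaring token loss.
    # Heavily loaded OSSs (e.g. ll_ost threads pinning all cores during
    # large unlinks) may need far more than the default.
    token: 10000
    # Retransmit attempts before giving up on the token and reforming
    # the membership (which can lead to fencing).
    token_retransmits_before_loss_const: 10
}
```

Raising the token timeout trades slower detection of genuinely dead nodes for fewer spurious STONITH events under load.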
>
> Running oprofile or profile.pl (possibly only included in SGI's respin of perfsuite; the original is at http://perfsuite.ncsa.illinois.edu/) is useful in situations where you have one or more threads consuming a lot of CPU. It should point to the function(s) the offending thread(s) are spending time in. From there, bugzilla/jira or the mailing list should be able to help further.
>
>> 2. We have found that re-mounting the OSTs, either via the HA agent
>> or manually, can often take a *very* long time - on the order of
>> four or five minutes. We have not yet figured out why. An strace of
>> the mount process has not yielded much; the mount just seems to be
>> waiting for something, but we can't tell what.
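When a mount hangs like this, the kernel-side wait channel is usually more informative than strace, since the process is blocked inside the kernel rather than making syscalls. A minimal sketch (the `mount.lustre` process-matching pattern is an assumption about how the mount helper appears in the process list):

```shell
#!/bin/sh
# Inspect what a hung Lustre mount is blocked on in the kernel.
pid=$(pgrep -f 'mount\.lustre' | head -n1)
if [ -n "$pid" ]; then
    # Wait channel: the kernel function the process is sleeping in.
    cat "/proc/$pid/wchan"; echo
    # Full kernel stack, if the kernel exposes it.
    cat "/proc/$pid/stack" 2>/dev/null
else
    echo "no mount.lustre process found"
fi
```

`echo w > /proc/sysrq-trigger` (dumping all blocked tasks to dmesg) is another standard way to capture the same information for every stuck thread at once.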
>
> Could be bz 18456.
>
> Jason
>
> --
> Jason Rappleye
> System Administrator
> NASA Advanced Supercomputing Division
> NASA Ames Research Center
> Moffett Field, CA 94035
>
>
>
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
