[lustre-discuss] STOP'd processes on Lustre clients while OSS/OST unavailable?

Fri Feb 19 17:22:51 PST 2016

Hi all,

I agree regarding Lustre recovery; it works just great in practice. After prolonged OSS downtime, though, you may notice jobs reaching their time limits, i.e. jobs are blocked on I/O and get killed by the scheduler before they actually complete and write their final results. With SLURM, for example, you could consider using scontrol suspend/resume during OSS downtime, which sends STOP/CONT signals to the job's processes and holds the job's run time so that the suspended period does not count against its time limit.
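
A minimal Python sketch of the suspend/resume approach, assuming `scontrol` and `squeue` are on the PATH and that the caller has Slurm operator or administrator rights (suspending jobs normally requires them). The helper names `running_job_ids`, `scontrol_cmds` and `run` are hypothetical. Commands are issued one job at a time and are only printed unless `dry_run=False`.

```python
import subprocess
from typing import List, Sequence


def running_job_ids() -> List[str]:
    """Return the IDs of currently running Slurm jobs (needs squeue on PATH)."""
    out = subprocess.run(
        ["squeue", "-h", "-t", "R", "-o", "%i"],  # no header, RUNNING only, job ID only
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def scontrol_cmds(action: str, job_ids: Sequence[str]) -> List[List[str]]:
    """Build one `scontrol suspend|resume <jobid>` command per job."""
    if action not in ("suspend", "resume"):
        raise ValueError(f"unsupported action: {action}")
    return [["scontrol", action, job_id] for job_id in job_ids]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or execute them, stopping at the first failure."""
    for cmd in cmds:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Typical use: collect `jobs = running_job_ids()` before the OSS pair is powered down, call `run(scontrol_cmds("suspend", jobs), dry_run=False)`, and once recovery has completed call `run(scontrol_cmds("resume", jobs), dry_run=False)`. Reusing the same job list for both steps avoids resuming jobs that were suspended for some other reason.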

All the best,

Stephane

> On Feb 19, 2016, at 1:22 PM, Stearman, Marc <stearman2 at llnl.gov> wrote:
> 
> I agree with Oleg.  All of our file systems are configured with OSS nodes in failover pairs, and if one node dies, Lustre will run on the backup node quite well.  Occasionally, though, we have to do a repair on the underlying storage, in which case we power down both OSS nodes and do the repairs.  This usually takes less than 15 minutes, but we have had times where both nodes are down for an hour or more.  All I/O destined for those OSTs will hang until they are back online, and usually recovery completes fine and replays all the data.  This is with 4000+ clients connected to the file systems.
> 
> Note that any clients that reboot or crash while those OSTs are offline will not be recoverable, but any clients that stay up through the entire repair window should pause and then recover once the hardware has been fixed.  You should not have to kill or STOP any processes using the file system.
> 
> -Marc
> 
> ----
> D. Marc Stearman
> Lustre Operations Lead
> stearman2 at llnl.gov
> Office:  925-423-9670
> Mobile:  925-216-7516
> 
> 
> 
> 
>> On Feb 19, 2016, at 12:11 PM, Drokin, Oleg <oleg.drokin at intel.com> wrote:
>> 
>> Hello!
>> 
>>  Actually I have to disagree.
>>  If the servers go down, but then come back up and complete recovery successfully, the locks will be replayed and everything should work transparently.
>>  Clients will "pause" trying to access those servers for as long as needed until the servers come back.
>> 
>>  Also, file descriptors are a matter between the MDS and the clients, so if an OST goes down, open file descriptors are not affected.
>> 
>>  That said, leaving the MDS up while some OSTs are down for a potentially prolonged time is not a great idea, and it might make sense to deactivate those OSTs on the MDS before bringing them down, and to reactivate them once they are back.
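
A minimal Python sketch of the deactivate/reactivate step on the MDS. It assumes the `lctl --device <devno> deactivate|activate` form, the usual `lctl dl` column layout (device number, status, type, name, ...), and MDS-side OST devices of type `osc` or `osp` named `<fsname>-OST<4-digit lowercase hex index>-...`. The function names are hypothetical, and the selected devices should be checked against real `lctl dl` output on the MDS before anything is run.

```python
import subprocess
from typing import Iterable, List


def ost_devices(lctl_dl_output: str, fsname: str, ost_indices: Iterable[int]) -> List[str]:
    """Return MDS device numbers of the OSC/OSP devices for the given OST indices.

    Assumes each `lctl dl` line starts with: devno status type name ...
    and that OST names use a 4-digit lowercase hex index (e.g. OST000a).
    """
    prefixes = tuple(f"{fsname}-OST{i:04x}-" for i in ost_indices)
    devnos = []
    for line in lctl_dl_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        devno, _status, devtype, name = fields[:4]
        if devtype in ("osc", "osp") and name.startswith(prefixes):
            devnos.append(devno)
    return devnos


def lctl_cmds(action: str, devnos: Iterable[str]) -> List[List[str]]:
    """Build one `lctl --device <devno> <action>` command per device."""
    if action not in ("deactivate", "activate"):
        raise ValueError(f"unsupported action: {action}")
    return [["lctl", "--device", d, action] for d in devnos]


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or execute them, stopping at the first failure."""
    for cmd in cmds:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

On the MDS this would be fed with the output of `lctl dl`, for example `ost_devices(subprocess.run(["lctl", "dl"], capture_output=True, text=True).stdout, "lustre", [2, 3])`, where the filesystem name `lustre` and the indices are placeholders. Run `deactivate` before the OSS pair goes down and `activate` with the same device list once the OSTs are back.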
>> 
>> Bye,
>>   Oleg
>> On Feb 19, 2016, at 2:53 PM, Patrick Farrell wrote:
>> 
>>> Paul,
>>> 
>>> I would say this is not very likely to work and could easily result in corrupted data.  With the servers going down completely, the clients will lose the locks they had (no possibility of recovery with the servers down completely like this), and any data not written out will be lost.  You can guarantee the processes are idle with SIGSTOP, yes, but you can't guarantee all of the data has been written out.
>>> 
>>> There are other possible issues as well, but I don't think it's necessary to detail them all.  I would strongly advise against this plan: just truly stop activity on the clients and unmount Lustre (to be certain), then remount it after the maintenance is complete.
>>> 
>>> - Patrick
>>> On 02/19/2016 01:45 PM, Paul Brunk wrote:
>>>> Hi all:
>>>> 
>>>> We have a Linux cluster (CentOS 6.5, Lustre 1.8.9-wc1) which mounts a
>>>> Lustre FS from CentOS-based server appliance (Lustre 2.1.0).
>>>> 
>>>> The Lustre cluster has 4 OSSes in two failover pairs. Due to bad luck,
>>>> one OSS is unbootable, and replacing it will require taking its
>>>> live partner down too (though not any of the other Lustre servers).
>>>> 
>>>> We can prevent I/O to the Lustre FS by suspending (kill -STOP) the
>>>> user processes on the cluster compute nodes before the maintenance
>>>> work, and resuming them (kill -CONT) afterwards.
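
A minimal Python sketch of this suspend/resume step, equivalent to `kill -STOP` / `kill -CONT`, assuming the caller already knows which PIDs to pause (selecting them is left out). It only pauses the processes; it does nothing about data already buffered on the client. The helper names `signal_all`, `suspend` and `resume` are hypothetical.

```python
import os
import signal
from typing import Iterable, List


def signal_all(pids: Iterable[int], sig: int) -> List[int]:
    """Send `sig` to every PID and return the PIDs that no longer exist.

    PermissionError (e.g. another user's process when not root) is not caught.
    """
    gone = []
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            gone.append(pid)
    return gone


def suspend(pids: Iterable[int]) -> List[int]:
    return signal_all(pids, signal.SIGSTOP)  # same effect as `kill -STOP <pid>`


def resume(pids: Iterable[int]) -> List[int]:
    return signal_all(pids, signal.SIGCONT)  # same effect as `kill -CONT <pid>`
```

SIGSTOP cannot be caught or ignored, so the stop is guaranteed, but as noted later in the thread, a stopped process can still have unwritten data or locks that depend on the servers.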
>>>> 
>>>> I don't know what would happen, though, in those cases where the
>>>> STOP'd process has an open file descriptor on the Lustre FS. If the
>>>> relevant OSS/OSTs become unavailable, and then available again, during
>>>> the STOP'd time, what would happen when the process is CONT'd?
>>>> 
>>>> I tried a Web search on this, but the best I could find was stuff
>>>> which assumed that one of a failover partner set would remain
>>>> available, or was specifically about evictions (which I guess are a
>>>> risk of this maintenance procedure anyway). I did find one doc
>>>> (http://wiki.lustre.org/Lustre_Resiliency:_Understanding_Lustre_Message_Loss_and_Tuning_for_Resiliency)
>>>> which suggested that silent data corruption was a possibility in the
>>>> event of evictions.
>>>> 
>>>> But what about non-evicted clients with open filehandles?
>>>> 
>>>> Thanks for any insight!
>>>> 
>>> 
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>> 
> 