[lustre-discuss] File locking errors.
pbisbal at pppl.gov
Tue Feb 20 07:00:13 PST 2018
On 02/20/2018 08:58 AM, Patrick Farrell wrote:
> There is almost NO overhead to this locking unless you’re using it to keep threads on multiple nodes away from each other, in which case the time is spent doing the waiting your app is asking for.
> Lustre is doing implicit metadata and data locking constantly throughout normal operation, for all active clients at all times; this is just a little more locking, of an explicit kind. It should be almost impossible to measure a cost to flock vs localflock unless you’re really going wild with your file locking, in which case it would be worth your while to modify your job to use a better kind of concurrency control, because even localflock is slow compared to MPI communication.
So, just to be clear, you are saying that file locking, whether local or
global, creates little overhead/performance penalty, except for the case
when an application is actively using global file locking, and then it's
only because that app is waiting for the lock to be freed?
Put another way, enabling local or global file locking will not affect
the overall performance of Lustre. It will only affect the apps that are
actually calling flock, and only when there is contention for a lock,
which is unavoidable, and the whole point of file locking in the first
place. Is that correct?
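To make the question concrete, here is a rough sketch of how one might measure the two cases being discussed: the cost of an uncontended flock/unlock cycle versus the time spent blocked when another process holds the lock. The function names and the hold time are my own choices for illustration, and the numbers depend entirely on where the file lives; on a Lustre client they would also depend on whether the mount uses flock or localflock, which this sketch does not check.

```python
import fcntl
import os
import tempfile
import time


def time_uncontended_flock(path, iterations=1000):
    """Average seconds per lock/unlock cycle when nobody else holds the lock."""
    with open(path, "w") as f:
        start = time.perf_counter()
        for _ in range(iterations):
            fcntl.flock(f, fcntl.LOCK_EX)
            fcntl.flock(f, fcntl.LOCK_UN)
        return (time.perf_counter() - start) / iterations


def time_contended_flock(path, hold_seconds=0.5):
    """Seconds spent waiting for a lock that a child process is holding."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: take the lock, tell the parent, hold it, then release on close.
        os.close(ready_r)
        with open(path, "w") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            os.write(ready_w, b"x")
            time.sleep(hold_seconds)
        os._exit(0)
    os.close(ready_w)
    os.read(ready_r, 1)  # block until the child really holds the lock
    os.close(ready_r)
    with open(path, "w") as f:
        start = time.perf_counter()
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the child releases
        waited = time.perf_counter() - start
        fcntl.flock(f, fcntl.LOCK_UN)
    os.waitpid(pid, 0)
    return waited


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    print("uncontended, per cycle:", time_uncontended_flock(path))
    print("contended, time waited:", time_contended_flock(path))
    os.remove(path)
```

If the claim holds, the first number should be tiny and the second should be close to the time the other process held the lock, i.e. the waiting the application asked for.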
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Michael Di Domenico <mdidomenico4 at gmail.com>
> Sent: Tuesday, February 20, 2018 6:47:16 AM
> To: Prentice Bisbal
> Cc: lustre-discuss
> Subject: Re: [lustre-discuss] File locking errors.
> On Fri, Feb 16, 2018 at 11:52 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>> On 02/15/2018 06:30 PM, Patrick Farrell wrote:
>>> Localflock will only provide flock between threads on the same node. I
>>> would describe it as “likely to result in data corruption unless used with
>>> extreme care”.
>> I can't agree with this enough. Someone runs a single node job and thinks
>> "file locking works just fine!" and then runs a large multinode job, and then
>> wonders why the output files are all messed up. I think enabling file locking
>> must be an all or nothing thing.
> there are a few instances where this proves false. think mpi job that
> spawns openmp threads, but needs a local scratch file to run
> out-of-core work... also keep in mind locking across a large number
> of clients imparts some performance penalty on the file system.
> on a side note, in a previous email you stated your lustre version,
> just a word to the wise, don't fall behind unless it's a managed
> vendor solution. it's painful to update the servers, but falling
> behind and then trying to update the servers is much worse. i did it
> once and have the scars to prove it... :)