[lustre-discuss] File locking errors.
Michael Di Domenico
mdidomenico4 at gmail.com
Tue Feb 20 04:47:16 PST 2018
On Fri, Feb 16, 2018 at 11:52 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
> On 02/15/2018 06:30 PM, Patrick Farrell wrote:
>> Localflock will only provide flock between threads on the same node. I
>> would describe it as “likely to result in data corruption unless used with
>> extreme care”.
>
> I can't agree with this enough. Someone runs a single node job and thinks
> "file locking works just fine!" and then runs a large multinode job, and then
> wonders why the output files are all messed up. I think enabling file locking
> must be an all or nothing thing.
there are a few instances where this proves false. think of an mpi job
that spawns openmp threads on each node, where each rank needs a
node-local scratch file to run out-of-core work; nothing outside the
node ever touches that file, so localflock is enough. also keep in mind
that coherent locking across a large number of clients imposes some
performance penalty on the file system.
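
to make the scratch-file case concrete, here is a minimal sketch, not anything taken from a real job. it uses python threads in place of openmp threads, and the names bump_counter, run_demo, and counter.txt are made up for illustration. each thread opens the scratch file itself and takes an exclusive flock before a read-modify-write. the assumption is that every locker lives on the same node, which is the only case a localflock mount covers; the same code run from several nodes against a localflock mount would not be protected.

```python
import fcntl
import os
import tempfile
import threading


def bump_counter(path, times):
    """Increment the integer stored in `path`, holding an exclusive flock each time."""
    for _ in range(times):
        # Each thread opens the file itself: flock locks belong to the open
        # file description, so separate opens contend with one another.
        with open(path, "r+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
            try:
                value = int(f.read() or "0")
                f.seek(0)
                f.truncate()
                f.write(str(value + 1))
                f.flush()  # push the update out before releasing the lock
            finally:
                fcntl.flock(f, fcntl.LOCK_UN)


def run_demo(n_threads=4, times=50):
    """Run several threads against one scratch file and return the final count."""
    # Hypothetical scratch location; a real job would use its own scratch path.
    scratch_dir = tempfile.mkdtemp(prefix="scratch_")
    path = os.path.join(scratch_dir, "counter.txt")
    with open(path, "w") as f:
        f.write("0")

    threads = [
        threading.Thread(target=bump_counter, args=(path, times))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    with open(path) as f:
        return int(f.read())


if __name__ == "__main__":
    print(run_demo())
```

if the locking is doing its job, the final count equals n_threads * times; without the flock calls, updates can be lost.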
on a side note, in a previous email you stated your lustre version.
just a word to the wise: don't fall behind unless it's a managed
vendor solution. it's painful to update the servers, but falling
behind and then trying to update the servers is much worse. i did it
once and have the scars to prove it... :)