[lustre-discuss] File locking errors.

Michael Di Domenico mdidomenico4 at gmail.com
Tue Feb 20 04:47:16 PST 2018


On Fri, Feb 16, 2018 at 11:52 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
> On 02/15/2018 06:30 PM, Patrick Farrell wrote:
>> Localflock will only provide flock between threads on the same node.  I
>> would describe it as “likely to result in data corruption unless used with
>> extreme care”.
>
> I can't agree with this enough. Someone runs a single node job and thinks
> "file locking works just fine!" and then runs a large multinode job, and then
> wonders why the output files are all messed up. I think enabling file locking
> must be an all-or-nothing thing.

there are a few cases where all-or-nothing doesn't hold.  think of an
mpi job that spawns openmp threads but needs a node-local scratch file
for out-of-core work; there, locking between threads on the same node
is all that's required.  also keep in mind that locking across a large
number of clients imposes some performance penalty on the file system.
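
as a rough illustration of that node-local case, here is a minimal
python sketch (not taken from any real job): several threads in one
process, standing in for the openmp threads, update a shared scratch
file under flock().  the names add_one, run_workers and demo are made
up for this example, and it assumes the scratch file sits somewhere
flock() is honoured at least within the node (local disk, or a lustre
client mounted with localflock).  it says nothing about coherence
across nodes.

```python
import fcntl
import os
import tempfile
import threading


def add_one(path):
    """Read-modify-write a counter file while holding an exclusive flock."""
    # Each call opens the file itself, so every thread gets its own open
    # file description; flock() locks are tied to that description.
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            value = int(f.read() or "0")
            f.seek(0)
            f.write(str(value + 1))
            f.truncate()
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def run_workers(path, n_workers=8, n_iters=50):
    """Start n_workers threads that each bump the counter n_iters times."""
    def work():
        for _ in range(n_iters):
            add_one(path)

    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(path) as f:
        return int(f.read())


def demo():
    """Run the workers against a temporary scratch file and clean up."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"0")
    os.close(fd)
    try:
        return run_workers(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

if the lock is honoured, no increment is lost and the final count equals
n_workers * n_iters.  run the same pattern from several nodes against a
localflock mount and nothing coordinates the writers, which is the
corruption risk patrick describes above.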

on a side note, in a previous email you stated your lustre version.
just a word to the wise: don't fall behind unless it's a managed
vendor solution.  it's painful to update the servers, but falling
behind and then trying to update the servers is much worse.  i did it
once and have the scars to prove it... :)

