[lustre-discuss] flock vs localflock

Tue Jul 10 01:41:47 PDT 2018

Hi Darby,

On Thu, Jul 05, 2018 at 09:26:36PM +0000, Vicker, Darby (JSC-EG311) wrote:
>Also, the ldlm processes led us to look at flock vs localflock.  On previous generations of our LFS's, we used localflock.  But on the current LFS, we decided to try flock instead.  This LFS has been in production for a couple years with no obvious problems due to flock but we decided to drop back to localflock as a precaution for now.  We need to do a more controlled test but this does seem to help.  What are other sites using for locking parameters?

we use flock for /home and the large scratch filesystem. have done for
probably 10 years. localflock for the read-only software installs in
/apps, and no locking for the OS image (overlayfs with ramdisk upper,
read-only Lustre lower).

we are all ZFS and 2.10.4 too.

I don't think we have much in the way of flock user codes, so I can't
actually recall any issues along those lines.
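
for anyone unsure what counts as a "flock user code": it is anything
that takes an advisory lock with flock(2). a minimal Python sketch is
below. the function name append_with_lock is made up for illustration
and is not taken from any real job. as I understand the client mount
options, with 'flock' the lock is coherent across all clients, with
'localflock' it only excludes other processes on the same node, and
with neither the call returns an error.

```python
import fcntl


def append_with_lock(path, text):
    """Append text to path while holding an exclusive advisory lock.

    Hypothetical helper for illustration only.
    """
    with open(path, "a") as fh:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            fh.write(text)
            fh.flush()  # push the data out before releasing the lock
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

a code like this only gets cross-node exclusion on a 'flock' mount.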

the most common MDS-abusing load we see is jobs across multiple nodes
appending to the same (by definition rubbish) output file. the write
lock bounces between nodes and causes high MDS load, poor performance
for those client nodes, and a bit slower for everyone. I look for these
simply with 'lsof' and correlate across nodes.
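
a rough sketch of that lsof correlation in Python, not the actual
tooling used here. it assumes you have already collected the output of
something like 'lsof -F pfan' from each node (how you gather it is up
to you), and the function names, node names and sample paths are all
hypothetical. note that lsof only reports the access mode (r, w or u),
not whether the file was opened for append, so this flags any file
open for writing on more than one node.

```python
from collections import defaultdict


def writable_paths(lsof_text):
    """Return the paths open for write in `lsof -F` field output.

    Each file record starts with an 'f' line; 'a' carries the access
    mode (r, w, or u for read/write) and 'n' carries the file name.
    """
    records = []
    for line in lsof_text.splitlines():
        if not line:
            continue
        key, value = line[0], line[1:]
        if key == "f":
            records.append({"a": "", "n": None})
        elif key in ("a", "n") and records:
            records[-1][key] = value
    return {r["n"] for r in records if r["n"] and r["a"] in ("w", "u")}


def shared_writers(per_node):
    """Map path -> sorted node list for paths writable on 2+ nodes."""
    nodes_by_path = defaultdict(set)
    for node, text in per_node.items():
        for path in writable_paths(text):
            nodes_by_path[path].add(node)
    return {p: sorted(n) for p, n in nodes_by_path.items() if len(n) > 1}


# Made-up sample data standing in for real per-node lsof output.
sample = {
    "node01": "p1234\nf1\naw\nn/scratch/job/out.log\nf3\nar\nn/apps/lib.so\n",
    "node02": "p777\nf1\nau\nn/scratch/job/out.log\n",
}

if __name__ == "__main__":
    for path, nodes in shared_writers(sample).items():
        print(path, nodes)
```

anything this reports is only a candidate. you still have to check
whether the job really is appending from several nodes at once.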

HTH

cheers,
robin