[lustre-discuss] File locking errors.

Patrick Farrell paf at cray.com
Thu Feb 15 15:30:51 PST 2018



Localflock will only provide flock between threads on the same node.  I would describe it as “likely to result in data corruption unless used with extreme care”.

Are you sure HDF only ever uses flocks between threads on the same node?  That seems extremely unlikely or maybe impossible for HDF.  You should definitely use flock, which gets flocks working across nodes, and is supported with all vaguely recent versions of Lustre.
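
For anyone who wants to check a mount quickly, here is a minimal Python sketch that tries to take a flock on a scratch file in a given directory. The function name flock_supported is made up for illustration. Note that it only shows whether flock() is implemented on that mount at all. A localflock mount would also pass, even though its locks are not visible to other nodes.

```python
import errno
import fcntl
import os
import tempfile


def flock_supported(directory):
    """Try flock() on a scratch file in `directory`.

    Returns (True, None) if the lock call succeeds, otherwise
    (False, <errno name>), e.g. (False, 'ENOSYS').
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            # Non-blocking exclusive lock; nobody else knows this file.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            return False, errno.errorcode.get(exc.errno, str(exc.errno))
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True, None
    finally:
        os.close(fd)
        os.unlink(path)
```

Call it with a directory on the Lustre mount, for example flock_supported("/path/on/lustre"), where the path is a placeholder for your own mount point.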

________________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Arman Khalatyan <arm2arm at gmail.com>
Sent: Thursday, February 15, 2018 5:19:14 PM
To: E.S. Rosenberg
Cc: Alexander I Kulyavtsev; Lustre discussion
Subject: Re: [lustre-discuss] File locking errors.

We had similar trouble with HDF5 1.10 versus HDF5 1.8.x on Lustre.
The new HDF5 requires flock support from the underlying filesystem (for security reasons or whatever; you can dig up more information in the HDF forums).
To fix the mounts, you should unmount and mount again with the option localflock. This works for us, independent of the Lustre version.
This is what we did:

https://arm2armcos.blogspot.de/2018/02/hdf5-v110-or-above-on-lustre-fs.html?m=1




On 15.02.2018 at 11:18 PM, "E.S. Rosenberg" <esr+lustre at mail.hebrew.edu> wrote:


On Fri, Feb 16, 2018 at 12:00 AM, Colin Faber <cfaber at gmail.com> wrote:
If the mount on the users clients had the various options enabled, and those aren't present in fstab, you'd end up with such behavior. Also 2.8? Can you upgrade to 2.10 LTS??
Depending on when they installed their system, that may not be such a 'small' change. Our 2.8 is running on CentOS 6.8, so an upgrade to 2.10 requires us to also upgrade the OS from 6.x to 7.x. I very much want to do that, but it is a more intensive process that I have not had the time for so far, and I can imagine others have the same issue.
Regards,
Eli



On Feb 15, 2018 1:06 PM, "Prentice Bisbal" <pbisbal at pppl.gov> wrote:

No. Several others have asked me the same thing, so that seems like it might be the issue. The only problem with that solution is that the user claimed his program worked just fine up until a couple of weeks ago, so if that is the issue, I'll still be scratching my head trying to figure out how or what changed.


Prentice

On 02/15/2018 12:31 PM, Alexander I Kulyavtsev wrote:
Do you have the flock option in fstab for the Lustre mount, or in the command you use to mount Lustre on the client?

Search for flock on the Lustre wiki
http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
or in the Lustre manual
http://doc.lustre.org/lustre_manual.pdf
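
To see which option a client is actually mounted with, something like the following Python sketch could help. It assumes the mount options show up in the fourth field of /proc/mounts, as they do for other filesystems. The function name lustre_flock_modes is hypothetical.

```python
def lustre_flock_modes(mounts_text):
    """Map each Lustre mount point to 'flock', 'localflock', or 'none'.

    `mounts_text` is expected to be in /proc/mounts format:
    device, mount point, fs type, comma-separated options, ...
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "lustre":
            continue
        options = fields[3].split(",")
        if "flock" in options:
            modes[fields[1]] = "flock"
        elif "localflock" in options:
            modes[fields[1]] = "localflock"
        else:
            modes[fields[1]] = "none"
    return modes
```

On a client you would feed it the contents of /proc/mounts, for example lustre_flock_modes(open("/proc/mounts").read()). A mount reported as 'none' is one where an flock() call is expected to fail.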

Here are links where to start learning about lustre:
* http://lustre.org/getting-started-with-lustre/
* http://wiki.lustre.org
* https://wiki.hpdd.intel.com
* http://jira.hpdd.intel.com
* http://opensfs.org/lustre/

Alex.

On Feb 15, 2018, at 11:02 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:

Hi.

I'm an experienced HPC system admin, but I know almost nothing about Lustre administration. The system admin who administered our small Lustre filesystem recently retired, and no one has filled that gap yet. A user recently reported they are now getting file-locking errors from a program they've run repeatedly on Lustre in the past. When they run the same program on an NFS filesystem, the error goes away. I've cut-and-pasted the error messages below.

Since I have no real experience as a Lustre admin, I turned to Google, and it looks like it might be that the file-locking daemon died (if Lustre has a separate file-lock daemon), or somehow file-locking was recently disabled. If that is possible, how do I check this, and restart or re-enable it if necessary?  I skimmed the user manual, and could not find anything on either of these issues.

Any and all help will be greatly appreciated.

Some of the error messages:

HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 9:
  #000: H5F.c line 579 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 38, error message = 'Function not implemented'
    major: File accessibilty
    minor: Bad file ID accessed
Error: couldn't open file HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 13:
  #000: H5F.c line 579 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 38, error message = 'Function not implemented'
    major: File accessibilty
    minor: Bad file ID accessed
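
As a reading aid for the trace above, here is a small Python sketch that pulls the errno out of the H5FD_sec2_lock() line and looks up its symbolic name. The helper name errno_from_log is made up for illustration. On Linux, errno 38 is ENOSYS, which matches the 'Function not implemented' text in the trace.

```python
import errno
import os
import re

# Line #003 from the trace above, copied verbatim.
LOG_LINE = (
    "#003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, "
    "errno = 38, error message = 'Function not implemented'"
)


def errno_from_log(line):
    """Return the integer after 'errno = ' in `line`, or None if absent."""
    match = re.search(r"errno = (\d+)", line)
    return int(match.group(1)) if match else None


code = errno_from_log(LOG_LINE)
# errno numbering is platform-specific; these lookups use the local table.
name = errno.errorcode.get(code, "unknown")
print(code, name, os.strerror(code))
```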

--
Prentice

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


