[lustre-discuss] File locking errors.

Arman Khalatyan arm2arm at gmail.com
Thu Feb 15 15:38:39 PST 2018


OK, you are right, localflock might be a problem for parallel access, but at
least our code started to work after that. Just for information, the relevant
thread from the HDF forum is the following:
https://lists.hdfgroup.org/pipermail/hdf-forum_lists.hdfgroup.org/2016-May/009483.html

On 16.02.2018 12:30 AM, "Patrick Farrell" <paf at cray.com> wrote:

>
>
> Localflock will only provide flock between threads on the same node.  I
> would describe it as “likely to result in data corruption unless used with
> extreme care”.
>
> Are you sure HDF only ever uses flocks between threads on the same node?
> That seems extremely unlikely or maybe impossible for HDF.  You should
> definitely use flock, which gets flocks working across nodes, and is
> supported with all vaguely recent versions of Lustre.
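>
> (A quick way to see the difference, using flock(1) from util-linux on a
> test file on the Lustre mount -- the path below is just an example:
>
>   # node A: create a test file and hold an exclusive lock for a minute
>   touch /mnt/lustre/locktest
>   flock /mnt/lustre/locktest -c 'sleep 60'
>
>   # node B, while node A still holds the lock: try a non-blocking lock
>   flock -n /mnt/lustre/locktest -c 'echo got it' || echo 'lock is busy'
>
> With -o flock, node B correctly sees the lock as busy; with -o localflock
> it "gets" the lock anyway, because each node only sees its own locks.)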
>
> ________________________________________
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf
> of Arman Khalatyan <arm2arm at gmail.com>
> Sent: Thursday, February 15, 2018 5:19:14 PM
> To: E.S. Rosenberg
> Cc: Alexander I Kulyavtsev; Lustre discussion
> Subject: Re: [lustre-discuss] File locking errors.
>
> We had similar troubles with HDF5 1.10 vs. HDF5 1.8.x on Lustre.
> The new HDF5 requires flock support from the underlying filesystem (for
> safety reasons or whatever; you can dig up more info in the HDF forums).
> To fix the mounts you should unmount and mount again with the option
> localflock. This works for us, independent of the Lustre version.
> This is what we did:
>
> https://arm2armcos.blogspot.de/2018/02/hdf5-v110-or-above-on-lustre-fs.html?m=1
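>
> In short, the unmount/remount step looks something like this (mount point,
> MGS NID and fsname here are placeholders, not our real ones):
>
>   umount /lustre
>   mount -t lustre -o localflock mgs@o2ib:/fsname /lustre
>
> with a matching /etc/fstab entry so it survives a reboot:
>
>   mgs@o2ib:/fsname  /lustre  lustre  defaults,localflock,_netdev  0 0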
>
>
>
>
> On 15.02.2018 11:18 PM, "E.S. Rosenberg" <esr+lustre at mail.hebrew.edu> wrote:
>
>
> On Fri, Feb 16, 2018 at 12:00 AM, Colin Faber <cfaber at gmail.com> wrote:
> If the mount on the user's clients had the various options enabled, and
> those aren't present in fstab, you'd end up with such behavior. Also, 2.8?
> Can you upgrade to 2.10 LTS??
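>
> A quick way to check an affected client (the mount point below is just an
> example) is to compare what is mounted right now with what fstab will give
> you after the next remount:
>
>   grep lustre /proc/mounts   # options currently in effect
>   grep lustre /etc/fstab     # options used at the next mount
>
> And since errno 38 in the HDF5 trace is ENOSYS, something as simple as
>
>   flock -n /mnt/lustre/locktest -c true
>
> should reproduce the "Function not implemented" error on a client mounted
> with neither flock nor localflock.
>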
> Depending on when they installed their system, that may not be such a
> 'small' change. Our 2.8 is running on CentOS 6.8, so an upgrade to 2.10
> requires us to also upgrade the OS from 6.x to 7.x, and though I very much
> want to do that, it is a more intensive process that I have not yet had the
> time for; I can imagine others have the same issue.
> Regards,
> Eli
>
>
>
> On Feb 15, 2018 1:06 PM, "Prentice Bisbal" <pbisbal at pppl.gov> wrote:
>
> No. Several others have asked me the same thing, so that seems like it
> might be the issue. The only problem with that solution is that the user
> claimed his program worked just fine up until a couple of weeks ago, so if
> that is the issue, I'll still be scratching my head trying to figure out
> how/what changed.
>
>
> Prentice
>
> On 02/15/2018 12:31 PM, Alexander I Kulyavtsev wrote:
> Do you have the flock option in fstab for the Lustre mount, or in the
> command you use to mount Lustre on the client?
>
> Search for "flock" on the Lustre wiki:
> http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
> or in the Lustre manual:
> http://doc.lustre.org/lustre_manual.pdf
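>
> Enabling it is just a client mount option, either on the command line or in
> fstab, roughly like this (MGS NID, fsname and mount point are placeholders):
>
>   mount -t lustre -o flock mgs@tcp:/fsname /mnt/lustre
>
>   # or the equivalent /etc/fstab form:
>   mgs@tcp:/fsname  /mnt/lustre  lustre  defaults,flock,_netdev  0 0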
>
> Here are links where to start learning about lustre:
> * http://lustre.org/getting-started-with-lustre/
> * http://wiki.lustre.org
> * https://wiki.hpdd.intel.com
> * http://jira.hpdd.intel.com
> * http://opensfs.org/lustre/
>
> Alex.
>
> On Feb 15, 2018, at 11:02 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>
> Hi.
>
> I'm an experienced HPC system admin, but I know almost nothing about Lustre
> administration. The system admin who administered our small Lustre
> filesystem recently retired, and no one has filled that gap yet. A user
> recently reported that they are now getting file-locking errors from a
> program they've run repeatedly on Lustre in the past. When they run the
> same program on an NFS filesystem, the error goes away. I've cut and pasted
> the error messages below.
>
> Since I have no real experience as a Lustre admin, I turned to Google, and
> it looks like it might be that the file-locking daemon died (if Lustre has
> a separate file-locking daemon), or that file locking was somehow recently
> disabled. If either is possible, how do I check this, and restart or
> re-enable it if necessary? I skimmed the user manual and could not find
> anything on either of these issues.
>
> Any and all help will be greatly appreciated.
>
> Some of the error messages:
>
> HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 9:
>   #000: H5F.c line 579 in H5Fopen(): unable to open file
>     major: File accessibilty
>     minor: Unable to open file
>   #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or
> initialize file structure
>     major: File accessibilty
>     minor: Unable to open file
>   #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
>     major: Virtual File Layer
>     minor: Can't update object
>   #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file,
> errno = 38, error message = 'Function not implemented'
>     major: File accessibilty
>     minor: Bad file ID accessed
> Error: couldn't open file HDF5-DIAG: Error detected in HDF5
> (1.10.0-patch1) MPI-process 13:
>   #000: H5F.c line 579 in H5Fopen(): unable to open file
>     major: File accessibilty
>     minor: Unable to open file
>   #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or
> initialize file structure
>     major: File accessibilty
>     minor: Unable to open file
>   #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
>     major: Virtual File Layer
>     minor: Can't update object
>   #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file,
> errno = 38, error message = 'Function not implemented'
>     major: File accessibilty
>     minor: Bad file ID accessed
>
> --
> Prentice
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>

