[lustre-discuss] File locking errors.

Arman Khalatyan arm2arm at gmail.com
Thu Feb 15 15:19:14 PST 2018


We had similar trouble on Lustre with HDF5 1.10 versus HDF5 1.8.x.
The new HDF5 requires flock support from the underlying filesystem (for
security reasons or whatever; you can dig up more on this in the HDF
forums).
To fix the mounts, unmount and mount again with the option
localflock. This works for us, independent of the Lustre version.
This is what we did:

https://arm2armcos.blogspot.de/2018/02/hdf5-v110-or-above-on-lustre-fs.html?m=1
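
To check which clients already have a flock option after the remount, something like the sketch below might help. It assumes the Lustre client lists `flock` or `localflock` in the options column of /proc/mounts and treats anything else as `noflock`. Per the mount.lustre man page, `flock` is coherent across all clients while `localflock` only locks within one client. The function name and the sample mount table are made up for illustration.

```python
# Hypothetical /proc/mounts content; server NIDs and mount points are made up.
SAMPLE_MOUNTS = """\
10.0.0.1@tcp:/scratch /lustre/scratch lustre rw,localflock,lazystatfs 0 0
10.0.0.1@tcp:/home /lustre/home lustre rw,lazystatfs 0 0
tmpfs /tmp tmpfs rw,nosuid 0 0
"""


def lustre_lock_modes(mounts_text):
    """Map each Lustre mount point to 'flock', 'localflock', or 'noflock'."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) < 4 or fields[2] != "lustre":
            continue
        options = fields[3].split(",")
        if "flock" in options:
            modes[fields[1]] = "flock"
        elif "localflock" in options:
            modes[fields[1]] = "localflock"
        else:
            # Assumption: no explicit option means flock is disabled.
            modes[fields[1]] = "noflock"
    return modes
```

On a real client you would pass `open("/proc/mounts").read()` instead of the sample text.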


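The failure can probably be reproduced without HDF5 at all, since the trace shows a plain flock() call returning errno 38 (ENOSYS on Linux, 'Function not implemented'). Below is a minimal sketch that tries flock() on a temporary file in a given directory. The helper name `flock_supported` is my own, and it only tells you whether flock() is refused outright, not whether locks are coherent across nodes.

```python
import errno
import fcntl
import os
import tempfile


def flock_supported(directory):
    """Return True if flock() works on a temp file created in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            # Non-blocking exclusive lock, the same kind of call HDF5 reports failing.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno == errno.ENOSYS:
                # The filesystem refuses flock() entirely.
                return False
            raise
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `flock_supported("/path/on/lustre")` before and after the remount should show whether the mount option made a difference; the path is a placeholder.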


On 15.02.2018 at 11:18 PM, "E.S. Rosenberg" <
esr+lustre at mail.hebrew.edu> wrote:

>
>
> On Fri, Feb 16, 2018 at 12:00 AM, Colin Faber <cfaber at gmail.com> wrote:
>
>> If the mount on the users clients had the various options enabled, and
>> those aren't present in fstab, you'd end up with such behavior. Also 2.8?
>> Can you upgrade to 2.10 LTS??
>>
> Depending on when they installed their system, that may not be such a
> 'small' change. Our 2.8 is running on CentOS 6.8, so an upgrade to 2.10
> requires us to also upgrade the OS from 6.x to 7.x. Though I very much
> want to do that, it is a more intensive process that so far I have not had
> the time for, and I can imagine others have the same issue.
> Regards,
> Eli
>
>>
>>
>>
>> On Feb 15, 2018 1:06 PM, "Prentice Bisbal" <pbisbal at pppl.gov> wrote:
>>
>>> No. Several others have asked me the same thing, so that seems like it
>>> might be the issue. The only problem with that solution is that the user
>>> claimed his program worked just fine up until a couple of weeks ago, so if
>>> that is the issue, I'll still be scratching my head trying to figure out
>>> how/what changed.
>>>
>>>
>>> Prentice
>>>
>>> On 02/15/2018 12:31 PM, Alexander I Kulyavtsev wrote:
>>>
>>> Do you have *flock* option in fstab for lustre mount or in command you
>>> use to mount lustre on client?
>>>
>>> Search for flock on lustre wiki
>>> http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
>>> or lustre manual
>>> http://doc.lustre.org/lustre_manual.pdf
>>>
>>> Here are links where to start learning about lustre:
>>> * http://lustre.org/getting-started-with-lustre/
>>> * http://wiki.lustre.org
>>> * https://wiki.hpdd.intel.com
>>> * jira.hpdd.intel.com
>>> * http://opensfs.org/lustre/
>>>
>>> Alex.
>>>
>>> On Feb 15, 2018, at 11:02 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>>
>>> Hi.
>>>
>>> I'm an experienced HPC system admin, but I know almost nothing about
>>> Lustre administration. The system admin who administered our small Lustre
>>> filesystem recently retired, and no one has filled that gap yet. A user
>>> recently reported they are now getting file-locking errors from a program
>>> they've run repeatedly on Lustre in the past. When they run the same program
>>> on an NFS filesystem, the error goes away. I've cut-and-pasted the error
>>> messages below.
>>>
>>> Since I have no real experience as a Lustre admin, I turned to Google, and
>>> it looks like it might be that the file-locking daemon died (if Lustre has
>>> a separate file-lock daemon), or somehow file-locking was recently
>>> disabled. If that is possible, how do I check this, and restart or
>>> re-enable if necessary?  I skimmed the user manual, and could not find
>>> anything on either of these issues.
>>>
>>> Any and all help will be greatly appreciated.
>>>
>>> Some of the error messages:
>>>
>>> HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 9:
>>>   #000: H5F.c line 579 in H5Fopen(): unable to open file
>>>     major: File accessibilty
>>>     minor: Unable to open file
>>>   #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or
>>> initialize file structure
>>>     major: File accessibilty
>>>     minor: Unable to open file
>>>   #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
>>>     major: Virtual File Layer
>>>     minor: Can't update object
>>>   #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file,
>>> errno = 38, error message = 'Function not implemented'
>>>     major: File accessibilty
>>>     minor: Bad file ID accessed
>>> Error: couldn't open file HDF5-DIAG: Error detected in HDF5
>>> (1.10.0-patch1) MPI-process 13:
>>>   #000: H5F.c line 579 in H5Fopen(): unable to open file
>>>     major: File accessibilty
>>>     minor: Unable to open file
>>>   #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or
>>> initialize file structure
>>>     major: File accessibilty
>>>     minor: Unable to open file
>>>   #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
>>>     major: Virtual File Layer
>>>     minor: Can't update object
>>>   #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file,
>>> errno = 38, error message = 'Function not implemented'
>>>     major: File accessibilty
>>>     minor: Bad file ID accessed
>>>
>>> --
>>> Prentice
>>>
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>