[lustre-discuss] File locking errors.
Prentice Bisbal
pbisbal at pppl.gov
Fri Feb 16 07:57:45 PST 2018
Colin,
This worked before, so I don't think that 2.8 itself is the problem. I
can't just take down my clusters and upgrade a critical piece of my
infrastructure without solid justification that it's necessary to fix
this problem.
Prentice
On 02/15/2018 05:00 PM, Colin Faber wrote:
> If the mount on the user's clients had the various options enabled, and
> those aren't present in fstab, you'd end up with such behavior. Also
> 2.8? Can you upgrade to 2.10 LTS??
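
One way to check for the situation described above (options active on a live mount but never recorded in fstab) is to compare the two tables. The following is only a rough sketch: the function names are made up for illustration, and it assumes Lustre entries carry the filesystem type "lustre" in both /etc/fstab and /proc/mounts and that the locking options of interest are named flock and localflock.

```python
def lustre_options(table_text):
    """Return {mount_point: set(options)} for Lustre entries in fstab-style text."""
    result = {}
    for line in table_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip fstab comment lines
        fields = line.split()
        # fstab and /proc/mounts share the layout: device, mount point, type, options
        if len(fields) >= 4 and fields[2] == "lustre":
            result[fields[1]] = set(fields[3].split(","))
    return result


def missing_from_fstab(fstab_text, mounts_text, watched=("flock", "localflock")):
    """List watched options that are live on a mount but absent from fstab."""
    fstab = lustre_options(fstab_text)
    live = lustre_options(mounts_text)
    report = {}
    for mount_point, options in live.items():
        recorded = fstab.get(mount_point, set())
        lost = sorted(o for o in watched if o in options and o not in recorded)
        if lost:
            report[mount_point] = lost
    return report
```

Feeding it the contents of /etc/fstab and /proc/mounts from a client would list mounts whose locking options would not survive a remount or reboot. An empty result means fstab and the live mounts agree on those options.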
>
>
>
> On Feb 15, 2018 1:06 PM, "Prentice Bisbal" <pbisbal at pppl.gov> wrote:
>
> No. Several others have asked me the same thing, so that seems
> like it might be the issue. The only problem with that solution is
> that the user claimed his program worked just fine up until a
> couple of weeks ago, so if that is the issue, I'll still be
> scratching my head trying to figure out how/what changed.
>
>
> Prentice
>
> On 02/15/2018 12:31 PM, Alexander I Kulyavtsev wrote:
>> Do you have the *flock* option in fstab for the Lustre mount, or in
>> the command you use to mount Lustre on the client?
>>
>> Search for flock on the Lustre wiki
>> http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
>> or in the Lustre manual
>> http://doc.lustre.org/lustre_manual.pdf
>>
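
As a rough illustration of the check suggested above, the sketch below reports which locking-related option each mounted Lustre filesystem carries. It assumes the client options are named flock, localflock and noflock and that Lustre mounts show up with type "lustre" in /proc/mounts; the function name is made up for this example.

```python
LOCK_OPTIONS = ("flock", "localflock", "noflock")


def lustre_lock_modes(mounts_text):
    """Map each Lustre mount point to its flock-related mount option."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts layout: device, mount point, type, options, dump, pass
        if len(fields) < 4 or fields[2] != "lustre":
            continue
        found = [opt for opt in fields[3].split(",") if opt in LOCK_OPTIONS]
        # "not set" means no locking option is listed, so the client default applies
        modes[fields[1]] = found[-1] if found else "not set"
    return modes
```

On a client, passing it open("/proc/mounts").read() gives one entry per Lustre mount. A mount reported as "not set" or "noflock" would be the first place to look when flock() calls fail.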
>> Here are links where to start learning about lustre:
>> * http://lustre.org/getting-started-with-lustre/
>> * http://wiki.lustre.org
>> * https://wiki.hpdd.intel.com
>> * http://jira.hpdd.intel.com
>> * http://opensfs.org/lustre/
>>
>> Alex.
>>
>>> On Feb 15, 2018, at 11:02 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>>
>>> Hi.
>>>
>>> I'm an experienced HPC system admin, but I know almost nothing
>>> about Lustre administration. The system admin who administered
>>> our small Lustre filesystem recently retired, and no one has
>>> filled that gap yet. A user recently reported they are now
>>> getting file-locking errors from a program they've run
>>> repeatedly on Lustre in the past. When they run the same program
>>> on an NFS filesystem, the error goes away. I've cut-and-pasted
>>> the error messages below.
>>>
>>> Since I have no real experience as a Lustre admin, I turned to
>>> Google, and it looks like it might be that the file-locking
>>> daemon died (if Lustre has a separate file-lock daemon), or
>>> somehow file-locking was recently disabled. If that is possible,
>>> how do I check this, and restart or re-enable if necessary? I
>>> skimmed the user manual, and could not find anything on either
>>> of these issues.
>>>
>>> Any and all help will be greatly appreciated.
>>>
>>> Some of the error messages:
>>>
>>> HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 9:
>>> #000: H5F.c line 579 in H5Fopen(): unable to open file
>>> major: File accessibilty
>>> minor: Unable to open file
>>> #001: H5Fint.c line 1168 in H5F_open(): unable to lock the
>>> file or initialize file structure
>>> major: File accessibilty
>>> minor: Unable to open file
>>> #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
>>> major: Virtual File Layer
>>> minor: Can't update object
>>> #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock
>>> file, errno = 38, error message = 'Function not implemented'
>>> major: File accessibilty
>>> minor: Bad file ID accessed
>>> Error: couldn't open file
>>> HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 13:
>>> #000: H5F.c line 579 in H5Fopen(): unable to open file
>>> major: File accessibilty
>>> minor: Unable to open file
>>> #001: H5Fint.c line 1168 in H5F_open(): unable to lock the
>>> file or initialize file structure
>>> major: File accessibilty
>>> minor: Unable to open file
>>> #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
>>> major: Virtual File Layer
>>> minor: Can't update object
>>> #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock
>>> file, errno = 38, error message = 'Function not implemented'
>>> major: File accessibilty
>>> minor: Bad file ID accessed
>>>
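
The trace above ends in a failed flock() call with errno 38 ("Function not implemented"). A small probe like the one below can reproduce that outside of HDF5. This is only a sketch: the function name is hypothetical, and it assumes Python's fcntl.flock maps to the flock() system call on the client, which is the usual case on Linux.

```python
import fcntl
import tempfile


def probe_flock(directory):
    """Try flock() on a scratch file in directory; return (supported, errno)."""
    # The scratch file is created in the target directory so the lock request
    # goes to the filesystem under test; it is deleted when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as scratch:
        try:
            fcntl.flock(scratch.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # errno.ENOSYS here would match the "Function not implemented" error
            return False, exc.errno
        fcntl.flock(scratch.fileno(), fcntl.LOCK_UN)
        return True, None
```

Running probe_flock() once against a directory on the Lustre mount and once against the NFS mount should show whether the difference is in the filesystem rather than in the user's program.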
>>> --
>>> Prentice
>>>
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> <mailto:lustre-discuss at lists.lustre.org>
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>> <http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org>
>>
>
>