[lustre-discuss] File locking errors.

Prentice Bisbal pbisbal at pppl.gov
Fri Feb 16 07:57:45 PST 2018


Colin,

This worked before, so I don't think that 2.8 itself is the problem. I 
can't just take down my clusters and upgrade a critical piece of my 
infrastructure without solid justification that it's necessary to fix 
this problem.

Prentice

On 02/15/2018 05:00 PM, Colin Faber wrote:
> If the mount on the user's clients had the various options enabled, and
> those aren't present in fstab, you'd end up with such behavior. Also
> 2.8? Can you upgrade to 2.10 LTS?
>
>
>
> On Feb 15, 2018 1:06 PM, "Prentice Bisbal" <pbisbal at pppl.gov> wrote:
>
>     No. Several others have asked me the same thing, so that seems
>     like it might be the issue. The only problem with that solution is
>     that the user claimed his program worked just fine up until a
>     couple of weeks ago, so if that is the issue, I'll still be
>     scratching my head trying to figure out what changed, and how.
>
>
>     Prentice
>
>     On 02/15/2018 12:31 PM, Alexander I Kulyavtsev wrote:
>>     Do you have the *flock* option in fstab for the Lustre mount, or in
>>     the command you use to mount Lustre on the client?
>>
>>     Search for flock on the Lustre wiki
>>     http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
>>     or in the Lustre manual
>>     http://doc.lustre.org/lustre_manual.pdf
>>
>>     Here are some links for getting started with Lustre:
>>     * http://lustre.org/getting-started-with-lustre/
>>     * http://wiki.lustre.org
>>     * https://wiki.hpdd.intel.com
>>     * http://jira.hpdd.intel.com
>>     * http://opensfs.org/lustre/
>>
>>     Alex.
>>
>>>     On Feb 15, 2018, at 11:02 AM, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>>
>>>     Hi.
>>>
>>>     I'm an experienced HPC system admin, but I know almost nothing
>>>     about Lustre administration. The system admin who administered
>>>     our small Lustre filesystem recently retired, and no one has
>>>     filled that gap yet. A user recently reported they are now
>>>     getting file-locking errors from a program they've run
>>>     repeatedly on Lustre in the past. When they run the same program
>>>     on an NFS filesystem, the error goes away. I've cut-and-pasted
>>>     the error messages below.
>>>
>>>     Since I have no real experience as a Lustre admin, I turned to
>>>     Google, and it looks like it might be that the file-locking
>>>     daemon died (if Lustre has a separate file-lock daemon), or
>>>     somehow file-locking was recently disabled. If that is possible,
>>>     how do I check this, and restart or re-enable if necessary?  I
>>>     skimmed the user manual, and could not find anything on either
>>>     of these issues.
>>>
>>>     Any and all help will be greatly appreciated.
>>>
>>>     Some of the error messages:
>>>
>>>     HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 9:
>>>       #000: H5F.c line 579 in H5Fopen(): unable to open file
>>>         major: File accessibilty
>>>         minor: Unable to open file
>>>       #001: H5Fint.c line 1168 in H5F_open(): unable to lock the
>>>     file or initialize file structure
>>>         major: File accessibilty
>>>         minor: Unable to open file
>>>       #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
>>>         major: Virtual File Layer
>>>         minor: Can't update object
>>>       #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock
>>>     file, errno = 38, error message = 'Function not implemented'
>>>         major: File accessibilty
>>>         minor: Bad file ID accessed
>>>     Error: couldn't open file HDF5-DIAG: Error detected in HDF5
>>>     (1.10.0-patch1) MPI-process 13:
>>>       #000: H5F.c line 579 in H5Fopen(): unable to open file
>>>         major: File accessibilty
>>>         minor: Unable to open file
>>>       #001: H5Fint.c line 1168 in H5F_open(): unable to lock the
>>>     file or initialize file structure
>>>         major: File accessibilty
>>>         minor: Unable to open file
>>>       #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
>>>         major: Virtual File Layer
>>>         minor: Can't update object
>>>       #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock
>>>     file, errno = 38, error message = 'Function not implemented'
>>>         major: File accessibilty
>>>         minor: Bad file ID accessed
>>>
>>>     -- 
>>>     Prentice
>>>
>>>     _______________________________________________
>>>     lustre-discuss mailing list
>>>     lustre-discuss at lists.lustre.org
>>>     <mailto:lustre-discuss at lists.lustre.org>
>>>     http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>     <http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org>
>>
>
>
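
One way to test the theory Colin raises in the thread above (flock enabled on
the running mount but missing from fstab, or the reverse) is to compare the two
tables directly. What follows is only a minimal Python sketch, not an official
Lustre tool. It assumes the client's live mount options show up in /proc/mounts
in the usual fstab-style columns, and that a Lustre mount carrying neither
flock nor localflock behaves as noflock, which is what the Lustre manual
describes as the default for releases of this era. The function names are made
up for illustration.

```python
def mount_options(table_text, fstype="lustre"):
    """Map mount point -> set of options for entries of the given type.

    Works on text in fstab / /proc/mounts layout:
    device  mount_point  fstype  options  dump  pass
    """
    result = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and fstab comments
        fields = line.split()
        if len(fields) < 4 or fields[2] != fstype:
            continue  # not a mount of the requested filesystem type
        result[fields[1]] = set(fields[3].split(","))
    return result


def flock_mode(options):
    """Classify a set of mount options (assumption: no option means noflock)."""
    if "flock" in options:
        return "flock"
    if "localflock" in options:
        return "localflock"
    return "noflock"


def compare_flock(fstab_text, mounts_text):
    """For each live Lustre mount, return (mode in fstab or None, live mode)."""
    configured = mount_options(fstab_text)
    live = mount_options(mounts_text)
    report = {}
    for mount_point, options in live.items():
        in_fstab = configured.get(mount_point)
        fstab_mode = flock_mode(in_fstab) if in_fstab is not None else None
        report[mount_point] = (fstab_mode, flock_mode(options))
    return report


def read_text(path):
    """Read a file, returning an empty string if it cannot be opened."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    report = compare_flock(read_text("/etc/fstab"), read_text("/proc/mounts"))
    for mount_point, (configured, live) in sorted(report.items()):
        print(f"{mount_point}: fstab={configured} live={live}")
```

A mount point where the two columns disagree would be consistent with "it
worked until the node was rebooted or the filesystem was remounted"; if both
say noflock, the change is more likely on the application side.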

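The HDF5 trace in the original report fails inside flock() with errno = 38
('Function not implemented'), so the failure can probably be reproduced without
the user's program at all. Here is a small Python sketch that issues the same
kind of call on a file of your choosing. It is an illustration only:
probe_flock and describe are hypothetical helper names, and the path in the
usage comment is a placeholder, not a real location on anyone's system.

```python
import errno
import fcntl
import os


def probe_flock(path):
    """Try a non-blocking shared flock() on path.

    Returns 0 if the lock was taken (and released), otherwise the errno
    reported by the kernel.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except OSError as exc:
            return exc.errno
        fcntl.flock(fd, fcntl.LOCK_UN)
        return 0
    finally:
        os.close(fd)


def describe(code):
    """Turn a probe_flock() result into a short human-readable message."""
    if code == 0:
        return "flock() works on this file"
    if code == errno.ENOSYS:
        return "flock() failed with ENOSYS (Function not implemented)"
    if code in (errno.EWOULDBLOCK, errno.EAGAIN):
        return "flock() is implemented, but another process holds a conflicting lock"
    return "flock() failed: " + os.strerror(code)


# Usage (placeholder path, substitute an existing file on the Lustre mount):
#   print(describe(probe_flock("/path/on/lustre/existing_file")))
```

Running it once against a file on the Lustre mount and once against a file on
NFS or local disk should show whether the difference the user sees comes from
the filesystem rather than from HDF5 itself.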