[lustre-discuss] File locking errors.

Prentice Bisbal pbisbal at pppl.gov
Fri Feb 16 08:13:19 PST 2018


Boom. I think you just nailed it. The user confirmed that he uses this
code almost daily. We use environment modules here, so I checked whether
this error was occurring because he loaded the hdf5/1.10p1 module
instead of hdf5/1.8 this time. Checking the executable with ldd in a
clean environment (no modules loaded) shows that the program is
dynamically linked, but RPATH is set to use the HDF5 1.10 libraries.
Looking at the mtime of the file, it looks like the executable was
rebuilt on 1/23.
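
For anyone chasing something similar, the checks were roughly the
following (the binary path here is just an example, not the user's
actual program):

    module purge                     # start from a clean environment
    ldd /path/to/app | grep -i hdf5  # which HDF5 libraries actually resolve
    readelf -d /path/to/app | grep -iE 'rpath|runpath'   # RPATH set at build time
    stat -c '%y' /path/to/app        # mtime, i.e. when it was last rebuilt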

I haven't confirmed it with the user yet, but most likely the version
before 1/23 was built with HDF5 1.8 and then rebuilt on 1/23 with
HDF5 1.10p1, which caused the change in behavior.

Thanks so much for suggesting this could be the problem!

Prentice

On 02/15/2018 06:19 PM, Arman Khalatyan wrote:
> We had similar troubles with HDF5 1.10 vs. 1.8.x on Lustre: the new
> HDF5 requires flock support from the underlying filesystem (for
> security reasons or whatever; you can dig up more details in the HDF
> forums). To fix the mounts, unmount and mount again with the option
> localflock. This works for us, independent of the Lustre version.
> That's what we did:
>
> https://arm2armcos.blogspot.de/2018/02/hdf5-v110-or-above-on-lustre-fs.html?m=1
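>
> For reference, the remount is roughly the following (the MGS node,
> fsname, and mount point are placeholders; check your own fstab for
> the real values):
>
>     umount /mnt/lustre
>     mount -t lustre -o localflock mgs@tcp:/fsname /mnt/lustre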
>
> On Feb 15, 2018, 11:18 PM, "E.S. Rosenberg"
> <esr+lustre at mail.hebrew.edu> wrote:
>
>     On Fri, Feb 16, 2018 at 12:00 AM, Colin Faber
>     <cfaber at gmail.com> wrote:
>
>         If the mount on the user's clients had the various options
>         enabled, and those aren't present in fstab, you'd end up with
>         such behavior. Also, 2.8? Can you upgrade to 2.10 LTS?
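>
>         To see which options a client actually mounted with, checking
>         /proc/mounts is more reliable than fstab; something like the
>         following (whether flock shows up in the options depends on how
>         the client was mounted):
>
>             grep lustre /proc/mounts   # look for flock / localflock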
>
>     Depending on when they installed their system, that may not be such
>     a 'small' change. Our 2.8 is running on CentOS 6.8, so an upgrade
>     to 2.10 requires us to also upgrade the OS from 6.x to 7.x, and
>     though I very much want to do that, it is a more intensive process
>     that so far I have not had the time for. I can imagine others have
>     the same issue.
>     Regards,
>     Eli
>
>         On Feb 15, 2018 1:06 PM, "Prentice Bisbal"
>         <pbisbal at pppl.gov> wrote:
>
>             No. Several others have asked me the same thing, so that
>             does seem like it might be the issue. The only problem with
>             that solution is that the user claims his program worked
>             just fine up until a couple of weeks ago, so if that is
>             the issue, I'll still be scratching my head trying to
>             figure out how/what changed.
>
>
>             Prentice
>
>             On 02/15/2018 12:31 PM, Alexander I Kulyavtsev wrote:
>>             Do you have the *flock* option in fstab for the Lustre
>>             mount, or in the command you use to mount Lustre on the
>>             clients?
>>
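>>             For example, a client fstab entry with flock enabled could
>>             look like the following (the MGS node, fsname, and mount
>>             point are made up):
>>
>>                 mgs@tcp:/lustre  /mnt/lustre  lustre  flock,_netdev  0 0
>>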
>>             Search for flock on the Lustre wiki
>>             http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
>>             or in the Lustre manual
>>             http://doc.lustre.org/lustre_manual.pdf
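>>
>>             To quickly test whether flock currently works on a client,
>>             something like the following should do (the test file path
>>             is just an example):
>>
>>                 flock /mnt/lustre/flock-test -c true && echo "flock works"
>>
>>             If the mount lacks flock support, that fails with the same
>>             'Function not implemented' (errno 38) seen in the HDF5
>>             trace below.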
>>
>>             Here are links where to start learning about Lustre:
>>             * http://lustre.org/getting-started-with-lustre/
>>             * http://wiki.lustre.org
>>             * https://wiki.hpdd.intel.com
>>             * http://jira.hpdd.intel.com
>>             * http://opensfs.org/lustre/
>>
>>             Alex.
>>
>>>             On Feb 15, 2018, at 11:02 AM, Prentice Bisbal
>>>             <pbisbal at pppl.gov> wrote:
>>>
>>>             Hi.
>>>
>>>             I'm an experienced HPC system admin, but I know almost
>>>             nothing about Lustre administration. The system admin
>>>             who administered our small Lustre filesystem recently
>>>             retired, and no one has filled that gap yet. A user
>>>             recently reported that they are now getting file-locking
>>>             errors from a program they've run repeatedly on Lustre
>>>             in the past. When they run the same program on an NFS
>>>             filesystem, the error goes away. I've cut-and-pasted the
>>>             error messages below.
>>>
>>>             Since I have no real experience as a Lustre admin, I
>>>             turned to Google, and it looks like it might be that the
>>>             file-locking daemon died (if Lustre has a separate
>>>             file-locking daemon), or that file-locking was somehow
>>>             recently disabled. If that is possible, how do I check
>>>             this, and restart or re-enable it if necessary? I skimmed
>>>             the user manual and could not find anything on either of
>>>             these issues.
>>>
>>>             Any and all help will be greatly appreciated.
>>>
>>>             Some of the error messages:
>>>
>>>             HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1)
>>>             MPI-process 9:
>>>               #000: H5F.c line 579 in H5Fopen(): unable to open file
>>>                 major: File accessibilty
>>>                 minor: Unable to open file
>>>               #001: H5Fint.c line 1168 in H5F_open(): unable to lock
>>>             the file or initialize file structure
>>>                 major: File accessibilty
>>>                 minor: Unable to open file
>>>               #002: H5FD.c line 1821 in H5FD_lock(): driver lock
>>>             request failed
>>>                 major: Virtual File Layer
>>>                 minor: Can't update object
>>>               #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable
>>>             to flock file, errno = 38, error message = 'Function not
>>>             implemented'
>>>                 major: File accessibilty
>>>                 minor: Bad file ID accessed
>>>             Error: couldn't open file HDF5-DIAG: Error detected in
>>>             HDF5 (1.10.0-patch1) MPI-process 13:
>>>               #000: H5F.c line 579 in H5Fopen(): unable to open file
>>>                 major: File accessibilty
>>>                 minor: Unable to open file
>>>               #001: H5Fint.c line 1168 in H5F_open(): unable to lock
>>>             the file or initialize file structure
>>>                 major: File accessibilty
>>>                 minor: Unable to open file
>>>               #002: H5FD.c line 1821 in H5FD_lock(): driver lock
>>>             request failed
>>>                 major: Virtual File Layer
>>>                 minor: Can't update object
>>>               #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable
>>>             to flock file, errno = 38, error message = 'Function not
>>>             implemented'
>>>                 major: File accessibilty
>>>                 minor: Bad file ID accessed
>>>
>>>             -- 
>>>             Prentice
