[Lustre-discuss] problem reading HDF files on 1.8.5 filesystem

Christopher Walker cwalker at fas.harvard.edu
Wed May 4 20:57:45 PDT 2011


Hi Larry,

Everything reported below was observed with the filesystem mounted with localflock.

This does indeed look a lot like the bug referred to by David Dillow (thanks!).

Chris

On 5/4/11 10:05 PM, Larry wrote:
> Try mounting the Lustre filesystem with -o flock or -o localflock.
>
> On Thu, May 5, 2011 at 4:47 AM, Christopher Walker
> <cwalker at fas.harvard.edu>  wrote:
>> Hello,
>>
>> We have a user who is trying to post-process HDF files in R.  Her script
>> goes through a number (~2500) of files in a directory, opening and
>> reading the contents.  This usually goes fine, but occasionally the
>> script dies with:
>>
>>
>> HDF5-DIAG: Error detected in HDF5 (1.9.4) thread 46944713368080:
>>    #000: H5F.c line 1560 in H5Fopen(): unable to open file
>>      major: File accessability
>>      minor: Unable to open file
>>    #001: H5F.c line 1337 in H5F_open(): unable to read superblock
>>      major: File accessability
>>      minor: Read failed
>>    #002: H5Fsuper.c line 542 in H5F_super_read(): truncated file
>>      major: File accessability
>>      minor: File has been truncated
>> Error in hdf5load(file = myfile, load = FALSE, verbosity = 0, tidy = TRUE) :
>>    unable to open HDF file: /n/scratch2/moorcroft_lab/nlevine/Moore_sites_final/met/LT_spinup/ms67/analy/s67-E-1628-04-00-000000-g01.h5
>> HDF5-DIAG: Error detected in HDF5 (1.9.4) thread 46944713368080:
>>    #000: H5F.c line 2012 in H5Fclose(): decrementing file ID failed
>>      major: Object atom
>>      minor: Unable to close file
>>    #001: H5I.c line 1340 in H5I_dec_ref(): can't locate ID
>>      major: Object atom
>>      minor: Unable to find atom information (already closed?)
>> Error in hdf5cleanup(16778754L) : unable to close HDF file
>>
>>
>> But this file definitely does exist -- any stat or ls command shows it
>> without a problem.  Further, once I 'ls' this file, if I rerun the same
>> script, it successfully reads this file, but then dies on the next one
>> with the same error.  If I 'ls' the entire directory, the script runs to
>> completion without a problem.  strace output shows:
>>
>> open("/n/scratch2/moorcroft_lab/nlevine/Moore_sites_final/met/LT_spinup/ms67/analy/s67-E-1628-04-00-000000-g01.h5", O_RDONLY) = 3
>> fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> lseek(3, 0, SEEK_SET)                   = 0
>> read(3, "\211HDF\r\n\32\n", 8)          = 8
>> read(3, "\0", 1)                        = 1
>> read(3, "\0\0\0\0\10\10\0\4\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377@"..., 87) = 87
>> close(3)                                = 0
>> write(2, "HDF5-DIAG: Error detected in HDF"..., 42) = 42
>> etc
>>
>> which initially looks fine to me, followed by an abrupt close.
>>
>> NFS filesystems and our 1.6.7.2 filesystem have no such problems -- any
>> suggestions?
>>
>> Thanks very much,
>> Chris
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
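
In the strace output quoted above, fstat() reports st_size=0 for the file, yet the reads that follow return 8, 1 and 87 bytes. That mismatch may be what leads HDF5 to report a truncated file. The following is a minimal Python sketch, not a fix, for spotting files in that state: it compares the size reported by fstat() with what a read actually returns. The function names and the idea of passing in an explicit list of paths are assumptions made for illustration; paths are passed in explicitly because, per the report, running 'ls' on the directory changes the behaviour.

```python
import os

# The 8-byte HDF5 signature, as seen in the first read() of the strace
# ("\211HDF\r\n\32\n" in octal notation).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def check_size_vs_content(path):
    """Open one file and return (reported_size, bytes_read, has_signature).

    Mirrors the call sequence in the strace: open, fstat, lseek, read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        reported_size = os.fstat(fd).st_size
        os.lseek(fd, 0, os.SEEK_SET)
        head = os.read(fd, len(HDF5_SIGNATURE))
    finally:
        os.close(fd)
    return reported_size, len(head), head == HDF5_SIGNATURE


def find_suspect_files(paths):
    """Return the paths where more bytes could be read than fstat reported."""
    suspects = []
    for path in paths:
        reported_size, bytes_read, _ = check_size_vs_content(path)
        if bytes_read > reported_size:
            suspects.append(path)
    return suspects
```

On a filesystem that reports sizes consistently, find_suspect_files() should return an empty list. A file that shows up in the result has the same symptom as the one in the strace (st_size=0 but readable data).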