[Lustre-discuss] concurrent open() fails sporadically
Michael Sternberg
sternberg at anl.gov
Wed Oct 28 13:38:33 PDT 2009
Greetings,
I'm seeing sporadic open() failures when several processes open the
same file concurrently on a Lustre file system.
The following Fortran program fails sporadically when run under
mpirun, even when all processes run on the same host. Note that the
program contains no MPI calls; mpirun merely starts the processes at
nearly the same time:
----------------------------------------------------
$ cat test.f
      program test
      open(1, file = 'test.dat', status = 'old')
      close(1)
      write(*,*) "OK"
      end
$ gfortran test.f
$ mpirun -np 8 a.out
OK
OK
OK
OK
OK
OK
OK
OK
$ mpirun -np 8 a.out
OK
OK
OK
OK
OK
OK
At line 2 of file test.f
Fortran runtime error: No such file or directory
OK
----------------------------------------------------
The status = 'old' specifier seems to be the trigger. A C version
has not failed so far:
----------------------------------------------------
$ cat test.c
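#include <stdio.h>
#include <errno.h>
#include <unistd.h>
int main(void)
{
    /* Open read-only; on failure, print the errno text. */
    if (fopen("test.dat", "r") == NULL) {
        perror("test.dat");
    } else {
        char hostname[20];
        gethostname(hostname, sizeof hostname);
        hostname[sizeof hostname - 1] = '\0';  /* in case of truncation */
        printf("%s: OK\n", hostname);
    }
    return 0;
}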
----------------------------------------------------
I run 2.6.18-92.1.17.el5_lustre.1.6.7.1smp on RHEL-5.3. The error
shows up with both gfortran-4.1.2 20080704 (Red Hat 4.1.2-44) and
Intel Fortran 10.1 20090817. The data file is about 800K in size.
Nothing from Lustre shows up in syslog on the clients or servers.
The error is quite unexpected for such a basic operation. Where
should I look for parameters to tweak?
I have mounted on the client:
mds01_ib@o2ib:mds02_ib@o2ib:/sandbox on /sandbox type lustre (rw)
on the MDS:
/dev/dm-2 on /mnt/mdt-sandbox type lustre (rw)
and OSS:
/dev/dm-2 on /mnt/ost0-sandbox type lustre (rw)
The MGS/MDS sit on the same disk, /dev/dm-1 (which also serves /home).
With best regards,
Michael