[lustre-discuss] question on usage of O_LOV_DELAY_CREATE
Andreas Dilger
adilger at whamcloud.com
Wed Jul 31 19:33:29 PDT 2024
Thomas,
your analysis is as good as any I could offer. There should be at least one ioctl() call after the open() to create the objects before the pwrite64() call. You would need to discuss this with Cray, use a different MPI, or potentially "pre-create" the file before MPI_File_open() so that O_LOV_DELAY_CREATE has no effect.
Cheers, Andreas
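A minimal sketch of what the "pre-create" workaround could look like, assuming a plain POSIX open() is enough to make the file exist before MPI_File_open() runs. The helper name precreate_file() is hypothetical and not part of MPI or Lustre, and this has not been verified against Cray MPI:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* precreate_file() is a hypothetical helper, not an MPI or Lustre API.
 * It creates `path` with an ordinary open(2) that carries none of the
 * O_LOV_DELAY_CREATE bits, then closes it again.
 * Returns 0 on success, or -1 with errno set by open() or close(). */
static int precreate_file(const char *path, mode_t mode)
{
    int fd = open(path, O_WRONLY | O_CREAT, mode);
    if (fd < 0)
        return -1;
    return close(fd);
}
```

In the reproducer quoted below, this would presumably be called on one rank only (for example rank 0) before MPI_File_open(), followed by MPI_Barrier(MPI_COMM_WORLD) so that no other rank opens the file before it exists.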
On Jul 30, 2024, at 08:40, Bertschinger, Thomas Andrew Hjorth via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
Hello,
We have an application that fails doing the following on one of our systems:
...
openat(AT_FDCWD, "mpi_test.out", O_WRONLY|O_CREAT|O_NOCTTY|FASYNC, 0611) = 4
pwrite64(4, "\3\0\0\0", 4, 0) = -1 EBADF (Bad file descriptor)
...
It opens a file with O_LOV_DELAY_CREATE (or O_NOCTTY|FASYNC as strace interprets it), and then immediately tries to write to it.
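As a rough illustration of how the strace output maps to the flag, the sketch below tests an open() flags value for the two bits strace printed. It assumes, as stated above, that O_LOV_DELAY_CREATE appears as O_NOCTTY|FASYNC; the macro and function names are hypothetical, and the real definition lives in the Lustre headers, which are not included here:

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical local name for the two bits strace printed.
 * O_ASYNC is the standard spelling of FASYNC on Linux. */
#define DELAY_CREATE_BITS (O_NOCTTY | O_ASYNC)

/* Returns true only when both bits are set in `flags`. */
static bool has_delay_create_bits(int flags)
{
    return (flags & DELAY_CREATE_BITS) == DELAY_CREATE_BITS;
}
```

Applied to the flags in the openat() line above, the check is true; for a plain O_WRONLY|O_CREAT open it is false.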
From the comments above ll_file_open() in Lustre:
If opened with O_LOV_DELAY_CREATE, then we don't do the object creation or open until ll_lov_setstripe() ioctl is called.
It sounds like the expectation is that the process calling open() like this follows it up with an ioctl to set the stripe information prior to writing.
Is this correct? In other words, is it reasonable to say that the failing code is doing something erroneous?
Here's a minimal MPI program that reproduces the problem. The issue only arises when using the Cray MPI implementation, however. When tested with openmpi and ANL mpich, the openat() call doesn't use O_LOV_DELAY_CREATE. Since the Cray implementation is unfortunately not open source, I have no insight into what this code is "supposed" to be doing. :(
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int err = MPI_Init(&argc, &argv);

    MPI_File fh;
    err = MPI_File_open(MPI_COMM_WORLD, "mpi_test.out",
                        MPI_MODE_WRONLY|MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
    printf("MPI_File_open returned: %d\n", err);

    long data = 3;
    err = MPI_File_write(fh, &data, 1, MPI_LONG, MPI_STATUS_IGNORE);
    printf("MPI_File_write returned: %d\n", err);

    err = MPI_File_close(&fh);
    printf("MPI_File_close returned: %d\n", err);

    MPI_Finalize();
    return 0;
}
Thanks,
Thomas Bertschinger
--
Andreas Dilger
Lustre Principal Architect
Whamcloud