[Lustre-discuss] concurrent open() fails sporadically

Brian J. Murrell Brian.Murrell at Sun.COM
Wed Oct 28 13:47:47 PDT 2009


On Wed, 2009-10-28 at 15:38 -0500, Michael Sternberg wrote:
> 
> I'm seeing open() failures when attempting concurrent access in a  
> lustre fs.
> 
> The following Fortran program fails sporadically when run under  
> mpirun, even on the same host.

Yet...

> A C version never  
> failed (thus far):

This might be telling.  Or maybe not: the Fortran runtime might just be
exposing a race condition that the C version does not trigger.

> Nothing from lustre shows up in syslog on the clients or servers.

Ahh.  Well, I'd be sceptical that this is a Lustre problem then.

> The error is quite unexpected for such a basic operation.  Where  
> should I look for parameters to tweak?

Nothing needs tweaking to make such a use case work, as you can see
with your C program.  I trust the C program more because it sits much
closer to the system calls than Fortran does.

What would be ideal is an strace of the Fortran program failing, so that
we can see what the system calls actually did.
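
For comparison, a small C test along the following lines could be run
under strace as well.  This is only a sketch and not the program from
the original post: the helper name count_open_failures and its
parameters are made up for illustration.  It forks a few processes that
each open() the same file read-only and prints the errno of any failure,
which is the value to look for in the trace.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): fork nprocs
 * children that each open() the same path read-only at roughly the
 * same time.  Returns the number of children whose open() failed,
 * or -1 if not all children could be started. */
int count_open_failures(const char *path, int nprocs)
{
    int failures = 0;
    int started = 0;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: try the open and report errno on failure. */
            int fd = open(path, O_RDONLY);
            if (fd < 0) {
                fprintf(stderr, "pid %d: open(%s): %s\n",
                        (int)getpid(), path, strerror(errno));
                _exit(1);
            }
            close(fd);
            _exit(0);
        }
        started++;
    }

    /* Parent: reap every started child and count abnormal exits. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failures++;
    }
    return (started == nprocs) ? failures : -1;
}
```

Calling count_open_failures() from main() with a path on the Lustre
mount, and running the binary under "strace -f -o trace.out", should
record each child's open() (or openat()) call and its return value.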

b.
