[Lustre-discuss] Lustre, locking, and fsync

Mon Feb 8 17:42:57 PST 2010

----- "Peter Grandi" <pg_lus at lus.for.sabi.co.UK> wrote:

> >>> On Mon, 8 Feb 2010 15:45:41 -0600, Robert Olson
> <olson at mcs.anl.gov> said:
> 
> olson> [ ... ] job metadata in an XML file
> 
> Sounds like that you are trying to implement a distributed
> Lustre backed queueing system. Good luck.

:-)

This scheme is actually working very well for us; the current errors cropped up when I changed the job scheduling so that many nodes wrote to the metadata file at the same time (I'd engineered around that case in the past).

> 
> 
> That mention of "NFS" here is worrying; if you are accessing a
> Lustre file via an NFS-Lustre proxy server I suspect you should
> be looking into NFS, and anyhow it may not be that reliable.

No - the NFS case is entirely separate. That one is straight NFS on a Linux server, which is giving me different problems (stale locks, etc.). What I was describing here uses a native Lustre mount.

--bob