[Lustre-discuss] Inconsistent data with Lustre 1.6.3

Jeff Darcy jeffd at sicortex.com
Tue Jan 29 13:44:37 PST 2008


We're running Lustre 1.6.3 and Linux 2.6.18 on our 972-node 
(5832-processor) machines, and we're seeing some interesting problems 
when we run executables from a Lustre filesystem.  When we run 
5000-processor jobs, we often see some - maybe only a few, maybe a 
couple of dozen - fail with illegal-instruction and other traps, where 
examining the core file shows that the instructions in question are just 
fine (and the same as on jobs that succeeded).  Has anybody else seen 
similar problems running executables from a Lustre filesystem?

The setup in our lab has only the MGS+MDT and one OST on one node, and two 
OSTs on another, exported to the rest via socklnd over our Ethernet 
emulation.  This originally showed up in some Fortran code, but we have 
also been able to reproduce it with a generated C program that contains 
nothing but 50,000 "x = x + 1" lines.  On the theory that this has 
something to do with I/O being completed prematurely - i.e. while 
buffers are in fact still being filled - we produced a variant of the 
program that walks through the entire program text to make sure the 
pages all get loaded well before they're accessed, and the failures do 
not occur in this mode.  Stranger still, after a few runs (more than 
one) with the page-scanning turned on, runs without the page-scanning 
also start to succeed.  Copy the executable to a new location, though, 
and the failures start all over again.  This seems to support the theory 
that there's a race in the I/O completion code, but doesn't tell us much 
more than that.
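
For anyone who wants to try something similar, here is a minimal sketch of 
the two pieces described above.  It is not the code we ran; the function 
names (emit_increment_program, touch_pages) are made up for illustration, 
and the "volatile" on x is an assumption added so that an optimizing 
compiler does not fold the increments into a single store.  The first 
function writes out a C program consisting of n "x = x + 1" lines; the 
second reads one byte from every page in an address range so that the 
pages are faulted in before they are executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write a C source file to `out` whose main() is n copies of
 * "x = x + 1;".  Returns 0 on success, -1 on a write error.
 * (Hypothetical helper, not the original generator.) */
int emit_increment_program(FILE *out, long n)
{
    long i;

    if (fputs("int main(void)\n{\n    volatile long x = 0;\n", out) == EOF)
        return -1;
    for (i = 0; i < n; i++) {
        if (fputs("    x = x + 1;\n", out) == EOF)
            return -1;
    }
    if (fputs("    return 0;\n}\n", out) == EOF)
        return -1;
    return 0;
}

/* Read one byte from each page that overlaps [start, end), forcing the
 * pages to be faulted in.  Returns the number of pages touched.
 * `start` does not have to be page-aligned. */
size_t touch_pages(const void *start, const void *end, size_t page_size)
{
    uintptr_t addr = (uintptr_t)start;
    uintptr_t stop = (uintptr_t)end;
    size_t touched = 0;

    assert(page_size != 0);
    while (addr < stop) {
        uintptr_t next;

        /* The volatile read keeps the compiler from dropping the access. */
        (void)*(const volatile unsigned char *)addr;
        touched++;

        /* Step to the first byte of the following page. */
        next = addr - (addr % page_size) + page_size;
        if (next <= addr)   /* wrapped past the top of the address space */
            break;
        addr = next;
    }
    return touched;
}
```

In a test program built with a GNU toolchain, the range would probably be 
the linker-provided text bounds (for example &__executable_start and &etext, 
which are toolchain-specific and an assumption here) with the page size 
from sysconf(_SC_PAGESIZE), called at the top of main() before any of the 
increments run.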

There's a significant chance that the problem is architecture-specific 
(our CPU architecture is MIPS with weak memory ordering) and/or in Linux 
rather than Lustre, but the same test has run fine using Lustre 1.6beta 
on Linux 2.6.15 and on other filesystems (e.g. NFS or ext3 over NBD) 
using current versions.  If anybody has any suggestions about places to 
look, parameters to tweak for the sake of experimentation, etc., they 
would be most appreciated.


