[Lustre-discuss] possible file corruption

Faccini, Bruno bruno.faccini at intel.com
Mon Dec 10 03:38:48 PST 2012


Hello Jason,

The only case I know of matching the scenario you describe (i.e., with no eviction message at all) is the one I reported from the CEA site as part of LU-1974. But as you can read there, it came from an incomplete patch integration, which introduced a regression in the Lustre client: dirty pages were not flushed to the servers/OSTs when the client had used up its granted space and needed to renew the grant!

Maybe you can run with "+cache" traces enabled ("echo +cache > /proc/sys/lnet/debug" and "echo +cache > /proc/sys/lnet/printk"), just to identify whether you are hitting the same kind of issue.
Bruno. 
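
If the tracing has to stay enabled across many test runs, it can be wrapped in a small script. The following is a minimal Python sketch, not a Lustre tool. It assumes a Lustre 1.8/2.1-era client that exposes /proc/sys/lnet/debug and /proc/sys/lnet/printk, root privileges, and lctl on the PATH. The function names are hypothetical, and proc_dir is a parameter only so the logic can be tried against a scratch directory.

```python
# Hypothetical helper wrapping `echo +cache > /proc/sys/lnet/debug` and
# `echo +cache > /proc/sys/lnet/printk`.
# Assumes a Lustre 1.8/2.1-era client, root privileges, and lctl on PATH.
import os
import subprocess


def add_debug_flag(flag="cache", proc_dir="/proc/sys/lnet",
                   targets=("debug", "printk")):
    """Write '+<flag>' to each mask file, like `echo +cache > .../debug`.

    On the real proc files a leading '+' adds the flag to the current mask
    instead of replacing it. proc_dir is a parameter only for testing.
    """
    written = []
    for name in targets:
        path = os.path.join(proc_dir, name)
        with open(path, "w") as f:
            f.write(f"+{flag}\n")  # same bytes echo would write
        written.append(path)
    return written


def dump_debug_log(out_path):
    """Save the in-memory Lustre debug buffer to out_path via `lctl dk`."""
    subprocess.run(["lctl", "dk", out_path], check=True)
```

Typical use would be add_debug_flag("cache") before starting the workload, then dump_debug_log("/tmp/lustre-debug.log") (a hypothetical path) as soon as a bad file is noticed, since the in-memory buffer is finite and older entries get overwritten. Adding the flag to printk as well also sends the messages to the console log, which may be noisy.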

On Dec 4, 2012, at 3:52 PM, Jason Temple <jtemple at cscs.ch> wrote:

> Hello,
> 
> I have a troubling issue with random file corruption on both Lustre
> 1.8.6 (Cray's internal Lustre) and Lustre 2.1 (Sonexion, produced by
> Xyratex).
> 
> At random, our users come across files that either have zero size or
> are corrupted. The zero-size files are usually ASCII files (normally
> created serially with simple cat and awk statements), while the
> corrupted files are weather data (GRIB) files that are most often
> truncated during an untar operation. Other times, the files have blocks
> filled with zeroes in the middle.
> 
> The real kicker is that we cannot reproduce the problem reliably in
> order to troubleshoot it. I managed to trigger file truncation after
> 1500 iterations of untarring the same tar file, but since then, after
> 30,000 iterations, I haven't been able to reproduce it.
> 
> When it happens, there are no Lustre-related errors in the logs, and
> nothing is dumped into /tmp.
> 
> Has anyone come across this before? I've searched Google for weeks, but
> have only found a few bugs that seem similar, and those are usually
> related to NetCDF and parallel I/O, while our corruption mostly occurs
> in serial workloads.
> 
> What log settings would you suggest for capturing this phantom while it
> is happening?
> 
> Thanks in advance,
> 
> Jason
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
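
One way to chase an intermittent truncation like the one described is a harness that untars the same archive many times and compares every extracted file with the size and checksum recorded in the tar stream itself. The following is a minimal Python sketch, not something from the thread. The function names, the 4096-byte block size used for the all-zero scan, and the scratch-directory layout are all assumptions. It reports missing, zero-size and wrong-size files, and for checksum mismatches it lists the offsets of all-zero blocks.

```python
# Hypothetical untar-and-verify harness (assumed names and block size).
import hashlib
import os
import shutil
import tarfile

BLOCK = 4096  # assumed block size for the all-zero scan


def reference_digests(archive):
    """Map member name -> (size, sha256 hex), read straight from the tar stream."""
    refs = {}
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                refs[member.name] = (member.size, hashlib.sha256(data).hexdigest())
    return refs


def zero_block_offsets(path, block=BLOCK):
    """Byte offsets of full-size blocks that contain only zero bytes."""
    offsets, offset, zero = [], 0, bytes(block)
    with open(path, "rb") as f:
        while chunk := f.read(block):
            if chunk == zero:
                offsets.append(offset)
            offset += len(chunk)
    return offsets


def file_sha256(path):
    """SHA-256 of a file on disk, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(1 << 20):
            h.update(chunk)
    return h.hexdigest()


def check_extraction(dest, refs):
    """Compare files under dest with refs; return (name, problem) pairs."""
    problems = []
    for name, (size, digest) in refs.items():
        path = os.path.join(dest, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
            continue
        actual = os.path.getsize(path)
        if actual == 0 and size > 0:
            problems.append((name, "zero-size"))
        elif actual != size:
            problems.append((name, f"size {actual}, expected {size}"))
        elif file_sha256(path) != digest:
            zeros = zero_block_offsets(path)
            problems.append((name, f"checksum mismatch; all-zero blocks at {zeros}"))
    return problems


def run(archive, scratch, iterations):
    """Untar archive into scratch/iter-N repeatedly; stop at the first bad extraction."""
    refs = reference_digests(archive)
    # Use the safer "data" extraction filter where this Python provides it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for i in range(iterations):
        dest = os.path.join(scratch, f"iter-{i}")
        os.makedirs(dest)
        with tarfile.open(archive) as tar:
            tar.extractall(dest, **extract_kwargs)
        problems = check_extraction(dest, refs)
        if problems:
            return i, problems  # leave dest in place for inspection
        shutil.rmtree(dest)
    return None, []
```

A run would look like run("grib.tar", "/lustre/scratch/untar-test", 30000), with hypothetical paths pointing at the file system under test. Reading back on the same client may be served from its page cache. A failure that only appears once data has left the client (as in the LU-1974 case) is therefore more likely to be caught by a variant that keeps each extraction and re-checks it from a second client, or after dropping the page cache with "echo 3 > /proc/sys/vm/drop_caches".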
