[Lustre-discuss] WARNING: data corruption issue found in 1.8.x releases

Andreas Dilger adilger at sun.com
Wed Sep 9 23:05:38 PDT 2009


On Sep 09, 2009  15:30 -0400, Charles A. Taylor wrote:
> > You recommend disabling the read and the write as the settings
> > indicate or just the read as the text indicates?
> 
> A clarification would be good here.   So far, we have found that our
> OSSs crash with the recommended work-around so that is a non-starter for
> us.   If we can run with just the read_cache_enable=0 and that is
> acceptable to avoid the corruptions bug, then that would be good to
> know.

The problem affects OSS-side caching of both writes and reads.  That said,
by disabling only the read cache you would significantly reduce the
chance of hitting the problem.  For writes there would still be a small
chance of corrupted data making it to disk if one client's write fails
partway through (due to eviction, network error, etc.) and another
client then starts a partial-page write of the same data some time
after this failure.

This is a pretty unlikely scenario, since most clients aren't evicted
very often, they write to separate files, or they write to disjoint
parts of the same file.  Still, there is some small risk.
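
For anyone who wants to try the read-cache-only variant, here is a
minimal sketch of how it might be applied and checked on an OSS.  The
parameter names are the ones from the original announcement; the LCTL
variable and the two function names are hypothetical conveniences
added here (not part of Lustre) so that the commands can be dry-run
with LCTL=echo before touching a live server.

```shell
#!/bin/sh
# Sketch only: disable just the OSS read cache and display the result.
# LCTL is a hypothetical override hook; it defaults to the real lctl.
LCTL="${LCTL:-lctl}"

disable_read_cache() {
    # Quote the pattern so the shell hands the wildcard to lctl unexpanded.
    "$LCTL" set_param 'obdfilter.*.read_cache_enable=0'
}

show_cache_settings() {
    # Print the current value of both cache tunables for every OST.
    "$LCTL" get_param 'obdfilter.*.read_cache_enable'
    "$LCTL" get_param 'obdfilter.*.writethrough_cache_enable'
}
```

Run disable_read_cache and then show_cache_settings as root on each
OSS.  As the announcement says, the setting does not persist, so this
would have to be repeated each time an OST is restarted.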

> At the moment we are not even sure we can run with just
> read_cache_enable=0.   We just know that we can't run with them both
> disabled for more than a few minutes without crashing in
> obd_filter_preprw().

Can you please post your stack traces into bug 20560 so that we can
resolve this problem ASAP?

Note that the patch to actually fix this problem is already in bug 20560,
but it requires rebuilding Lustre for the OST.

> > -----Original Message-----
> > A patch is under testing and will be included in 1.8.1.1.
> > Until 1.8.1.1 is available, we recommend disabling the OSS read cache
> > feature. This feature can be disabled by running the two following
> > commands on the OSSs:
> > # lctl set_param obdfilter.*.writethrough_cache_enable=0
> > # lctl set_param obdfilter.*.read_cache_enable=0
> > 
> > This has to be done each time an OST is restarted.
> > 
> > Best regards,
> > Johann, for the Lustre team
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
