[Lustre-discuss] errors on mds and osses regarding checksum and decreased stripe counts

Tue Jun 30 14:12:43 PDT 2009

On Jun 29, 2009  18:35 +0300, Ender Güler wrote:
> We have a Lustre 1.6.5.1 installation on RHEL 5.1. The interconnect is
> InfiniBand. I came across errors like the following on the MDS:
> 
> Lustre: 21241:0:(lov_qos.c:427:qos_shrink_lsm()) using fewer stripes for
> object 103514695: old 8 new 6

This can happen if some of your OSTs are not responding to precreate
requests.  It appears you are using wide striping by default, which
is good if you have lots of clients reading/writing the same file
on a regular basis, but is not recommended if each client normally
reads/writes its own file OR the bandwidth of a single OST can handle
the needs of a single client.
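
To see how often this is happening and how many stripes are being lost, the
messages can be pulled out of the MDS log.  This is only a rough sketch: the
pattern is taken from the one message quoted above, and the function name
parse_stripe_shrinks is made up for illustration, not part of any Lustre tool.

```python
import re

# Pattern based only on the single message quoted in this thread (assumption:
# other Lustre versions may word the message differently).
SHRINK_RE = re.compile(
    r"qos_shrink_lsm\(\)\) using fewer stripes for object (\d+): old (\d+) new (\d+)"
)


def parse_stripe_shrinks(log_text):
    """Return a list of (object_id, old_stripes, new_stripes) tuples."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return [
        (int(obj), int(old), int(new))
        for obj, old, new in SHRINK_RE.findall(flat)
    ]


sample = (
    "Lustre: 21241:0:(lov_qos.c:427:qos_shrink_lsm()) using fewer stripes for\n"
    "object 103514695: old 8 new 6\n"
)
for obj, old, new in parse_stripe_shrinks(sample):
    print(f"object {obj}: lost {old - new} of {old} stripes")
```

If the same few objects or a steady stream of messages show up, that points
at particular OSTs being slow to answer precreate requests.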

> And here are the errors regarding checksums, on one of the OSTs:
> LustreError: 12397:0:(ost_handler.c:1225:ost_brw_write()) client csum
> 41d0fa49, original server csum e388fa92, server csum now e388fa92

This looks like you are having network problems, or possibly you are
using mmap IO?  The data arriving at the server is different from
the data that was originally checksummed by the client.  This can happen
in some cases if the client is doing repeated mmap writes to the same
part of the file.
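
The mmap case can be illustrated with a toy model.  This is not the Lustre
client code, just a sketch of the race under simple assumptions: an anonymous
mapping stands in for a page of an mmap'ed file, zlib.crc32 stands in for the
bulk checksum, and simulate_write is a hypothetical helper name.

```python
import mmap
import zlib

PAGE_SIZE = 4096


def simulate_write(modify_after_checksum):
    """Return (client_csum, server_csum) for one simulated page write."""
    # Anonymous mapping standing in for one page of an mmap'ed file.
    page = mmap.mmap(-1, PAGE_SIZE)
    page[0:4] = b"data"

    # Client checksums the page when it prepares the write.
    client_csum = zlib.crc32(page[:])

    if modify_after_checksum:
        # Application stores to the same region again before the page is sent.
        page[0:4] = b"DATA"

    # What actually goes over the wire, and what the server checksums.
    wire_data = page[:]
    page.close()
    server_csum = zlib.crc32(wire_data)
    return client_csum, server_csum


for racy in (False, True):
    client, server = simulate_write(racy)
    status = "match" if client == server else "MISMATCH"
    print(f"modified after checksum={racy}: "
          f"client csum {client:08x}, server csum {server:08x} ({status})")
```

In the racy case the server is checksumming perfectly good data, it is just
newer than what the client checksummed, so the two values disagree.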

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.