[Lustre-discuss] OST went back in time to 0 (bug 9646)

Jakob Goldbach jakob at goldbach.dk
Wed Jul 29 23:29:17 PDT 2009


Hi,

I have a question on bug 9646 - Server went back in time.

I had an OSS crash and had to pull the power. After mounting Lustre
again I see the following on one of my clients:

(import.c:909:ptlrpc_connect_interpret()) b-OST0010_UUID went back in
time (transno 12901362807 was previously committed, server now claims
0)!  See https://bugzilla.lustre.org/show_bug.cgi?id=9646
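
For comparing this message across clients, a minimal Python sketch that pulls
the target name and both transaction numbers out of it could look like the
following. The regular expression is modeled only on the wording quoted above
and is an assumption (other Lustre versions may phrase the message
differently); the names WENT_BACK and parse_went_back are hypothetical.

```python
import re

# Pattern modeled on the message quoted in this mail (assumption: the
# wording is the same on the logs you are scanning).
WENT_BACK = re.compile(
    r"(?P<target>\S+) went back in time \(transno (?P<previous>\d+) "
    r"was previously committed, server now claims (?P<claimed>\d+)\)"
)


def parse_went_back(text):
    """Return a list of (target, previous_transno, claimed_transno) tuples."""
    # Collapse whitespace first so messages wrapped over several lines match.
    flat = " ".join(text.split())
    return [
        (m["target"], int(m["previous"]), int(m["claimed"]))
        for m in WENT_BACK.finditer(flat)
    ]


sample = """(import.c:909:ptlrpc_connect_interpret()) b-OST0010_UUID went back in
time (transno 12901362807 was previously committed, server now claims
0)!"""

events = parse_went_back(sample)
for target, previous, claimed in events:
    # A claimed value of 0 is reported separately from a merely lower one.
    kind = "server claims 0" if claimed == 0 else "server claims lower transno"
    print(f"{target}: previously committed {previous}, now {claimed} ({kind})")
```

Feeding it the dmesg output from several clients would show whether they all
report the same previously committed transno for the OST.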

The bug description suggests that commits were lost in the hardware
cache - but how can it lose all commits (transno is zero)? (By the way,
the cache is battery backed.)

On the client where I saw this, I had previously deactivated the import
because of the crash. Is this the reason I'm seeing the transno as
zero? (Full dmesg below.)


Thanks,
Jakob



2860:0:(import.c:508:import_select_connection())
b-OST0010-osc-ffff81022ce89800: tried all connections, increasing
latency to 27s

setting import backup-OST0010_UUID INACTIVE by administrator request

8281:0:(import.c:508:import_select_connection())
b-OST0010-osc-ffff81022ce89800: tried all connections, increasing
latency to 32s

167-0: This client was evicted by b-OST0010; in progress operations
using this service will fail.

b-OST0010-osc-ffff81022ce89800: Connection restored to service b-OST0010
using nid 172.16.14.36@tcp.

11-0: an error occurred while communicating with 172.16.14.36@tcp. The
ost_statfs operation failed with -11
...
11-0: an error occurred while communicating with 172.16.14.36@tcp. The
obd_ping operation failed with -107

b-OST0010-osc-ffff81022ce89800: Connection to service backup-OST0010 via
nid 172.16.14.36@tcp was lost; in progress operations using this service
will wait for recovery to complete.

2859:0:(import.c:909:ptlrpc_connect_interpret()) b-OST0010_UUID went
back in time (transno 12901362807 was previously committed, server now
claims 0)!  See https://bugzilla.lustre.org/show_bug.cgi?id=9646

167-0: This client was evicted by backup-OST0010; in progress operations
using this service will fail.

b-OST0010-osc-ffff81022ce89800: Connection restored to service b-OST0010
using nid 172.16.14.36@tcp.





