[Lustre-discuss] OST went back in time to 0 (bug 9646)
Jakob Goldbach
jakob at goldbach.dk
Wed Jul 29 23:29:17 PDT 2009
Hi,
I have a question on bug 9646 - Server went back in time.
I had an OSS crash and had to pull the power. After mounting Lustre
again I see the following on one of my clients:
(import.c:909:ptlrpc_connect_interpret()) b-OST0010_UUID went back in
time (transno 12901362807 was previously committed, server now claims
0)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
The bug description suggests that commits were lost in the hardware
cache - but how can it lose all commits (transno is zero)? (By the way,
the cache is battery backed.)
On the client where I saw this, I had previously deactivated the import
because of the crash. Is this the reason I'm seeing this transno as
zero? (Full dmesg below.)
Thanks,
Jakob
2860:0:(import.c:508:import_select_connection())
b-OST0010-osc-ffff81022ce89800: tried all connections, increasing
latency to 27s
setting import backup-OST0010_UUID INACTIVE by administrator request
8281:0:(import.c:508:import_select_connection())
b-OST0010-osc-ffff81022ce89800: tried all connections, increasing
latency to 32s
167-0: This client was evicted by b-OST0010; in progress operations
using this service will fail.
b-OST0010-osc-ffff81022ce89800: Connection restored to service b-OST0010
using nid 172.16.14.36@tcp.
11-0: an error occurred while communicating with 172.16.14.36@tcp. The
ost_statfs operation failed with -11
...
11-0: an error occurred while communicating with 172.16.14.36@tcp. The
obd_ping operation failed with -107
b-OST0010-osc-ffff81022ce89800: Connection to service backup-OST0010 via
nid 172.16.14.36@tcp was lost; in progress operations using this service
will wait for recovery to complete.
2859:0:(import.c:909:ptlrpc_connect_interpret()) b-OST0010_UUID went
back in time (transno 12901362807 was previously committed, server now
claims 0)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
167-0: This client was evicted by backup-OST0010; in progress operations
using this service will fail.
b-OST0010-osc-ffff81022ce89800: Connection restored to service b-OST0010
using nid 172.16.14.36@tcp.
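
To check whether other clients logged the same message, something like the following could be run over saved dmesg output. It is only a rough sketch: the pattern is taken from the message text quoted above, the function name find_went_back is made up for illustration, and other Lustre versions may word the message differently.

```python
import re

# Pattern based on the console message quoted in this mail (an assumption
# about its exact wording). \s+ tolerates the line wrapping that dmesg
# output picks up when it is pasted into a mail.
WENT_BACK = re.compile(
    r"(?P<target>\S+)\s+went\s+back\s+in\s+time\s+"
    r"\(transno\s+(?P<previous>\d+)\s+was\s+previously\s+committed,\s+"
    r"server\s+now\s+claims\s+(?P<claimed>\d+)\)"
)


def find_went_back(log_text):
    """Return (target, previously_committed, now_claimed) for each match."""
    return [
        (m.group("target"), int(m.group("previous")), int(m.group("claimed")))
        for m in WENT_BACK.finditer(log_text)
    ]


# The message from the dmesg above, wrapped exactly as it was pasted.
sample = """2859:0:(import.c:909:ptlrpc_connect_interpret()) b-OST0010_UUID went
back in time (transno 12901362807 was previously committed, server now
claims 0)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646"""

for target, previous, claimed in find_went_back(sample):
    print(target, previous, claimed)
```

A claimed value of 0 alongside a large previous value is the case asked about here; the script only extracts the numbers and does not explain why the server reports 0.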