[Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.
Oleg Drokin
Oleg.Drokin at Sun.COM
Tue Feb 10 06:21:53 PST 2009
Hello!
On Feb 10, 2009, at 5:17 AM, Simon Kelley wrote:
>
> Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: 11-0: an error occurred while communicating with 172.31.96.96 at tcp. The obd_ping operation failed with -107
> Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: Skipped 12 previous similar messages
> Feb 9 14:05:30 sf-2-3-10 kernel: Lustre: OSC_sf2-sfs2.internal.sanger.ac.uk_sf2-sfs-ost495_MNT_client_tcp-ffff81021f897800: Connection to service sf2-sfs-ost495 via nid 172.31.96.96 at tcp was lost; in progress operations using this service will wait for recovery to complete.
> Feb 9 14:05:30 sf-2-3-10 kernel: Lustre: Skipped 4 previous similar messages
> Feb 9 14:05:30 sf-2-3-10 kernel: LustreError: 167-0: This client was evicted by sf2-sfs-ost495; in progress operations using this service will fail.
>
It would be useful here to enable DLM tracing on some of those 1.6 nodes
(echo +dlm_trace >/proc/sys/lnet/debug). If you are running with no debug
enabled at all, also enable the rpc_trace and info levels. In addition,
enable the "dump on eviction" feature
(echo 1 >/proc/sys/lustre/dump_on_eviction).
Then, when the next eviction happens, some useful debug data will be
dumped on the client,
that you can attach to a bugzilla bug along with the server-side eviction
message (processed with the "lctl df" command first).
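
The setup steps above can be collected into one small shell function. This
is only a sketch: the function name enable_eviction_debug and its optional
root-directory argument are made up here (the argument exists so the steps
can be tried against a scratch directory), and it adds all three debug
flags unconditionally, which is a simplification of the advice above. The
flag names are copied from this message as written, so check them against
the debug mask names your release accepts.

```shell
#!/bin/sh
# Hypothetical helper, not part of Lustre.
# Usage: enable_eviction_debug [ROOT]
# ROOT defaults to /proc/sys; pass a scratch directory for a dry run.
enable_eviction_debug() {
    root="${1:-/proc/sys}"

    # Add each flag to the debug mask, one write per flag.
    # On the real proc file a leading "+" adds to the existing mask;
    # on a plain file (dry run) each write simply replaces the content.
    for flag in +dlm_trace +rpc_trace +info; do
        echo "$flag" > "$root/lnet/debug" || return 1
    done

    # Ask the client to dump its debug log when it gets evicted.
    echo 1 > "$root/lustre/dump_on_eviction" || return 1
}
```

On a client this would be run as root with no argument, i.e.
"enable_eviction_debug".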
> We are also seeing some userspace file operations fail with the error
> "No locks available". These don't generate any logging on the client, so
> I don't have exact timing. It's possible that they are associated with
> further "### lock callback timer expired" server logs.
This error code (ENOLCK) typically means that an application is attempting
to do some I/O and Lustre, for some reason, no longer holds a lock for the
I/O area (the lock is normally obtained once the read or write path is
entered). That could be related to the evictions too, since locks are
revoked at eviction time.
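
Since the failing operations leave no log entry on the client, one way to
look for that relationship is to pull the timestamps of the eviction-related
messages out of the client and server syslogs and compare them with the
times at which applications failed. A minimal sketch follows; the function
name list_eviction_events is hypothetical, the two patterns are the message
fragments quoted in this thread, and each syslog entry is assumed to sit on
a single line in the log file.

```shell
#!/bin/sh
# Hypothetical helper, not part of Lustre.
# Usage: list_eviction_events LOGFILE...
# Prints every line that contains the client-side eviction message
# ("LustreError: 167-0") or the server-side "lock callback timer expired"
# message, without file name prefixes, so the syslog timestamps line up.
list_eviction_events() {
    grep -h -E 'LustreError: 167-0|lock callback timer expired' "$@"
}
```

For example, "list_eviction_events /var/log/messages" on a client, where the
log path is an assumption about the local syslog configuration.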
Bye,
Oleg