[Lustre-discuss] Nodes claim error with files, then say everything is fine.

Chris Worley worleys at gmail.com
Wed Aug 6 09:41:21 PDT 2008


On Wed, Aug 6, 2008 at 10:17 AM, Brian J. Murrell <Brian.Murrell at sun.com> wrote:
> But this kind of eviction is simply due to clients that are unresponsive
> from the POV of the MDS.  They are neither making filesystem RPC nor are
> they "ping"ing (keepalives) so the MDS assumes they have died and evicts
> them to get back the locks it could be holding and not having that dead
> client holding up other, living clients.
>
> So you need to investigate why the clients are dying or appear to be
> dead (i.e. going silent) to the MDS.

Is there anything in /proc or /sys I can look at to see what
"keepalive" parameters are set up?

The systems aren't dying.

I need to know how to least obtrusively force the clients to keep
pinging, or tell the MDS to give them a longer time before timeout.

I don't see why this only affects the RHEL5 clients.  Maybe that's a hint.

Thanks,

Chris


