[Lustre-discuss] Nodes claim error with files, then say everything is fine.
Chris Worley
worleys at gmail.com
Wed Aug 6 09:41:21 PDT 2008
On Wed, Aug 6, 2008 at 10:17 AM, Brian J. Murrell <Brian.Murrell at sun.com> wrote:
> But this kind of eviction is simply due to clients that are unresponsive
> from the POV of the MDS. They are neither making filesystem RPCs nor are
> they "ping"ing (keepalives), so the MDS assumes they have died and evicts
> them to reclaim any locks they could be holding and to keep a dead
> client from holding up other, living clients.
>
> So you need to investigate why the clients are dying or appear to be
> dead (i.e. going silent) to the MDS.
Is there anything in /proc or /sys I can look at to see which
"keepalive" parameters are set up?
The systems aren't dying.
I need to know how to least obtrusively force the clients to keep
pinging, or tell the MDS to give them a longer time before timeout.
I don't see why this only affects the RHEL5 clients. Maybe that's a hint.
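
For the /proc question above, here is a minimal sketch of how one might
inspect the timeout on a client. It rests on two assumptions that are not
confirmed anywhere in this thread and should be checked against your Lustre
version: that the global timeout (in seconds) is exposed at
/proc/sys/lustre/timeout, and that clients ping roughly every timeout/4
seconds. The helper names read_timeout and assumed_ping_interval are
hypothetical, made up for this illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the global Lustre timeout (seconds); verify on your nodes.
ASSUMED_TIMEOUT_PATH = "/proc/sys/lustre/timeout"


def read_timeout(path: str = ASSUMED_TIMEOUT_PATH) -> Optional[int]:
    """Return the integer stored in `path`, or None if it is missing or unparsable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def assumed_ping_interval(timeout: int) -> int:
    """Assumption: clients ping every timeout/4 seconds, never less than 1."""
    return max(timeout // 4, 1)


if __name__ == "__main__":
    timeout = read_timeout()
    if timeout is None:
        print(f"{ASSUMED_TIMEOUT_PATH} not readable; is Lustre loaded on this node?")
    else:
        interval = assumed_ping_interval(timeout)
        print(f"timeout={timeout}s, assumed ping interval={interval}s")
```

Running this on an affected RHEL5 client and on an unaffected client and
comparing the two values would at least show whether they are configured
differently.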
Thanks,
Chris