[Lustre-discuss] Nodes claim error with files, then say everything is fine.
Brian J. Murrell
Brian.Murrell at Sun.COM
Wed Aug 6 09:45:23 PDT 2008
On Wed, 2008-08-06 at 10:41 -0600, Chris Worley wrote:
>
> Is there anything in /proc or /sys I can look at to see whatever
> "keepalive" parameters are set up?
All timeouts are based on the obd_timeout in /proc/sys/lustre/timeout
which MUST be the same on all nodes.
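
One way to check that requirement is to collect the value from every node and compare them. The following is only a rough sketch in Python: the helper names (read_timeout, find_mismatches) and the node names in the usage note are hypothetical, and it assumes the file holds a single integer. How you gather the values from the other nodes is up to you.

```python
from collections import Counter
from pathlib import Path


def read_timeout(path="/proc/sys/lustre/timeout"):
    """Return the obd_timeout value (in seconds) read from the given file.

    Assumes the file contains a single integer, as a sysctl-style entry does.
    """
    return int(Path(path).read_text().strip())


def find_mismatches(timeouts):
    """Given {node_name: timeout}, return the nodes whose value differs
    from the most common value across all nodes."""
    if not timeouts:
        return {}
    majority, _ = Counter(timeouts.values()).most_common(1)[0]
    return {node: value for node, value in timeouts.items() if value != majority}
```

For example, find_mismatches({"mds": 100, "oss1": 100, "client7": 300}) returns {"client7": 300}, which would point at client7 as the node to fix.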
> The systems aren't dying.
They are failing to communicate with the MDS for some reason. Network
problems perhaps? You could try enabling +rpctrace debug and inspecting
the debug file for RPCs to see if the client is indeed sending something
(even if it's a ping) at regular intervals.
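
Once you have pulled the send times of the client's RPCs out of the debug file, checking for "regular intervals" is simple arithmetic. A minimal sketch in Python, assuming you already have those times as a list of seconds; how to extract them depends on the debug log format of your Lustre version and is not shown here. The function name find_gaps and the max_interval threshold are hypothetical and for illustration only.

```python
def find_gaps(send_times, max_interval):
    """Return (previous, current) pairs of consecutive send times that are
    more than max_interval seconds apart.

    send_times: iterable of timestamps in seconds (any order).
    max_interval: largest gap, in seconds, that you consider normal.
    """
    ordered = sorted(send_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_interval]
```

An empty result means the client sent something at least every max_interval seconds over the period covered by the log; any pair returned marks a stretch where the client sent nothing.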
b.