[Lustre-discuss] Nodes claim error with files, then say everything is fine.

Brian J. Murrell Brian.Murrell at Sun.COM
Wed Aug 6 09:45:23 PDT 2008


On Wed, 2008-08-06 at 10:41 -0600, Chris Worley wrote:
> 
> Is there anything in /proc or /sys I can look at to see whatever
> "keepalive" parameters are set up?

All timeouts are based on the obd_timeout in /proc/sys/lustre/timeout
which MUST be the same on all nodes.
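
As a rough illustration of checking that requirement, here is a minimal
Python sketch. Only the path /proc/sys/lustre/timeout comes from the
message above; the function names and the idea of collecting one value per
node into a dict (via ssh, pdsh, or whatever you use) are assumptions made
for the example.

```python
from collections import Counter
from pathlib import Path

# Path quoted in the message above; holds obd_timeout in seconds.
TIMEOUT_PATH = Path("/proc/sys/lustre/timeout")


def read_local_timeout(path=TIMEOUT_PATH):
    """Return the local node's obd_timeout as an int (hypothetical helper)."""
    return int(Path(path).read_text().strip())


def find_mismatched_nodes(timeouts):
    """Given {node_name: timeout}, return (majority_value, odd_nodes).

    odd_nodes maps every node whose timeout differs from the most common
    value to the value it reported. How the per-node values are gathered
    is left to the caller.
    """
    if not timeouts:
        return None, {}
    majority, _count = Counter(timeouts.values()).most_common(1)[0]
    odd = {node: t for node, t in timeouts.items() if t != majority}
    return majority, odd
```

For example, find_mismatched_nodes({"mds": 100, "oss1": 100, "client7": 25})
reports 100 as the common value and flags client7, which would be the node
to fix.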

> The systems aren't dying.

They are failing to communicate with the MDS for some reason.  Network
problems perhaps?  You could try enabling +rpctrace debug and inspecting
the debug file for RPCs to see if the client is indeed sending something
(even if it's a ping) at regular intervals.
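
A sketch of that inspection, under stated assumptions: on 1.6-era systems
the flag is usually added with "sysctl -w lnet.debug=+rpctrace" and the
log dumped to a file with "lctl dk <file>", but check both against your
version. The Python below assumes each dumped line is colon-separated with
a seconds.microseconds timestamp in the fourth field, and that outgoing
requests contain the text "Sending RPC". Both are parameters, so adjust
them if your log looks different. The function names are made up for the
example.

```python
def rpc_send_gaps(lines, marker="Sending RPC", ts_field=3):
    """Return the gaps (seconds) between consecutive lines containing marker.

    ts_field is the zero-based index of the timestamp in the colon-separated
    line; the default of 3 is an assumption about the debug log layout.
    Lines that do not parse are skipped rather than treated as errors.
    """
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        fields = line.split(":")
        try:
            stamps.append(float(fields[ts_field]))
        except (IndexError, ValueError):
            continue
    stamps.sort()
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


def longest_silence(lines, marker="Sending RPC", ts_field=3):
    """Return the largest gap between matching RPCs, or None if fewer than two."""
    gaps = rpc_send_gaps(lines, marker=marker, ts_field=ts_field)
    return max(gaps) if gaps else None
```

Feed it the lines of the dumped debug file, e.g.
longest_silence(open("/tmp/lustre-debug.txt")) with whatever file name you
dumped to. A longest silence well beyond the configured timeout would
suggest the client really did stop sending; regular small gaps would point
back at the network or the server side.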

b.
