[Lustre-discuss] MDS refuses connections (no visible reason)

Patricia Santos Marco psantos at bifi.es
Mon Aug 17 11:14:54 PDT 2009


Over the last day our MDS has been refusing connections too. The logs are the
same, and we have to reboot the MDS server. What is the reason for this?

2009/3/5 Thomas Roth <t.roth at gsi.de>

> Hi all,
>
> after running for days without any problems, our MDS is refusing
> cooperation for two hours now.
> The log files show nothing until
> >Mar  5 16:46:24 mds1 kernel: Lustre:
> 17841:0:(ldlm_lib.c:525:target_handle_reconnect()) MDT0000:
> 481fa70b-590d-31b6-f621-c6125a54bfff reconnecting
> >Mar  5 16:46:24 mds1 kernel: Lustre:
> 17841:0:(ldlm_lib.c:760:target_handle_connect()) MDT0000: refuse
> reconnection from 481fa70b-590d-31b6-f621-c6125a54bfff@1.2.3.4@tcp to
> 0xffff8107ef44a000; still busy with 2 active RPCs
>
> I thought that such a thing would be between the MDT and this particular
> client. However, the log goes on like that with many other clients.
>
> Now the MDS is refusing any connection, bringing the system to a
> standstill.
>
> The situation also triggered the dumping of ca. 130 log dumps to /tmp.
> Most of these are small and contain just
> >Watchdog triggered for pid 17866: it was inactive for 12000s
> >Unable to dump stack because of missing export
>
> A few are larger and contain more complaints about lengthy requests and
> possible timeouts:
> >ptlrpc_server_handle_request   Request x75091039 took longer than
> estimated (42+4208s); client may timeout.
> or
> >ptlrpc_server_handle_request   Dropping timed-out request from
> 12345-140.181.114.222@tcp: deadline 1000+923s ago
>
> None of these seem critical?
> Maybe all clients have timed out for some reason?
> Even so, I'd expect the MDS to still be responsive, say to a mount
> request from a fresh client, one that cannot possibly have any
> leftover transactions pending on it?
>
> Right now the only thing I see to do is to reboot the server. Of course
> not a nice procedure on a system we advertised as stable and reliable to
> our users...
>
> So any help will be much appreciated.
> Regards,
> Thomas
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
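
For anyone hitting the same symptom, it may help to check how many distinct
clients are being refused before rebooting. The following is only a rough
sketch, not a Lustre tool: it assumes the "refuse reconnection" messages
appear in syslog as single lines in the format quoted above, and the function
name count_refused is made up for illustration. It also accepts the " at "
spelling that the list archive substitutes for "@".

```python
import re
from collections import Counter

# Pattern based solely on the log line quoted in this thread (an assumption:
# other Lustre versions may word the message differently).
REFUSED = re.compile(
    r"refuse reconnection from (?P<uuid>\S+?)(?:@| at )(?P<nid>\S+) to \S+; "
    r"still busy with (?P<rpcs>\d+) active RPCs"
)


def count_refused(lines):
    """Return a Counter mapping client NID -> number of refused reconnects."""
    refused = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            refused[match.group("nid")] += 1
    return refused
```

It could be fed an open log file, for example count_refused(open(path)) where
path is wherever the MDS kernel messages end up on your system. Many different
NIDs in the result would suggest the problem is on the MDS side rather than
with one misbehaving client.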



-- 
(\__/)
( O.o)
( > <) This is Bunny.
Copy Bunny into your signature to help him with his plans for world domination.