[Lustre-discuss] Connection losses to MGS/MDS
Wojciech Turek
wjt27 at cam.ac.uk
Thu Dec 18 11:01:55 PST 2008
Hi,
It doesn't look healthy. I assume those messages and numbers come from
the client side; what do you see on the MDS server itself?
It seems to me that your network connection to the MDS is flaky, hence
the many disconnection messages. It may not noticeably hurt your
bandwidth, but it should certainly kill your metadata
performance. I suggest running some tests and seeing for yourself. From your
email I see that you are using Ethernet to connect the MDS to the rest
of the cluster. It may be worth checking the cable and the interface
for errors and dropped packets.
I have a 600-node cluster here that is 100% utilized with jobs most of the
time; Lustre serves the /home and /scratch file systems, and I don't see
these messages in the logs. I use Lustre 1.6.6 on RHEL4.
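To compare counts like those in the quoted message below (about 500 MGS and 800 MDT0000 messages in 12 hours) between clusters, the messages can be tallied per service from syslog files. This is a minimal sketch: the regular expression is built from the message text quoted in this thread and may need adjusting for other Lustre versions or log formats, and the function names are hypothetical.

```python
# Minimal sketch: tally Lustre "Connection to service ... was lost" messages
# per service name from syslog-style files. The regex is derived from the
# message text quoted in this thread; other Lustre versions may word it
# differently. Function names are illustrative.
import re
import sys
from collections import Counter

# Captures the service name (e.g. MGS, MDT0000); the NID is matched but not kept.
LOST_RE = re.compile(r"Connection to service (\S+) via nid \S+ was lost")


def count_lost_connections(lines):
    """Return a Counter mapping service name -> number of lost-connection lines."""
    counts = Counter()
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def main(paths):
    total = Counter()
    for path in paths:
        # errors="replace" keeps one undecodable byte from aborting the scan.
        with open(path, errors="replace") as f:
            total.update(count_lost_connections(f))
    for service, n in total.most_common():
        print(f"{service}: {n}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1:])
    else:
        print(f"usage: {sys.argv[0]} LOGFILE [LOGFILE ...]")
```

Dividing each count by the length of the period the logs cover gives a rate that is easier to compare between clusters of different sizes.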
cheers
Wojciech
Thomas Roth wrote:
> Hi all,
>
> in a cluster with 375 clients, over a 12-hour period I get about 500
> messages of the type
>
> > Connection to service MGS via nid A.B.C.D@tcp was lost; in progress
> > operations using this service will fail.
>
> and about 800 messages of the type
>
> > Connection to service MDT0000 via nid A.B.C.D@tcp was lost; in
> > progress operations using this service will wait for recovery to complete.
>
> Those clients are batch farm nodes; they continuously run all kinds of
> user jobs that read and write data on Lustre.
>
> I have no way of telling how bad this situation is, since I only know
> the error logs of our own cluster. I have seen these messages right from the
> start of testing this cluster, but did not try to count them, since
> performance at the time was splendid.
>
> So what is your experience? Should there be no errors of this kind at
> all? Are they to be expected on a busy network? Should there be only a
> few connection losses, due to problems on specific machines? Or is this
> just normal?
>
> Thanks,
> Thomas
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>