[Lustre-discuss] Connection losses to MGS/MDS

Wojciech Turek wjt27 at cam.ac.uk
Thu Dec 18 11:01:55 PST 2008


Hi,

It doesn't look healthy. I assume those messages and counts are from 
the client side; what do you see on the MDS server itself?
It seems to me that your network connection to the MDS is flaky, hence 
the many disconnection messages. It may not noticeably hurt your 
bandwidth performance, but it should certainly kill your metadata 
performance. I suggest running some tests to see for yourself. From your 
email I see that you are using Ethernet to connect the MDS to the rest 
of the cluster. It may be worth checking the cable and the interface 
for errors and dropped packets.
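
One way to check the interface is to read the kernel's per-interface 
counters twice, a few minutes apart, and see which ones moved. The 
following is only a minimal sketch: it assumes a Linux host that exposes 
counters under /sys/class/net/<iface>/statistics, and the function names 
and the choice of counters are mine, not anything Lustre provides.

```python
from pathlib import Path

# Counters that usually point at cabling, NIC, or switch-port trouble.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Return the current error/drop counters for one interface.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on a real host the default is the usual location.
    """
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text().strip())
            for name in COUNTERS}


def changed_counters(before, after):
    """Return only the counters that changed between two samples."""
    return {name: after[name] - before[name]
            for name in before
            if after[name] != before[name]}
```

Call read_counters("eth0") (the interface name is an assumption), wait 
while the cluster is busy, call it again, and pass both results to 
changed_counters. Counters that keep growing on the MDS interface would 
support the flaky-link theory; an empty result would point elsewhere.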

I have a 600-node cluster here, 100% utilized with jobs most of the 
time. Lustre serves the /home and /scratch file systems, and I don't see 
these messages in the logs. I use Lustre 1.6.6 on RHEL4.
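
To compare logs between clusters, it helps to tally the messages per 
service and NID rather than count them by eye. This is a rough sketch, 
not a Lustre tool: the pattern is built from the message wording quoted 
below, it assumes real logs write the NID in the usual address@network 
form, and where the log lines come from is left to the caller.

```python
import re
from collections import Counter

# Matches e.g. "Connection to service MGS via nid 10.0.0.1@tcp was lost".
# The NID is assumed to contain no spaces (address@network form).
LOST_RE = re.compile(r"Connection to service (\S+) via nid (\S+) was lost")


def count_lost_connections(lines):
    """Count 'connection lost' messages per (service, nid) pair."""
    counts = Counter()
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts
```

Feeding it the lines of a client's syslog (for example an open file 
object) gives one count per service and NID. Many clients losing the 
same MDS NID would suggest the server side of the link; a handful of 
noisy clients would suggest individual machines.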

cheers

Wojciech

Thomas Roth wrote:
> Hi all,
>
> in a cluster with 375 clients, over a 12-hour period I get about 500 
> messages of the type
>
> > Connection to service MGS via nid A.B.C.D@tcp was lost; in progress 
> > operations using this service will fail.
>
> and about 800 messages of the type
>
> > Connection to service MDT0000 via nid A.B.C.D@tcp was lost; in 
> > progress operations using this service will wait for recovery to 
> > complete.
>
> Those clients are batch farm nodes; they continuously run all kinds of 
> user jobs that read and write data on Lustre.
>
> I have no way of telling how bad this situation is, since I know only 
> the error logs of our cluster. I have seen these messages right from the 
> start of testing this cluster, but did not try to count them, since the 
> performance then was splendid.
>
> So what is your experience? Should there be no errors of this kind at 
> all, is it something to be expected on a busy network, should there be a 
> few connection losses due to specific machine problems, or is this just 
> normal?
>
> Thanks,
> Thomas