[Lustre-discuss] Lustre clients getting evicted

Brock Palen brockp@umich.edu
Mon Feb 4 10:48:45 PST 2008


On Feb 4, 2008, at 1:43 PM, Kilian CAVALOTTI wrote:

> On Monday 04 February 2008 10:17:37 am Brock Palen wrote:
>> The cluster IS too big, but there isn't a person at the university
>> who is willing to pay for anything other than more cluster nodes.
>> Enough with politics.
>
> That's the first time I've heard that a cluster is too big; people
> usually complain about the opposite. :)
> But the second part sounds very, very familiar... Anyway.
>
>> I just had another node get evicted while running code causing the
>> code to lock up.  This time it was the MDS that evicted it.  Pinging
>> work though:
>>
>> [root@nyx350 ~]# lctl ping 141.212.30.184@tcp
>> 12345-0@lo
>> 12345-141.212.30.184@tcp
>
> Ok.
>
>> I have attached the output of lctl dk  from the client and some
>> syslog messages from the MDS.
>
> (recover.c:188:ptlrpc_request_handle_notconn()) import
> nobackup-MDT0000-mdc-000001012bd27c00 of
> nobackup-MDT0000_UUID@141.212.30.184@tcp abruptly disconnected:
> reconnecting
> (import.c:133:ptlrpc_set_import_discon())
> nobackup-MDT0000-mdc-000001012bd27c00: Connection to service
> nobackup-MDT0000 via nid 141.212.30.184@tcp was lost;
>
> I will let Lustre people comment on this, but this sure looks like a
> network problem to me.
>
> Is there any information you can get out of the switches (logs,
> dropped packets, retries, stats, anything)?

The client shows 107 dropped packets; the servers have none.  I think
you're right: the same client that had problems a week earlier losing
its connection to the OSS is now losing its connection to the MDT.
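
For reference, here is roughly how I am reading those counters (eth0
here is just a stand-in for whichever interface faces the Lustre
network, and the ethtool counter names vary by NIC driver):

[root@nyx350 ~]# ifconfig eth0 | grep -E 'errors|dropped'
[root@nyx350 ~]# ethtool -S eth0 | grep -iE 'drop|err'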

I have asked networking to look at the counters between the Force10
and the Cisco.

Lustre doesn't care about frames at 6000 MTU, right?
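
If it is an MTU issue, I can test the path directly with ping's
don't-fragment flag (Linux iputils syntax; 5972 = 6000 minus the 28
bytes of IP and ICMP headers), something like:

[root@nyx350 ~]# ping -M do -s 5972 -c 3 141.212.30.184

If anything between the Force10 and the Cisco is configured for a
smaller MTU, pings that size should fail rather than go through.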

>
>> Nope, both servers have 2 GB of RAM, and load is almost 0.  No
>> swapping.
>
> Do you see dropped packets or errors in your ifconfig output, on the
> servers and/or clients?
>
> Cheers,
> -- 
> Kilian