[Lustre-discuss] Lustre clients getting evicted
Brock Palen
brockp at umich.edu
Mon Feb 4 10:48:45 PST 2008
On Feb 4, 2008, at 1:43 PM, Kilian CAVALOTTI wrote:
> On Monday 04 February 2008 10:17:37 am Brock Palen wrote:
>> The
>> cluster IS to big, but there isn't a person at the university who is
>> willing to pay for anything other than more cluster nodes. Enough
>> with politics.
>
> That's the first time I hear a cluster is too big, people usually
> complain about the contrary. :)
> But the second part sounds very very familiar, though... Anyway.
>
>> I just had another node get evicted while running code causing the
>> code to lock up. This time it was the MDS that evicted it. Pinging
>> work though:
>>
>> [root at nyx350 ~]# lctl ping 141.212.30.184 at tcp
>> 12345-0 at lo
>> 12345-141.212.30.184 at tcp
>
> Ok.
>
>> I have attached the output of lctl dk from the client and some
>> syslog messages from the MDS.
>
> (recover.c:188:ptlrpc_request_handle_notconn()) import
> nobackup-MDT0000-mdc-000001012bd27c00 of
> nobackup-MDT0000_UUID at 141.212.30.184@tcp abruptly disconnected:
> reconnecting
> (import.c:133:ptlrpc_set_import_discon())
> nobackup-MDT0000-mdc-000001012bd27c00: Connection to service
> nobackup-MDT0000 via nid 141.212.30.184 at tcp was lost;
>
> I will let Lustre people comment on this, but this sure looks like a
> network problem to me.
>
> Is there any information you can get out of the switches (logs,
> dropped
> packets, retries, stats, anything)?
The client shows 107 dropped packets; the servers have none. I
think you're right: this is the same client that was losing
connections to the OSS a week ago, and now it's losing its
connection to the MDT.
I have asked networking to look at the counters between the force10
and the cisco.
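While waiting on networking, a quick way to pull the per-interface drop counters on each client and server is to read /proc/net/dev directly. This is only a sketch; the interface name (eth0 here) is an assumption and may differ on these nodes:

```shell
# drop_counters: print RX/TX drop counters for an interface from a
# /proc/net/dev-style file (defaults to the live /proc/net/dev).
drop_counters() {
    ifc="$1"
    file="${2:-/proc/net/dev}"
    # /proc/net/dev fields after "iface:":
    #   RX: bytes packets errs drop fifo frame compressed multicast
    #   TX: bytes packets errs drop fifo colls carrier compressed
    awk -v ifc="$ifc" -F'[: ]+' '
        $2 == ifc { printf "%s: rx_drop=%s tx_drop=%s\n", ifc, $6, $14 }
    ' "$file"
}

# Example (on a node): drop_counters eth0
```

Running this periodically on the problem client would show whether the drop counter is still climbing while jobs run.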
Lustre doesn't care about frames at a 6000-byte MTU, right?
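As far as I know the TCP LND just rides on ordinary kernel sockets, so a 6000-byte MTU should be fine as long as it is consistent end-to-end; an MTU mismatch between switches is a classic cause of exactly this kind of silent drop. One hedged way to check the path is a don't-fragment ping sized to the assumed MTU (the MDS address is taken from the log above; the 6000 figure is the value to verify, not a known-good one):

```shell
# Sketch: compute the largest unfragmented ICMP payload for a 6000-byte MTU
# and print the ping command to run from a client. The payload is the MTU
# minus the 20-byte IP header and 8-byte ICMP header.
MTU=6000
PAYLOAD=$((MTU - 28))
echo "ping -c 3 -M do -s $PAYLOAD 141.212.30.184"
```

If that ping fails while a smaller payload (say 1472, for a 1500 MTU) succeeds, some hop between the Force10 and the Cisco is not passing jumbo frames.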
>
>> Nope both servers have 2GB ram, and load is almost 0. No swapping.
>
> Do you see dropped packets or errors in your ifconfig output, on the
> servers and/or clients?
>
> Cheers,
> --
> Kilian
>
>