[Lustre-discuss] Lustre 1.6.4.1 - client lockup

Niklas Edmundsson Niklas.Edmundsson at hpc2n.umu.se
Sun Jan 27 22:28:08 PST 2008


On Fri, 25 Jan 2008, Kilian CAVALOTTI wrote:

> Hi Niklas,
>
> On Friday 25 January 2008 07:10:47 am Niklas Edmundsson wrote:
>> We're able to consistently kill the Lustre client with bonnie in
>> combination with striping.
>
> Out of curiosity, I tried to reproduce your experiment, and didn't
> encounter any problem. All the bonnie processes ran fine.

Interesting...

> There are a lot of significant differences between our test
> environments, but I thought it might be useful to know the results of
> your test case on a different system.
>
>> This is Lustre 1.6.4.1, Debian 2.6.18
>> amd64 kernel with lustre patches on both server and clients
>
> I used Lustre 1.6.4.1, RHEL4 and 2.6.9-55.0.9.EL_lustre.1.6.4.1smp amd64
> x86_64 kernel.
> 
>> All machines are dual opterons connected with GigE.
>
> They are Intel quad-cores (E5345) connected with IB.

The environments aren't identical, but this still suggests that there's
something funky with the 2.6.18 support...

>> We have 5 servers, 1 MDS with 1 MGS and 1 MDT target and 4 OSSs with
>> 2 OST targets (~1.2TB) each.
>
> We have 9 servers, 1 MDS with MGS and MDT, and 8 OSSs with 2 OSTs each.
>
>> Jan 25 11:16:23 BUG: soft lockup detected on CPU#1!
>
>> After 10-15 minutes it locks up, this time with a bunch of
>> LustreErrors before the stack trace:
>
> They look like a network interruption problem, but it's hard to tell if
> that's the cause or the consequence. Could it be that your Ethernet
> switches dropped some packets?

Given that it's TCP, packet drops shouldn't affect things in that way,
IMHO.

My guess is that something is writing outside its buffer and corrupting
some random part of the kernel; that's usually when we see these kinds
of problems... Usually pure hell to debug :/

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke at hpc2n.umu.se
---------------------------------------------------------------------------
  "Run out of small children to butcher?" -- G'Kar
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=