[Lustre-discuss] mds server crashing

Mag Gam magawake at gmail.com
Sun Mar 15 05:32:05 PDT 2009


This happened again :-(

Basically, there is a process called "ll_mdt30" which is taking up
100% of the CPU. I am not sure what it's doing, but I can't even
reboot the system cleanly; I have to do a hard reboot.

Also, I checked my other OSTs and the MDS, and I don't have anything
special for NFS in /etc/modules.conf.
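
Regarding the earlier request for traces without line wrapping: below is
a minimal sketch, not a Lustre tool, of how the mail-wrapped syslog lines
could be re-joined. It assumes every syslog record starts with a
"Mon DD HH:MM:SS" timestamp; the function name unwrap_syslog is made up
for this example.

```python
import re

# Assumption: each syslog record begins with a "Mon DD HH:MM:SS " timestamp.
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def unwrap_syslog(text):
    """Join wrapped continuation lines back onto their syslog record."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are treated as wrapping debris
        if RECORD_START.match(line) or not records:
            records.append(line)  # a new record starts here
        else:
            records[-1] += " " + line  # continuation of the previous record
    return records


if __name__ == "__main__":
    # Two wrapped lines taken from the trace quoted in this thread.
    wrapped = (
        "Mar 12 10:11:02 protected_host_01 kernel: RSP: 0018:ffff8103d2a5f460\n"
        "EFLAGS: 00000216\n"
    )
    for record in unwrap_syslog(wrapped):
        print(record)
```

Quote markers such as ">>>" would need to be stripped before feeding the
text in; the sketch does not handle them.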



On Sat, Mar 14, 2009 at 8:35 AM, Mag Gam <magawake at gmail.com> wrote:
> Hey Bernd:
>
> Thanks for the reply.
>
> Interesting. We are using it with NFS too. Is there something in
> particular we need to do, like "enable port 988 in /etc/modules.conf"?
> I think I am already doing that.
>
>
>> Any chance you can send traces with line wrap disabled? With line wrapping it
>> is quite hard to read.
> Of course! I even posted a bug report with the /tmp/lustre.log:
> https://bugzilla.lustre.org/show_bug.cgi?id=18802
>
> Let me know if you need anything else.
>
> TIA
>
>
>
> On Sat, Mar 14, 2009 at 7:35 AM, Bernd Schubert
> <bernd.schubert at fastmail.fm> wrote:
>> On Saturday 14 March 2009, Mag Gam wrote:
>>> We are having a problem with an MDS server (which also hosts 1 OST
>>> on the same box).
>>>
>>> When the server boots up, we notice there is an ll_mdt process running
>>> at 100% CPU, and we end up waiting close to 10-15 minutes. We only have
>>> 8 clients. (I assume this is the normal recovery process.) However, if I
>>> manually mount the MDT without any recovery, everything is fine.
>>
>> Hmm, I have seen that with 1.6.4.3 and NFS exports, but that should be
>> fixed in 1.6.5. I'm not sure, though, since we switched our NFS exports
>> to unfs3 when the problem came up.
>>
>>>
>>> Mar 12 10:11:02 protected_host_01 kernel: Pid: 10375, comm: ll_mdt_10
>>> Tainted: G      2.6.18-92.1.17.el5_lustre.1.6.7smp #1
>>> Mar 12 10:11:02 protected_host_01 kernel: RIP:
>>> 0010:[<ffffffff888ed8df>]  [<ffffffff888ed8df>]
>>>
>>> :ldiskfs:do_split+0x3ef/0x560
>>>
>>> Mar 12 10:11:02 protected_host_01 kernel: RSP: 0018:ffff8103d2a5f460
>>> EFLAGS: 00000216
>>> Mar 12 10:11:02 protected_host_01 kernel: RAX: 0000000000000000 RBX:
>>> 0000000000000080 RCX: 0000000000000000
>>> Mar 12 10:11:02 protected_host_01 kernel: RDX: 0000000000000080 RSI:
>>> ffff8103cd52177c RDI: ffff8103cd52176c
>>
>> Any chance you can send traces with line wrap disabled? With line wrapping it
>> is quite hard to read.
>>
>>
>> Cheers,
>> Bernd
>>
>>
>


