[Lustre-discuss] mds server crashing

Sun Mar 15 06:54:19 PDT 2009

Hey Bernd:

Thanks for the response.

I have a bigger problem now. My ll_mdt process is always at 100% CPU even if I
mount up my MDS with -o abort_recov.

I am not sure what to do to get my filesystem back on track now.

Any ideas? I am getting kind of desperate now :-(

On Sun, Mar 15, 2009 at 9:22 AM, Bernd Schubert
<bs_lists at aakef.fastmail.fm> wrote:
> Hello Mag,
>
> sorry for my late reply. I think there is a misunderstanding. The bug I'm
> talking about occurs if you export Lustre via knfsd. It does not matter whether
> you use any other NFS services on your MDS/OSS system. But if you do export
> Lustre over NFS using the kernel NFS daemon, try disabling that.
>
>
> Cheers,
> Bernd
>
> On Sunday 15 March 2009, Mag Gam wrote:
>> This happened again :-(
>>
>> Basically, there is a process called "ll_mdt30" which is taking up
>> 100% of the CPU. I am not sure what it's doing, but I can't even reboot
>> the system. I have to hard reboot.
>>
>> Also, I checked my other OSTs and MDS and I don't have anything
>> special for NFS in /etc/modules.conf
>>
>> On Sat, Mar 14, 2009 at 8:35 AM, Mag Gam <magawake at gmail.com> wrote:
>> > Hey Bernd:
>> >
>> > Thanks for the reply.
>> >
>> > Interesting. We are using it with NFS too. Is there something in
>> > particular we need to do, like "enable port 988 in /etc/modules.conf",
>> > which I think I am already doing?
>> >
>> >> Any chance you can send traces with line wrap disabled? With line
>> >> wrapping it is quite hard to read.
>> >
>> > Of course! I even posted a bug report with the /tmp/lustre.log:
>> > https://bugzilla.lustre.org/show_bug.cgi?id=18802
>> >
>> > Let me know if you need anything else.
>> >
>> > TIA
>> >
>> >
>> >
>> > On Sat, Mar 14, 2009 at 7:35 AM, Bernd Schubert
>> > <bernd.schubert at fastmail.fm> wrote:
>> >> On Saturday 14 March 2009, Mag Gam wrote:
>> >>> We are having a problem with an MDS server (which also has 1 OST on the
>> >>> box).
>> >>>
>> >>> When the server boots up, we notice there is an ll_mdt process running
>> >>> at 100% and we keep on waiting close to 10-15 mins. We only have 8
>> >>> clients. (I assume this is the normal recovery process.) However, if I
>> >>> manually mount up the mdt without any recovery, everything is fine.
>> >>
>> >> Hmm, I have seen that with 1.6.4.3 and NFS exports. But that should be
>> >> fixed in 1.6.5. Although I'm not sure, since we switched NFS exports to
>> >> unfs3 ever since the problem came up.
>> >>
>> >>> Mar 12 10:11:02 protected_host_01 kernel: Pid: 10375, comm: ll_mdt_10
>> >>> Tainted: G      2.6.18-92.1.17.el5_lustre.1.6.7smp #1
>> >>> Mar 12 10:11:02 protected_host_01 kernel: RIP:
>> >>> 0010:[<ffffffff888ed8df>]  [<ffffffff888ed8df>]
>> >>>
>> >>> :ldiskfs:do_split+0x3ef/0x560
>> >>>
>> >>> Mar 12 10:11:02 protected_host_01 kernel: RSP: 0018:ffff8103d2a5f460
>> >>> EFLAGS: 00000216
>> >>> Mar 12 10:11:02 protected_host_01 kernel: RAX: 0000000000000000 RBX:
>> >>> 0000000000000080 RCX: 0000000000000000
>> >>> Mar 12 10:11:02 protected_host_01 kernel: RDX: 0000000000000080 RSI:
>> >>> ffff8103cd52177c RDI: ffff8103cd52176c
>> >>
>> >> Any chance you can send traces with line wrap disabled? With line
>> >> wrapping it is quite hard to read.
>> >>
>> >>
>> >> Cheers,
>> >> Bernd