[Lustre-discuss] soft lockups on NFS server/Lustre client

Frederik Ferner frederik.ferner at diamond.ac.uk
Tue Oct 20 05:00:49 PDT 2009


Robin Humble wrote:
> On Mon, Oct 12, 2009 at 05:06:28PM +0100, Frederik Ferner wrote:
>> on our NFS server exporting our Lustre file system to a number of NFS 
>> clients, we've recently started to see "kernel: BUG: soft lockup" 
>> messages. As the locked processes include nfsd, our users are obviously 
>> not happy.
>>
>> Around the time when the soft lockup occurs we also see a lot of 
>> "kernel: BUG: warning at fs/inotify.c:181/set_dentry_child_flags()" 
>> messages, but I don't know if this is related.
> 
> probably not related. we were seeing this too (no NFS involved at all)

I may have been looking at slightly the wrong thing here. It was first 
reported by our users as an NFS problem, but it now seems to be triggered 
by samba access to some directories on Lustre. We've separated the samba 
server from the NFS server and now we only see this on the samba server 
and not on the NFS server.

>   https://bugzilla.redhat.com/show_bug.cgi?id=526853
> but it's probably being ignored. if you have a rhel support contract
> maybe you can kick it along a bit...

I see this has been closed as duplicate of
     https://bugzilla.redhat.com/show_bug.cgi?id=499019
which is unfortunately not accessible to me.

On the other hand Red Hat support have just pointed me at this bug as 
well and confirmed that it is not yet fixed in RHEL5.4.

> dunno about your soft lockups. as I understand it soft lockups
> themselves aren't harmful as long as they progress eventually.

Well, they are not harmful as such; my problem is that they seem to 
block the machine for some time, and users complained about applications 
timing out when this affected the file system.

> Lustre 1.6.6 isn't exactly recent. have you tried 1.6.7.2 on your NFS
> exporter?

I know. Until recently we did not have any real problems with 1.6.6, and 
the machines are in production. I'm currently trying to reproduce it in 
our test setup, and during the next maintenance window I may try 1.6.7.2 
on an additional test machine acting as samba exporter on the production 
system. On the other hand, it's now really looking like a RHEL bug, so 
I'm not too sure how much it would help.

> presumably soft lockups could also be saying your re-exporter or OSS's
> are overloaded or that you have a slow disk or 3 in a RAID... without
> NFS involved are all your OSTs up to speed?

I think the OSTs are not the problem here, as I'm not experiencing any 
problems on any of my other Lustre clients, nor any longer on the NFS 
server, which is seeing more load than the samba server.

> do you still get problems after
>   echo 60 > /proc/sys/kernel/softlockup_thresh

After applying this on the samba server, I only see the BUG warnings and 
not the soft lockups in syslog. Still, my Windows clients seem to freeze 
occasionally for about a minute when browsing the exported file system, 
so there is no change on the client side.
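A quick way to check the effect of raising softlockup_thresh is to count the two kinds of kernel messages discussed above before and after the change. The sketch below is a hypothetical helper, not something used in this thread; it assumes syslog writes kernel messages to /var/log/messages (the usual RHEL5 location) and matches only the fixed strings quoted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): count soft lockup
# reports and inotify set_dentry_child_flags() warnings in a syslog file.
# The default path /var/log/messages is an assumption about the syslog setup.
count_lockup_messages() {
    log="${1:-/var/log/messages}"
    # grep -cF counts lines containing the fixed string (prints 0 if none)
    lockups=$(grep -cF 'BUG: soft lockup' "$log")
    inotify=$(grep -cF 'BUG: warning at fs/inotify.c' "$log")
    echo "soft_lockup=$lockups inotify_warning=$inotify"
}

# Report the current soft lockup threshold in seconds, if the kernel exposes it.
if [ -r /proc/sys/kernel/softlockup_thresh ]; then
    echo "softlockup_thresh=$(cat /proc/sys/kernel/softlockup_thresh)"
fi
```

Note that a higher threshold only suppresses the syslog report; as described above, it does not make the Windows clients stop freezing. The equivalent persistent setting would be `kernel.softlockup_thresh = 60` in /etc/sysctl.conf.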

Cheers,
Frederik

-- 
Frederik Ferner
Computer Systems Administrator		phone: +44 1235 77 8624
Diamond Light Source Ltd.		mob:   +44 7917 08 5110


