[Lustre-discuss] MDS crashes daily at the same hour

Andreas Dilger adilger at sun.com
Mon Jan 4 10:42:12 PST 2010


On 2010-01-04, at 03:02, David Cohen wrote:
> I'm using a mixed environment of a 1.8.0.1 MDS and 1.6.6 OSSs (I had a
> problem with the QLogic drivers and rolled back to 1.6.6).
> My MDS becomes unresponsive each day at 4-5 am local time, with no kernel
> panic or error messages beforehand.

Judging by the time, I'd guess this is "slocate" or "mlocate" running
on all of your clients at the same time.  This used to be a source of
extremely high load back in the old days, but I thought that Lustre
was in the exclude list in newer versions of *locate.  The mlocate
installed on my system doesn't seem to exclude it, though...  strange.
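One way to check whether a client's updatedb will skip Lustre is to look for "lustre" in the PRUNEFS list of /etc/updatedb.conf, the mlocate config file (slocate installs may use a different file and format). The sketch below assumes the mlocate format of one VARIABLE = "value" assignment per line with # comments; the helper names parse_updatedb_conf() and lustre_is_pruned() are hypothetical, not part of mlocate.

```python
import re

# Assumed mlocate updatedb.conf line format: PRUNEFS = "nfs proc lustre"
_ASSIGN = re.compile(r'^\s*([A-Z_]+)\s*=\s*"([^"]*)"\s*$')


def parse_updatedb_conf(text):
    """Return a dict mapping variable names to their quoted values."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = _ASSIGN.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def lustre_is_pruned(text):
    """True if 'lustre' is listed in PRUNEFS (compared case-insensitively)."""
    prunefs = parse_updatedb_conf(text).get("PRUNEFS", "")
    return "lustre" in (fs.lower() for fs in prunefs.split())


if __name__ == "__main__":
    try:
        with open("/etc/updatedb.conf") as f:
            conf = f.read()
    except OSError as exc:
        print(f"cannot read /etc/updatedb.conf: {exc}")
    else:
        print("lustre pruned:", lustre_is_pruned(conf))
```

If it reports False on the clients, adding lustre to PRUNEFS there should keep the nightly updatedb run from walking the Lustre mount.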

> Some errors and an LBUG appear in the log after force-rebooting the MDS
> and mounting the MDT; after that the log is clear until the next morning:
>
> Jan  4 06:33:31 tech-mds kernel: LustreError: 6357:0:(class_hash.c:225:lustre_hash_findadd_unique_hnode()) ASSERTION(hlist_unhashed(hnode)) failed
> Jan  4 06:33:31 tech-mds kernel: LustreError: 6357:0:(class_hash.c:225:lustre_hash_findadd_unique_hnode()) LBUG
> Jan  4 06:33:31 tech-mds kernel: Lustre: 6357:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 6357
> Jan  4 06:33:31 tech-mds kernel: ll_mgs_02     R  running task     0  6357      1          6340 (L-TLB)
> Jan  4 06:33:31 tech-mds kernel: Call Trace:
> Jan  4 06:33:31 tech-mds kernel: thread_return+0x62/0xfe
> Jan  4 06:33:31 tech-mds kernel: __wake_up_common+0x3e/0x68
> Jan  4 06:33:31 tech-mds kernel: :ptlrpc:ptlrpc_main+0x1218/0x13e0
> Jan  4 06:33:31 tech-mds kernel: default_wake_function+0x0/0xe
> Jan  4 06:33:31 tech-mds kernel: audit_syscall_exit+0x31b/0x336
> Jan  4 06:33:31 tech-mds kernel: child_rip+0xa/0x11
> Jan  4 06:33:31 tech-mds kernel: :ptlrpc:ptlrpc_main+0x0/0x13e0
> Jan  4 06:33:31 tech-mds kernel: child_rip+0x0/0x11

Whatever is causing the load, the MDS shouldn't LBUG during recovery.
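The assertion in the log says that lustre_hash_findadd_unique_hnode() was handed a hash node that was already linked into a hash chain (hlist_unhashed(hnode) was false). Below is a simplified Python model of a find-or-add-unique insert with that precondition. It is not Lustre's code: the Node and UniqueHash names are hypothetical, and the "return the existing entry if the key is present" behaviour is assumed from the function's name.

```python
class Node:
    """Hypothetical hash-chain node; bucket is None while the node is unhashed."""

    def __init__(self, key):
        self.key = key
        self.bucket = None  # None plays the role of hlist_unhashed() == true


class UniqueHash:
    """Simplified model of a find-or-add-unique hash (not Lustre's implementation)."""

    def __init__(self, nbuckets=16):
        self.buckets = [[] for _ in range(nbuckets)]

    def findadd_unique(self, node):
        # Mirrors the failed ASSERTION: the caller must pass a node that is
        # not already linked into any hash. Lustre would LBUG here.
        if node.bucket is not None:
            raise RuntimeError("LBUG: ASSERTION(hlist_unhashed(hnode)) failed")
        bucket = self.buckets[hash(node.key) % len(self.buckets)]
        for existing in bucket:
            if existing.key == node.key:
                return existing  # key already present: return existing entry
        bucket.append(node)  # key absent: link the new node into its bucket
        node.bucket = bucket
        return node
```

In this model a second insert of a node that is still linked, for example from an earlier insertion, trips the check; which code path in the MGS service thread (ll_mgs_02) did that here is not established by the log alone.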

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.