[Lustre-discuss] MDS crashes daily at the same hour
Andreas Dilger
adilger at sun.com
Mon Jan 4 10:42:12 PST 2010
On 2010-01-04, at 03:02, David Cohen wrote:
> I'm using a mixed environment of a 1.8.0.1 MDS and 1.6.6 OSS's (had a
> problem with qlogic drivers and rolled back to 1.6.6).
> My MDS gets unresponsive each day at 4-5 am local time, with no kernel
> panic or error messages beforehand.
Judging by the time, I'd guess this is "slocate" or "mlocate" running
on all of your clients at the same time. This used to be a source of
extremely high load back in the old days, but I thought that Lustre
was in the exclude list in newer versions of *locate. Looking at the
installed mlocate on my system, that doesn't seem to be the case...
strange.
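If that turns out to be the cause, a workaround is to make sure "lustre" is in the PRUNEFS list in /etc/updatedb.conf on each client, so updatedb skips Lustre mounts entirely. A quick sketch (the exact file location and existing PRUNEFS contents vary by distribution, so treat this as illustrative):

```shell
# Check whether lustre is already excluded on this client:
grep -i '^PRUNEFS' /etc/updatedb.conf

# If not, append it to the PRUNEFS list (backs up the original first).
# This sed expression assumes PRUNEFS is defined as PRUNEFS="..." on one line.
sed -i.bak '/^PRUNEFS=/ s/"$/ lustre"/' /etc/updatedb.conf
```

Running this on all clients should stop updatedb from walking the whole filesystem in the early morning.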
> Some errors and an LBUG appear in the log after force booting the MDS
> and mounting the MDT, and then the log is clear until the next morning:
>
> Jan 4 06:33:31 tech-mds kernel: LustreError: 6357:0:(class_hash.c:225:lustre_hash_findadd_unique_hnode()) ASSERTION(hlist_unhashed(hnode)) failed
> Jan 4 06:33:31 tech-mds kernel: LustreError: 6357:0:(class_hash.c:225:lustre_hash_findadd_unique_hnode()) LBUG
> Jan 4 06:33:31 tech-mds kernel: Lustre: 6357:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 6357
> Jan 4 06:33:31 tech-mds kernel: ll_mgs_02 R running task 0 6357 1 6340 (L-TLB)
> Jan 4 06:33:31 tech-mds kernel: Call Trace:
> Jan 4 06:33:31 tech-mds kernel: thread_return+0x62/0xfe
> Jan 4 06:33:31 tech-mds kernel: __wake_up_common+0x3e/0x68
> Jan 4 06:33:31 tech-mds kernel: :ptlrpc:ptlrpc_main+0x1218/0x13e0
> Jan 4 06:33:31 tech-mds kernel: default_wake_function+0x0/0xe
> Jan 4 06:33:31 tech-mds kernel: audit_syscall_exit+0x31b/0x336
> Jan 4 06:33:31 tech-mds kernel: child_rip+0xa/0x11
> Jan 4 06:33:31 tech-mds kernel: :ptlrpc:ptlrpc_main+0x0/0x13e0
> Jan 4 06:33:31 tech-mds kernel: child_rip+0x0/0x11
It shouldn't LBUG during recovery, however.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.