[Lustre-discuss] Cannot mount MDT after crash and writeconf

Andreas Dilger adilger at sun.com
Thu Sep 10 12:47:03 PDT 2009


On Sep 10, 2009  15:49 +0100, John Cowan wrote:
> After running 'tunefs.lustre -writeconf /dev/mdt01' on my combined MDT/MGS I am unable to mount the device - this was following an MDS crash. This is on SLES 10 SP2, kernel 2.6.16.60-0.31, with Lustre 1.8.0. A couple of OSTs and a client are installed on the same machine, but they haven’t been mounted since the problem occurred.
> 
> In the hope that this might be related to a corrupted CATALOGS file (like in bug 16002), I mounted the mdt with ldiskfs and removed the file, but still get the same problems.
> 
> Can anyone lend any advice on any recovery steps they might take?
> 
> Sep 10 14:12:35 uklust01 kernel: LustreError: 5537:0:(lvfs_linux.c:449:l_filldir()) ASSERTION(sizeof(dirent->lld_name) >= namlen + 1) failed
> Sep 10 14:12:35 uklust01 kernel: LustreError: 5537:0:(lvfs_linux.c:449:l_filldir()) LBUG

There is a file in the MDS PENDING directory which shouldn't be there.
Please mount the MDS device as type ldiskfs, run "ls -l $MDSMOUNT/PENDING",
and remove the offending entry.  You _may_ have to run e2fsck; that
depends on whether the entry is a regular file or a directory.
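
As a rough sketch of the steps above (not a tested procedure): the device
path /dev/mdt01 comes from the original report, while the mount point
/mnt/mdt-ldiskfs and the helper list_by_name_length are hypothetical names
used only for illustration.  The helper prints each entry's name length so
that an unusually long name in PENDING stands out; the privileged commands
are left as comments so nothing is mounted or deleted by accident.

```shell
# Assumed names: MDSDEV is from the report, MDSMOUNT is a hypothetical mount point.
MDSDEV=/dev/mdt01
MDSMOUNT=/mnt/mdt-ldiskfs

# Print "<name length> <name>" for every entry in a directory, longest first.
list_by_name_length() {
    dir=$1
    for path in "$dir"/* "$dir"/.[!.]*; do
        # Skip unmatched glob patterns; keep dangling symlinks.
        [ -e "$path" ] || [ -L "$path" ] || continue
        name=${path##*/}
        printf '%d %s\n' "${#name}" "$name"
    done | sort -rn
}

# On the MDS, as root, with the Lustre target not mounted:
#   mkdir -p "$MDSMOUNT"
#   mount -t ldiskfs "$MDSDEV" "$MDSMOUNT"
#   ls -l "$MDSMOUNT/PENDING"
#   list_by_name_length "$MDSMOUNT/PENDING"
#   rm "$MDSMOUNT/PENDING/<offending name>"   # regular file case
#   umount "$MDSMOUNT"
#   e2fsck -f "$MDSDEV"                       # only if needed, device unmounted
```

If the offending entry turns out to be a directory, or rm fails, unmount
and run e2fsck (the Lustre-patched e2fsprogs) before trying again.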

Granted, this probably shouldn't be an LBUG, but at the same time,
there must be some form of directory corruption because there should
never be too-long filenames in this directory.  If you want, you
could file a bug for this - it would probably be easy to fix so that
it just deleted this file instead of doing an LBUG.

> Sep 10 14:12:35 uklust01 kernel: Lustre: 5537:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 5537
> Sep 10 14:12:35 uklust01 kernel: ll_mgs_00     R  running task       0  5537      1          5538  5519 (L-TLB)
> Sep 10 14:12:35 uklust01 kernel: 0000000000000292 0000000000000001 0000000000000011 ffffffff80132cd0
> Sep 10 14:12:35 uklust01 kernel:        ffffffff8044cd49 0000000000000000 0000000000000009 ffffffff80132cd0
> Sep 10 14:12:35 uklust01 kernel:        000015a100000000 000000de00000000
> Sep 10 14:12:35 uklust01 kernel: Call Trace: <ffffffff80132cd0>{vprintk+607} <ffffffff80132cd0>{vprintk+607}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff8014d278>{kallsyms_lookup+244} <ffffffff8014d278>{kallsyms_lookup+244}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff8014d278>{kallsyms_lookup+244} <ffffffff8010c2ac>{printk_address+154}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff80132d49>{printk+78} <ffffffff8014a3f2>{module_text_address+51}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff80143e71>{kernel_text_address+26} <ffffffff8010c47e>{show_trace+453}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff8010c5b5>{show_stack+201} <ffffffff881e4ada>{:libcfs:lbug_with_loc+122}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff881ecdd0>{:libcfs:tracefile_init+0} <ffffffff88209b27>{:lvfs:l_filldir+391}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff882099a0>{:lvfs:l_filldir+0} <ffffffff88623ebb>{:ldiskfs:call_filldir+139}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff886241e6>{:ldiskfs:ldiskfs_readdir+486} <ffffffff882099a0>{:lvfs:l_filldir+0}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff88430a6d>{:ptlrpc:ldlm_process_plain_lock+3101}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff8012bf0f>{__wake_up+56} <ffffffff882099a0>{:lvfs:l_filldir+0}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff80193c3f>{vfs_readdir+128} <ffffffff88209974>{:lvfs:l_readdir+36}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff88684ee4>{:mgs:class_dentry_readdir+628} <ffffffff8841ae2a>{:ptlrpc:ldlm_lock_addref_internal_nolock+58}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff8868a598>{:mgs:mgs_erase_logs+152} <ffffffff884369f0>{:ptlrpc:ldlm_blocking_ast+0}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff8843a360>{:ptlrpc:ldlm_completion_ast+0} <ffffffff88676653>{:mgs:mgs_handle+2915}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff88463fe5>{:ptlrpc:lustre_msg_get_conn_cnt+53}
> Sep 10 14:12:35 uklust01 kernel:        <ffffffff88468d9d>{:ptlrpc:ptlrpc_check_req+29} <ffffffff8846b338>{:ptlrpc:ptlrpc_server_handle_request+2712}
> Sep 10 14:12:36 uklust01 kernel:        <ffffffff8012af22>{try_to_wake_up+1039} <ffffffff8013aee7>{lock_timer_base+27}
> Sep 10 14:12:36 uklust01 kernel:        <ffffffff80129605>{__wake_up_common+62} <ffffffff8846e998>{:ptlrpc:ptlrpc_main+4664}
> Sep 10 14:12:36 uklust01 kernel:        <ffffffff8012af33>{default_wake_function+0} <ffffffff8010bdce>{child_rip+8}
> Sep 10 14:12:36 uklust01 kernel:        <ffffffff8846d760>{:ptlrpc:ptlrpc_main+0} <ffffffff8010bdc6>{child_rip+0}
> Sep 10 14:12:36 uklust01 kernel: LustreError: dumping log to /tmp/lustre-log.1252591955.5537


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



