[Lustre-discuss] e2fsck mdsdb: DB_NOTFOUND
Karen M. Fernsler
fernsler at ncsa.uiuc.edu
Thu Mar 13 13:51:22 PDT 2008
2.6.9-42.0.10.EL_lustre-1.4.10.1smp
This is a 2.6.9-42.0.10.EL kernel with lustre-1.4.10.1.
This has been working OK for almost a year. We did try to
export this filesystem to another cluster over NFS before
we started seeing problems, but I don't know how related
that is, if at all.
We are now trying to dissect the problem by inspecting
the logs of the switches these nodes are connected to.
thanks,
-k
On Thu, Mar 13, 2008 at 04:50:04PM -0400, Aaron Knister wrote:
> What version of lustre/kernel is running on the problematic server?
>
> On Mar 13, 2008, at 11:02 AM, Michelle Butler wrote:
>
> >We got past that point by running e2fsck on the individual partitions first.
> >
> >But we are still having problems, I'm sorry to
> >say. We have an I/O server that is fine until
> >we start Lustre. It then starts spewing Lustre call traces:
> >
> >Call
> >Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
> ><ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
> > <ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> > <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> ><ffffffff80110ebb>{child_rip+8}
> > <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
> ><ffffffff80110eb3>{child_rip+0}
> >
> >ll_ost_io_232 S 000001037d6bbee8 0 26764 1 26765
> >26763 (L-TLB)
> >000001037d6bbe58 0000000000000046 0000000100000246 0000000000000003
> > 0000000000000016 0000000000000001 00000104100bcb20
> >0000000300000246
> > 00000103f5470030 000000000001d381
> >Call
> >Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
> ><ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
> > <ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> > <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> ><ffffffff80110ebb>{child_rip+8}
> > <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
> ><ffffffff80110eb3>{child_rip+0}
> >
> >ll_ost_io_233 S 00000103de847ee8 0 26765 1 26766
> >26764 (L-TLB)
> >00000103de847e58 0000000000000046 0000000100000246 0000000000000001
> > 0000000000000016 0000000000000001 000001040f83c620
> >0000000100000246
> > 00000103e627e030 000000000001d487
> >Call
> >Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
> ><ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
> > <ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> > <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> ><ffffffff80110ebb>{child_rip+8}
> > <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
> ><ffffffff80110eb3>{child_rip+0}
> >
> >ll_ost_io_234 S 00000100c4353ee8 0 26766 1 26767
> >26765 (L-TLB)
> >00000100c4353e58 0000000000000046 0000000100000246 0000000000000003
> > 0000000000000016 0000000000000001 00000104100bcc60
> >0000000300000246
> > 00000103de81b810 000000000001d945
> >Call
> >Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
> ><ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
> > <ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> >
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retr
> >
> ><ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> > <ffffffffa03e0156>{:ptl
> >
> >It then panics the kernel.. ??
> >
> >Michelle Butler
> >
> >At 02:39 AM 3/13/2008, Andreas Dilger wrote:
> >>On Mar 12, 2008 06:44 -0500, Karen M. Fernsler wrote:
> >>>I'm running:
> >>>
> >>>e2fsck -y -v --mdsdb mdsdb --ostdb osth3_1 /dev/mapper/27l4
> >>>
> >>>and getting:
> >>>
> >>>Pass 6: Acquiring information for lfsck
> >>>error getting mds_hdr (3685469441:8) in /post/cfg/mdsdb: DB_NOTFOUND: No matching key/data pair found
> >>>e2fsck: aborted
> >>>
> >>>Any ideas how to get around this?
> >>
> >>Does "mdsdb" actually exist? This should be created by first
> >>running:
> >>
> >>e2fsck --mdsdb mdsdb /dev/{mdsdevicename}
> >>
> >>before running your above command on the OST.
> >>
> >>Please also try specifying the absolute pathname for the mdsdb and
> >>ostdb files.
> >>
> >>Cheers, Andreas
> >>--
> >>Andreas Dilger
> >>Sr. Staff Engineer, Lustre Group
> >>Sun Microsystems of Canada, Inc.
> >
> >
> >_______________________________________________
> >Lustre-discuss mailing list
> >Lustre-discuss at lists.lustre.org
> >http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> Aaron Knister
> Associate Systems Analyst
> Center for Ocean-Land-Atmosphere Studies
>
> (301) 595-7000
> aaron at iges.org
>
>
>
--
Karen Fernsler Systems Engineer
National Center for Supercomputing Applications
ph: (217) 265 5249
email: fernsler at ncsa.uiuc.edu
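
For anyone hitting the same DB_NOTFOUND, Andreas's two suggestions (make sure the mdsdb was created by the MDS pass first, and pass absolute pathnames) can be sketched as a small wrapper. This is only a rough sketch under assumptions: the helper names abs_path, require_db and ost_pass_cmd are made up for illustration, the e2fsck flags are just the ones quoted in this thread, and the script only prints the OST-pass command rather than running it. The "exists and is non-empty" check is also an assumption; it would not catch a stale or incomplete database.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and nothing here runs e2fsck.

# abs_path FILE: print FILE as an absolute path
# (relative names are resolved against the current directory).
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$PWD" "$1" ;;
    esac
}

# require_db FILE: succeed only if FILE exists and is non-empty.
# Assumption: an empty or missing file means the MDS pass was never run.
require_db() {
    [ -s "$1" ]
}

# ost_pass_cmd MDSDB OSTDB OSTDEV: print the OST-pass command line with
# absolute database paths, or fail if the MDS database is not there yet.
ost_pass_cmd() {
    mdsdb=$(abs_path "$1")
    ostdb=$(abs_path "$2")
    if ! require_db "$mdsdb"; then
        echo "missing or empty mdsdb: $mdsdb (run the MDS pass first)" >&2
        return 1
    fi
    printf 'e2fsck -y -v --mdsdb %s --ostdb %s %s\n' "$mdsdb" "$ostdb" "$3"
}
```

Called as `ost_pass_cmd mdsdb osth3_1 /dev/mapper/27l4` from the directory holding the database files, it prints the command from the original report with both database paths made absolute, and refuses if the mdsdb file is not there.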