[Lustre-discuss] e2fsck mdsdb: DB_NOTFOUND

Karen M. Fernsler fernsler at ncsa.uiuc.edu
Thu Mar 13 13:51:22 PDT 2008


2.6.9-42.0.10.EL_lustre-1.4.10.1smp

This is a 2.6.9-42.0.10.EL kernel with lustre-1.4.10.1.

This has been working OK for almost a year.  We did try to
export this filesystem to another cluster over NFS before
we started seeing problems, but I don't know whether that is
related at all.

We are now trying to dissect the problem by inspecting the
logs of the switches these nodes are connected to.

thanks,
-k

On Thu, Mar 13, 2008 at 04:50:04PM -0400, Aaron Knister wrote:
> What version of lustre/kernel is running on the problematic server?
> 
> On Mar 13, 2008, at 11:02 AM, Michelle Butler wrote:
> 
> >We got past that point by running e2fsck on the individual partitions first.
> >
> >But I'm sorry to say we are still having problems.  We have an I/O
> >server that is fine until we start Lustre.  It then starts spewing
> >Lustre call traces:
> >
> >Call
> >Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
> ><ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
> >       <ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> >       <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> ><ffffffff80110ebb>{child_rip+8}
> >       <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
> ><ffffffff80110eb3>{child_rip+0}
> >
> >ll_ost_io_232 S 000001037d6bbee8     0 26764      1         26765  
> >26763 (L-TLB)
> >000001037d6bbe58 0000000000000046 0000000100000246 0000000000000003
> >       0000000000000016 0000000000000001 00000104100bcb20  
> >0000000300000246
> >       00000103f5470030 000000000001d381
> >Call
> >Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
> ><ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
> >       <ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> >       <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> ><ffffffff80110ebb>{child_rip+8}
> >       <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
> ><ffffffff80110eb3>{child_rip+0}
> >
> >ll_ost_io_233 S 00000103de847ee8     0 26765      1         26766  
> >26764 (L-TLB)
> >00000103de847e58 0000000000000046 0000000100000246 0000000000000001
> >       0000000000000016 0000000000000001 000001040f83c620  
> >0000000100000246
> >       00000103e627e030 000000000001d487
> >Call
> >Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
> ><ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
> >       <ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> >       <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> ><ffffffff80110ebb>{child_rip+8}
> >       <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
> ><ffffffff80110eb3>{child_rip+0}
> >
> >ll_ost_io_234 S 00000100c4353ee8     0 26766      1         26767  
> >26765 (L-TLB)
> >00000100c4353e58 0000000000000046 0000000100000246 0000000000000003
> >       0000000000000016 0000000000000001 00000104100bcc60  
> >0000000300000246
> >       00000103de81b810 000000000001d945
> >Call
> >Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
> ><ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
> >       <ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retr [console output garbled]
> ><ffffffff8013327d>{default_wake_function+0}
> ><ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
> >       <ffffffffa03e0156>{:ptl
> >
> >It then panics the kernel??
> >
> >Michelle Butler
> >
> >At 02:39 AM 3/13/2008, Andreas Dilger wrote:
> >>On Mar 12, 2008  06:44 -0500, Karen M. Fernsler wrote:
> >>>I'm running:
> >>>
> >>>e2fsck -y -v --mdsdb mdsdb --ostdb osth3_1 /dev/mapper/27l4
> >>>
> >>>and getting:
> >>>
> >>>Pass 6: Acquiring information for lfsck
> >>>error getting mds_hdr (3685469441:8) in /post/cfg/mdsdb: DB_NOTFOUND: No matching key/data pair found
> >>>e2fsck: aborted
> >>>
> >>>Any ideas how to get around this?
> >>
> >>Does "mdsdb" actually exist?  This should be created by first running:
> >>
> >>e2fsck --mdsdb mdsdb /dev/{mdsdevicename}
> >>
> >>before running your above command on the OST.
> >>
> >>Please also try specifying the absolute pathname for the mdsdb and
> >>ostdb files.
> >>
> >>Cheers, Andreas
> >>--
> >>Andreas Dilger
> >>Sr. Staff Engineer, Lustre Group
> >>Sun Microsystems of Canada, Inc.
> >
> >
> >_______________________________________________
> >Lustre-discuss mailing list
> >Lustre-discuss at lists.lustre.org
> >http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> Aaron Knister
> Associate Systems Analyst
> Center for Ocean-Land-Atmosphere Studies
> 
> (301) 595-7000
> aaron at iges.org
> 
> 
> 

-- 
Karen Fernsler Systems Engineer
National Center for Supercomputing Applications
ph: (217) 265 5249
email: fernsler at ncsa.uiuc.edu
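
For reference, the two-step sequence Andreas describes in the quoted reply (build the MDS database first, then check the OST using absolute paths to both database files) could be sketched roughly as below. This is only a dry-run sketch: the functions print the commands instead of running them. The names mds_cmd, ost_cmd, DB_DIR, MDS_DEV and OST_DEV are made up for illustration, /post/cfg is taken from the error message above, and /dev/mdsdevicename is a placeholder that must be replaced with the real MDS device. It also assumes the mdsdb file written on the MDS has been made visible on the I/O server under the same path.

```shell
#!/bin/sh
# Dry-run sketch: these functions only PRINT the e2fsck commands.
# DB_DIR, MDS_DEV and OST_DEV are hypothetical placeholders.
DB_DIR="${DB_DIR:-/post/cfg}"
MDS_DEV="${MDS_DEV:-/dev/mdsdevicename}"
OST_DEV="${OST_DEV:-/dev/mapper/27l4}"

# Step 1 (on the MDS): build the MDS database with an absolute path.
mds_cmd() {
    printf 'e2fsck --mdsdb %s/mdsdb %s\n' "$DB_DIR" "$MDS_DEV"
}

# Step 2 (on the I/O server): check the OST against the MDS database.
# Refuses to print the command if the mdsdb file is missing or empty,
# since that is one way step 2 could end up without usable MDS data.
ost_cmd() {
    if [ ! -s "$DB_DIR/mdsdb" ]; then
        echo "missing or empty $DB_DIR/mdsdb; run step 1 first" >&2
        return 1
    fi
    printf 'e2fsck -y -v --mdsdb %s/mdsdb --ostdb %s/osth3_1 %s\n' \
        "$DB_DIR" "$DB_DIR" "$OST_DEV"
}
```

Calling mds_cmd and then ost_cmd prints the two commands in the order they should be run; review them before pasting them into a root shell on the appropriate node.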


