[Lustre-discuss] Client complaining about duplicate inode entry after Lustre recovery

Bernd Schubert bs_lists at aakef.fastmail.fm
Sat Oct 10 07:18:04 PDT 2009


"ASSERTION(old_inode->i_state & I_FREEING)" is the infamous bug17485. You will 
need to run lfsck to fix it.
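
For reference, a minimal sketch of the usual 1.6-era procedure for building the
databases that lfsck needs; the device names and database paths below are
placeholders, and the exact options come from the Lustre-patched e2fsprogs:

    # On the MDS, with the MDT device unmounted: build the MDS database.
    e2fsck -n -v --mdsdb /tmp/mdsdb /dev/mdt_device

    # Copy /tmp/mdsdb to each OSS, then build a per-OST database for every
    # unmounted OST device.
    e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-OST0005 /dev/ost_device

    # Finally, copy the mdsdb and all ostdb files to a client and run lfsck
    # against the client mount point (the individual passes are shown further
    # down in the quoted thread).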


On Saturday 10 October 2009, Wojciech Turek wrote:
> Hi,
> 
> Did you get to the bottom of this?
> 
> We are having exactly the same problem with our lustre-1.6.6 (rhel4) file
> systems. Recently it got worse and the MDS crashes quite frequently. When we
> run e2fsck there are errors that get fixed, but after some time we are still
> seeing the same errors in the logs about missing objects, and files get
> corrupted (shown as ?----------- by ls). Clients also LBUG quite frequently
> with this message: (osc_request.c:2904:osc_set_data_with_check()) LBUG
> This looks like a serious Lustre problem, but so far I haven't found any
> clues on it even after a long search through the Lustre bugzilla.
> 
> Our MDSs and OSSs are on UPS, the RAID is behaving OK, and we don't see any
> errors in the syslog.
> 
> I would be grateful for any hints on this one.
> 
> Wojciech
> 
> 2009/8/24 rishi pathak <mailmaverick666 at gmail.com>
> 
> > Hi,
> >
> > Our Lustre fs comprises 15 OSTs/OSSs and 1 MDS with no failover. Clients
> > as well as servers run lustre-1.6 and kernel 2.6.9-18.
> >
> >        Doing an ls -ltr on a directory in the Lustre fs throws the following
> > errors (taken from the Lustre logs) on the client:
> >
> > 00000008:00020000:0:1251099455.304622:0:724:0:(osc_request.c:2898:osc_set_data_with_check())
> >   ### inconsistent l_ast_data found ns: scratch-OST0005-osc-ffff81201e8dd800
> >   lock: ffff811f9af04000/0xec0d1c36da6992fd lrc: 3/1,0 mode: PR/PR res: 570622/0
> >   rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615)
> >   flags: 100000 remote: 0xb79b445e381bc9e6 expref: -99 pid: 22878
> > 00000008:00040000:0:1251099455.337868:0:724:0:(osc_request.c:2904:osc_set_data_with_check())
> >   ASSERTION(old_inode->i_state & I_FREEING) failed: Found existing inode
> >   ffff811f2cf693b8/197272544/1895600178 state 0 in lock: setting data to
> >   ffff8118ef8ed5f8/207519777/1771835328
> > 00000000:00040000:0:1251099455.360090:0:724:0:(osc_request.c:2904:osc_set_data_with_check()) LBUG
> >
> >
> > On the scratch-OST0005 OST it shows:
> >
> > Aug 24 10:22:53 yn266 kernel: LustreError: 3023:0:(ldlm_resource.c:851:ldlm_resource_add())
> >   lvbo_init failed for resource 569204: rc -2
> > Aug 24 10:22:53 yn266 kernel: LustreError: 3023:0:(ldlm_resource.c:851:ldlm_resource_add())
> >   Skipped 19 previous similar messages
> > Aug 24 12:40:43 yn266 kernel: LustreError: 2737:0:(ldlm_resource.c:851:ldlm_resource_add())
> >   lvbo_init failed for resource 569195: rc -2
> > Aug 24 12:44:59 yn266 kernel: LustreError: 2835:0:(ldlm_resource.c:851:ldlm_resource_add())
> >   lvbo_init failed for resource 569198: rc -2
> >
> > We are getting these kinds of errors for many clients.
> >
> > ## History ##
> > Prior to these occurrences, our MDS showed signs of failure: CPU load was
> > shooting above 100 (on a quad-core, quad-socket system) and users were
> > complaining about slow storage performance. We took it offline and ran
> > fsck on the unmounted MDS and OSTs. fsck on the OSTs went fine, though it
> > reported some errors, which were fixed. For a data integrity check, the
> > mdsdb and ostdb databases were built and lfsck was run on a client (the
> > client was mounted with abort_recov).
> >
> > lfsck was run in the following order (a command sketch follows below):
> > lfsck with no fix - reported dangling inodes and orphaned objects
> > lfsck with -l (back up orphaned objects)
> > lfsck with -d and -c (delete orphaned objects and create missing OST
> > objects referenced by the MDS)
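
A minimal sketch of the three passes described above, assuming the 1.6-era lfsck
options; the mdsdb/ostdb paths and the mount point are placeholders for the
databases built with the Lustre-patched e2fsck:

    # Pass 1: read-only report of dangling inodes and orphaned objects (-n = no fix).
    lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-OST0000 /tmp/ostdb-OST0001 /mnt/lustre

    # Pass 2: back up orphaned objects to lost+found (-l).
    lfsck -l -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-OST0000 /tmp/ostdb-OST0001 /mnt/lustre

    # Pass 3: delete remaining orphaned objects (-d) and create missing OST
    # objects referenced by the MDS (-c).
    lfsck -d -c -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-OST0000 /tmp/ostdb-OST0001 /mnt/lustre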
> >
> > After the above operations, on clients we were seeing files shown in red
> > and blinking. Doing a stat on them came back with the error 'No such file
> > or directory'.
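
One way to enumerate such dangling entries is a small loop like the hypothetical
sketch below (the directory path is a placeholder); lfs getstripe queries the MDS
for the file's striping, which can help identify which OST objects are missing:

    # Print every directory entry whose stat() fails (ls shows these as
    # ?----------) together with its MDS-side striping information.
    cd /mnt/lustre/affected/dir
    for f in *; do
        if ! stat "$f" >/dev/null 2>&1; then
            echo "dangling: $f"
            lfs getstripe "$f"
        fi
    done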
> >
> > My question is whether the order in which lfsck was run (should lfsck be
> > run multiple times?) is related to the errors we are getting.
> >
> >
> >
> >
> > --
> > Regards--
> > Rishi Pathak
> > National PARAM Supercomputing Facility
> > Center for Development of Advanced Computing(C-DAC)
> > Pune University Campus,Ganesh Khind Road
> > Pune-Maharastra
> >
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


-- 
Bernd Schubert
DataDirect Networks


