[Lustre-discuss] file system instability after fsck and lfsck

Wojciech Turek wjt27 at cam.ac.uk
Tue Oct 27 03:57:07 PDT 2009


Hi,

I should also point out that there is a bug in lfsck which causes it to fail
if one uses relative paths to the Lustre target databases (there is a Lustre
bugzilla ticket for it, but I don't remember the number). To avoid this bug,
use absolute paths. Below is the procedure I used to run lfsck. Before
creating the lfsck databases, make sure you fix all outstanding issues on all
the OSTs and the MDT with a normal fsck (for example with this command line:
fsck -f -v -C0 /dev/<lustre_target>)

--
# Run lfsck on the Lustre /home filesystem (4 OSTs + 1 MDT)
#
# Create the mdsdb database file for the /home MDT
# Run on mds01
e2fsck -n -v --mdsdb /nfs/lfsck/mds_home_db /dev/dm-0

# For each home OST create ostdb file
# oss01
e2fsck -n -v --mdsdb /nfs/lfsck/mds_home_db --ostdb /nfs/lfsck/home_ost00db \
  /dev/dm-0
# oss02
e2fsck -n -v --mdsdb /nfs/lfsck/mds_home_db --ostdb /nfs/lfsck/home_ost01db \
  /dev/dm-6
# oss03
e2fsck -n -v --mdsdb /nfs/lfsck/mds_home_db --ostdb /nfs/lfsck/home_ost02db \
  /dev/dm-12
# oss04
e2fsck -n -v --mdsdb /nfs/lfsck/mds_home_db --ostdb /nfs/lfsck/home_ost03db \
  /dev/dm-18

# On a Lustre client, mount the filesystem (under /home) and run this command
# (first with -n to see how many problems there are to fix)
lfsck -n -v --mdsdb /nfs/lfsck/mds_home_db --ostdb /nfs/lfsck/home_ost00db \
  /nfs/lfsck/home_ost01db /nfs/lfsck/home_ost02db /nfs/lfsck/home_ost03db \
  /home
--
You need to repeat the whole procedure (including recreating the databases) for
each lfsck run.
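
Once the dry run output looks sane, lfsck can be run again without -n to
actually make the repairs. This is only a minimal sketch (check lfsck --help or
the Lustre 1.6 manual for the exact repair options in your e2fsprogs build;
-l, where available, links orphaned objects into lost+found instead of deleting
them):
--
lfsck -l -v --mdsdb /nfs/lfsck/mds_home_db --ostdb /nfs/lfsck/home_ost00db \
  /nfs/lfsck/home_ost01db /nfs/lfsck/home_ost02db /nfs/lfsck/home_ost03db \
  /home
--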

I hope it helps,

Wojciech


2009/10/27 rishi pathak <mailmaverick666 at gmail.com>

> Hi,
>          We ran into the same problem due to a power failure. We followed the
> steps below until no errors were reported by lfsck:
> 1.Run e2fsck on MDS and OSS
> 2.Build lustre DB
> 3.run lfsck
> We deleted orphaned objects and dangling inodes.
> Make sure you recreate the Lustre DB after every fsck run, and also don't use
> the same Lustre DB for more than one lfsck operation.
> Hope this will help
>
>
> On Tue, Oct 27, 2009 at 12:00 AM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>
>> Hi,
>>
>> I had a similar problem just three weeks ago on our Lustre 1.6.6 / RHEL4
>> system. It all started with several "lvbo_init failed for resource" messages
>> appearing in the syslogs every night.
>> As far as I know this means that you have files with missing objects.
>> The message occurs when someone tries to access a file for which the MDS
>> cannot find the corresponding objects on an OST.
>>
>> I ran fsck on each OST and the MDT to repair (unlink) files with missing
>> objects. However, every night I was getting new files with missing objects.
>> Further runs of fsck on the MDT found these errors and fixed them. At this
>> point I was very interested in finding out what was causing the corruption.
>> Users reported that files that were recently created and perfectly fine had
>> gone missing, and only a file name with ???? was left behind (no actual file
>> or data).
>>
>> To further investigate the problem I ran rsync in dry-run mode on the Lustre
>> filesystem to expose newly corrupted files (a dry-run rsync stats every file,
>> which is enough to trigger the lvbo error and thus find corrupted files). I
>> ran rsync every 12 hours and each run revealed newly corrupted files (lvbo
>> errors).
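>>
>> A minimal sketch of such a dry-run scan (the destination and log paths are
>> just examples; nothing is actually copied because of --dry-run):
>>
>> # stat every file under /home without copying anything; watch the MDS/OSS
>> # syslogs for new "lvbo_init failed" messages while this runs
>> mkdir -p /tmp/rsync-dummy
>> rsync -a -v --dry-run /home/ /tmp/rsync-dummy/ > /tmp/rsync-scan.log 2>&1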
>>
>> At this point accessing a corrupted file didn't cause an LBUG. However, after
>> a couple of days the corruption developed to the point where I got an LBUG
>> caused by:
>> osc_set_data_with_check()) ASSERTION(old_inode->i_state & I_FREEING)
>> failed: Found existing inode.
>>
>> This error is described in Lustre bugzilla.
>> https://bugzilla.lustre.org/show_bug.cgi?id=17485
>>
>> In order to fix this problem you need to stop your file system, run a normal
>> fsck on all OSTs and the MDT, and repair all errors. Once all your Lustre
>> targets are in a consistent state you need to run lfsck. Detailed information
>> on how to run lfsck is in the Lustre 1.6 manual. Make sure that you have the
>> latest e2fsprogs installed (I believe for RHEL4 this is
>> e2fsprogs-1.40.11.sun1-0redhat.x86_64).
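>>
>> A quick way to check which build is installed on a RHEL4 node:
>>
>> rpm -q e2fsprogs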
>>
>> If after lfsck your Lustre filesystem still contains corrupted files and you
>> can still find new "lvbo_init failed for resource" messages, I believe that
>> your MDT is located on faulty hardware.
>>
>> In my case, after the file system check I could still observe new occurrences
>> of files with missing objects (lvbo errors when running rsync in dry-run
>> mode). The RAID device used by the MDT was not showing any signs of problems.
>> However, at that time I could not find any other explanation for these errors
>> than silent hardware corruption, so I decided to replace the MDT hardware. I
>> quickly configured DRBD on my MDS servers using local server disks and
>> transferred the MDT to this DRBD device. Since then all my problems with
>> corrupted files are gone. In the last two weeks I haven't found a single
>> corrupted file on the two file systems (affected by the corruption described
>> above), which are now continuously busy (accessed by 600 clients).
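>>
>> The DRBD side of such a setup is roughly along these lines (a sketch only:
>> the resource name "mdt" is made up, and the commands assume DRBD 8.x with the
>> resource already defined in /etc/drbd.conf on both MDS nodes; the MDT data
>> then still has to be copied over, e.g. following the MDT backup/restore
>> procedure in the Lustre manual):
>>
>> # on both MDS nodes: initialise DRBD metadata for the resource and bring it up
>> drbdadm create-md mdt
>> drbdadm up mdt
>> # on the node that will be primary: force the initial full sync to the peer
>> drbdadm -- --overwrite-data-of-peer primary mdt
>> # the resource's /dev/drbdX device is then used as the new MDT block device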
>>
>> I hope it helps
>>
>> Wojciech
>>
>> P.S. Out of curiosity, can you tell me what sort of hardware you have your
>> MDT on?
>>
>> 2009/10/26 Dan <dan at nerp.net>
>>
>> Hi all,
>>>
>>> I'm running Lustre 1.6.7.2 on RHEL 4. I ran fsck and lfsck because of
>>> several hard shutdowns due to power failures in the server room.  Prior to
>>> the repairs I was getting a few of the ASSERTION errors listed below on
>>> some clients when certain files were accessed.  This almost always locks up
>>> the client.  How can I find these "bad" files?  Even running ls can lock up
>>> a client.  Unsurprisingly, running ps indicates ls is hanging in state
>>> D or D+.
>>>
>>> After the repairs, 51,833 files were found orphaned and placed in
>>> /lustre/lost+found.  Also, lfsck reported 414,000 duplicate files when run
>>> with -n.  I stopped lfsck while it was creating the duplicates in
>>> lost+found/duplicates since I didn't have enough space on the FS to create
>>> them all!
>>>
>>> Some users started reporting that files they create sometimes appear without
>>> any data: the permissions, size, and owner info all show as ????.  Many
>>> other files are created and accessed successfully, and existing files can be
>>> read OK.  The filesystem is currently unusable because nearly all jobs hang
>>> the client.  How do I fix this?
>>>
>>>
>>> I typically get this error on clients:
>>>
>>> Oct 20 16:11:49 node05 kernel:
>>> LustreError:26409:0:(osc_request.c:2974:osc_set_data_with_check())
>>> ASSERTION(old_inode->i_state & I_FREEING) failed: Found existing inode
>>> 0000010051c5a278/6590499/4091507727 state 1 in lock: setting data to
>>> 0000010051c5acf8/13570878/674587622
>>>
>>> LustreError: 5842:0:(lib-move.c:110:lnet_try_match_md()) Matching packet
>>> from 12345-192.168.0.27 at tcp, match 162853 length 1456 too big: 1360
>>> allowed
>>> Lustre: Request x162853 sent from
>>> filesystem-MDT0000-mdc-000001007dc6d400 to NID 192.168.0.27 at tcp 100s ago
>>> has timed out (limit 100s).
>>> Lustre: filesystem-MDT0000-mdc000001007dc6d400: Connection to service
>>> filesystem-MDT0000 via nid 192.168.0.26 at tcp was lost; in progress
>>> operations using this service will wait for recovery to complete.
>>>
>>>
>>> I see a lot of this on the OSSs:
>>>
>>>
>>>
>>> Oct 20 16:25:44 OSS2 kernel: LustreError:
>>> 7857:0:(osc_request.c:2898:osc_set_data_with_check()) ### inconsistent
>>> l_ast_data found ns: oss2-OST0004-osc-----1041da90c00 lock:
>>> 00000103ef8b1040/0x205ca9465c341c75 lrc:3/1,0 mode PR/PR res: 2/0 rrc:2
>>> type: EXT [0->18446744073709551615] (req 0-> 18446744073709551615)
>>> flags: 100000 remote: 0xdcf241da6ca3e60a expref: -99 pid:5289
>>>
>>> Oct 20 16:26:22 OSS2 kernel: LustreError:
>>> 4991:0:(ldlm_resource.c:851:ldlm_resource_add()) lvbo_init failed for
>>> resource 482767: rc -2
>>>
>>>
>>> Thank you,
>>>
>>> Dan
>>>
>>
>>
>>
>> --
>> --
>> Wojciech Turek
>>
>> Assistant System Manager
>>
>> High Performance Computing Service
>> University of Cambridge
>> Email: wjt27 at cam.ac.uk
>> Tel: (+)44 1223 763517
>>
>>
>
>
> --
> Regards--
> Rishi Pathak
> National PARAM Supercomputing Facility
> Center for Development of Advanced Computing(C-DAC)
> Pune University Campus,Ganesh Khind Road
> Pune-Maharastra
>



-- 
--
Wojciech Turek

Assistant System Manager

High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517