[lustre-discuss] Is there a way to have faster lustre file system checker (lfsck)?

Yong, Fan fan.yong at intel.com
Wed May 2 01:43:49 PDT 2018


Inline comments.

--
Cheers,
Nasf

> -----Original Message-----
> From: 代栋 [mailto:daidongly at gmail.com]
> Sent: Wednesday, May 2, 2018 4:06 PM
> To: Yong, Fan <fan.yong at intel.com>
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] Is there a way to have faster lustre file system
> checker (lfsck)?
> 
> Sorry, I misread “abnormal”.  Anything I can check to help diagnose the
> slowness?
> 
> Thanks,
> - Dong
> 
> > On May 2, 2018, at 2:53 AM, 代栋 <daidongly at gmail.com> wrote:
> >
> > Thanks very much for your reply.
> >
> > I used Lustre 2.9.0 and ran “lctl lfsck_start -M lustre-MDT0000 -A -t all -r”
> > to start LFSCK.
> >
You can try "lctl lfsck_start -M lustre-MDT0000 -A -t namespace -r" first to check the namespace LFSCK speed. Since you have only one MDT, it should be quite fast. If it is, then check the layout LFSCK with "lctl lfsck_start -M lustre-MDT0000 -A -t layout -r".
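
To time the two components separately, the runs can be scripted. The following is only a sketch: the lfsck_start options are the ones quoted above, reading the status through "lctl get_param -n mdd.<MDT>.lfsck_<type>" is how I would expect it to work on 2.9 (please check it against the manual for your version), and the helper names and the DRY_RUN switch are made up for illustration. By default it only prints the commands.

```python
import shlex
import subprocess

MDT = "lustre-MDT0000"  # target name taken from the thread
DRY_RUN = True          # hypothetical switch: only print the commands by default


def lfsck_start_cmd(lfsck_type, mdt=MDT):
    """Build the lfsck_start command line for one LFSCK component."""
    return ["lctl", "lfsck_start", "-M", mdt, "-A", "-t", lfsck_type, "-r"]


def lfsck_status_cmd(lfsck_type, mdt=MDT):
    """Build the command that reads the MDT-side status of one component."""
    return ["lctl", "get_param", "-n", f"mdd.{mdt}.lfsck_{lfsck_type}"]


def run(cmd, dry_run=DRY_RUN):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


def scan_rate(objects, seconds):
    """Average number of objects scanned per second."""
    return objects / seconds


if __name__ == "__main__":
    for component in ("namespace", "layout"):
        run(lfsck_start_cmd(component))
        run(lfsck_status_cmd(component))
    # The reported run: about 300K inodes in about 80 minutes.
    print(f"{scan_rate(300_000, 80 * 60):.1f} objects/s")
```

When running for real, wait until the namespace status reports that the scan has finished before starting the layout run, so the two timings do not overlap.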

> > Could you brief me more about the slowness? I mean scanning around 300K
> > inodes should not take that much time (80mins). These files were just created
> > using a script after a fresh build of the lustre (no complex metadata
> > operations at all).
> >
I do not know what caused such slowness; many factors could be involved. Have you set any fail_loc? If not, you may need to enable LFSCK debug on both the MDT and the OSTs, then collect and analyze the Lustre debug logs.
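
As a rough sketch of that procedure, the command sequence to run on each server (MDS and every OSS) could look like the following. The lctl subcommands are standard ones, but the exact sequence, the log path, and the function name are only my assumptions for illustration; the script just prints the commands and does not execute anything.

```python
import shlex


def debug_plan(log_path="/tmp/lfsck-debug.log"):
    """Return the per-server command sequence as argument lists.

    log_path is a hypothetical location for the dumped debug log.
    """
    return [
        # Check that no fault injection is active (expected value: 0).
        ["lctl", "get_param", "fail_loc"],
        # Add LFSCK messages to the kernel debug mask.
        ["lctl", "set_param", "debug=+lfsck"],
        # Empty the kernel debug buffer before the LFSCK run starts.
        ["lctl", "clear"],
        # After the LFSCK run: dump the kernel debug buffer to a file.
        ["lctl", "dk", log_path],
    ]


if __name__ == "__main__":
    for cmd in debug_plan():
        print(shlex.join(cmd))
```

LFSCK itself is started between the "clear" and "dk" steps. If the debug buffer is too small to hold the whole run, the log may need to be dumped more than once.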


> > Got it, so the 30-sec interval is just for checking the status of the MDT.
> > Another question is, for layout checking, does lfsck need to compare metadata
> > stored in MDT (in LayoutEA) and metadata stored in OSTs (FID in LMA? not very
> > sure) for orphan objects? When are these metadata gathered into one place for
> > checking? I am asking this because previously I thought the periodic queries
> > from OSTs to MDT were doing this job.
> >
In short, all the MDT-object/OST-object pairs are marked during the first-phase scanning. So if some OST-objects remain unmarked, they may be orphans, which will be handled during the second-phase scanning.
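
The mark-then-sweep idea can be shown with a toy model. This is not the LFSCK implementation, only an illustration of the logic described above; the dictionary layout, the object names, and the function name are all hypothetical.

```python
def find_orphan_candidates(layout_refs, ost_objects):
    """Toy two-phase scan.

    layout_refs: maps each MDT-object to the OST-objects its layout refers to.
    ost_objects: set of OST-objects that actually exist on the OSTs.
    Returns the OST-objects that no MDT-object referenced.
    """
    marked = set()

    # Phase 1: walk the MDT-objects and mark every existing OST-object
    # that some layout refers to.
    for ost_ids in layout_refs.values():
        for ost_id in ost_ids:
            if ost_id in ost_objects:
                marked.add(ost_id)

    # Phase 2: whatever is still unmarked is an orphan candidate.
    return sorted(ost_objects - marked)


if __name__ == "__main__":
    layout_refs = {"mdt-1": ["ost-a", "ost-b"], "mdt-2": ["ost-c"]}
    ost_objects = {"ost-a", "ost-b", "ost-c", "ost-d"}
    print(find_orphan_candidates(layout_refs, ost_objects))
```

In this example "ost-d" is the only object that no layout refers to, so it is the only orphan candidate.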


> > Thanks,
> > - Dong
> >
> >
> >> On May 1, 2018, at 9:49 PM, Yong, Fan <fan.yong at intel.com> wrote:
> >>
> >> Inline comments.
> >>
> >> --
> >> Cheers,
> >> Nasf
> >>
> >>
> >>> -----Original Message-----
> >>> From: lustre-discuss
> >>> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of 代栋
> >>> Sent: Wednesday, May 2, 2018 5:36 AM
> >>> To: lustre-discuss at lists.lustre.org
> >>> Subject: [lustre-discuss] Is there a way to have faster lustre file
> >>> system checker (lfsck)?
> >>>
> >>> Hi all,
> >>>
> >>> I am still new to Lustre, so please let me know if I should send
> >>> this message to devel-list.
> >>>
> >>> This week, I tried to run LFSCK over a very small cluster
> >>> configuration (1 mdt and
> >>> 3 osts).  In this Lustre, I used about 300K inodes.  It took me
> >>> about 80 mins to finish a LFSCK run.  And, more importantly, while I
> >>> am running LFSCK, on both MDT and OSTs, the CPU utilization is 100%,
> >>> taken by the lfsck thread.
> >>
> >> Which version of Lustre and what is the LFSCK command line you used?
> >>
> >>
> >>> I understand that lfsck is operating in an online mode, so it is
> >>> slow.  But, I am wondering is there any way to accelerate this?
> >>> Especially if I am allowed to run it offline, for example, during weekly
> >>> maintenance.
> >>
> >> Your slowness is abnormal and is not related to running online. LFSCK can
> >> NOT be run in offline mode.
> >>
> >>
> >>>
> >>> After checking the lfsck kernel logs, I noticed that in the phase2
> >>> scanning on OSTs, there is a 30-second interval between queries to
> >>> the MDTs.  I am wondering whether there is any reason for this
> >>> 30-second interval, and whether lfsck on OSTs would be faster if we
> >>> removed it?
> >>
> >> Normally, the master engine on the MDT notifies the LFSCK engine on the
> >> OST when the first phase is done. But we can NOT guarantee that the LFSCK
> >> engine on the MDT stays alive during the LFSCK (it may stop because of some
> >> failure, network trouble, a node crash, and so on). So in the 2nd phase
> >> scanning, if the LFSCK engine on the OST does not receive the notification
> >> from the MDT, it needs to query the LFSCK status on the MDT periodically.
> >> If the MDT finishes the 1st phase scanning earlier than the OST, there will
> >> be no such query. Anyway, such queries are NOT the reason for your slow
> >> LFSCK.
> >>
> >>
> >>>
> >>> Thanks,
> >>> - Dong
> >>> _______________________________________________
> >>> lustre-discuss mailing list
> >>> lustre-discuss at lists.lustre.org
> >>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> >


