[lustre-discuss] Is there a way to have faster lustre file system checker (lfsck)?

代栋 daidongly at gmail.com
Wed May 2 10:21:42 PDT 2018


Many thanks. I will try to run namespace check again and see the results. BTW, is the rate control mechanism enabled by default? Can I disable it?

I might be asking stupid questions, but how are MDT-objects and OST-objects paired with each other during the phase 1 scanning? I mean, does the OST send object metadata to the MDT, or does the MDT send object metadata to the OST, for pairing? Could you point me to the source code, so that I can look for more details?

thanks,
- Dong

> On May 2, 2018, at 3:43 AM, Yong, Fan <fan.yong at intel.com> wrote:
> 
> Inline comments.
> 
> --
> Cheers,
> Nasf
> 
>> -----Original Message-----
>> From: 代栋 [mailto:daidongly at gmail.com]
>> Sent: Wednesday, May 2, 2018 4:06 PM
>> To: Yong, Fan <fan.yong at intel.com>
>> Cc: lustre-discuss at lists.lustre.org
>> Subject: Re: [lustre-discuss] Is there a way to have faster lustre file system
>> checker (lfsck)?
>> 
>> Sorry, I misread “abnormal”.  Anything I can check to help diagnose the
>> slowness?
>> 
>> Thanks,
>> - Dong
>> 
>>> On May 2, 2018, at 2:53 AM, 代栋 <daidongly at gmail.com> wrote:
>>> 
>>> Thanks very much for your reply.
>>> 
>>> I used Lustre 2.9.0 and ran “lctl lfsck_start -M lustre-MDT0000 -A -t all -r” to start LFSCK.
>>> 
> You can first try "lctl lfsck_start -M lustre-MDT0000 -A -t namespace -r" to check the namespace LFSCK speed. Since you have only one MDT, it should be much faster. If it is, then check "lctl lfsck_start -M lustre-MDT0000 -A -t layout -r".
> 
>>> Could you tell me more about the slowness? I mean, scanning around 300K
>>> inodes should not take that much time (80 mins). These files were just
>>> created using a script after a fresh build of Lustre (no complex metadata
>>> operations at all).
>>> 
> I do not know what caused such slowness. There may be many factors. Have you set any fail_loc? If not, you may need to enable LFSCK debug on both the MDT and the OSTs, then collect and analyze the Lustre debug logs.
> 
> 
>>> Got it, so the 30-sec interval is just for checking the status of the MDT.
>>> Another question: for layout checking, does lfsck need to compare metadata
>>> stored in the MDT (in LayoutEA) and metadata stored in the OSTs (FID in LMA?
>>> not very sure) for orphan objects? When is this metadata gathered into one
>>> place for checking? I am asking because I previously thought the periodic
>>> queries from the OSTs to the MDT were doing this job.
>>> 
> In short, all the MDT-object/OST-object pairs are marked during the 1st phase scanning. So if some OST-objects remain unmarked, they may be orphans, which will be handled during the 2nd phase scanning.
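
The mark-then-sweep idea in the reply above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Lustre implementation: the function name `find_orphan_candidates` and the dictionary layouts are hypothetical, and the sketch deliberately says nothing about which side (MDT or OST) sends what over the network.

```python
# Toy model of "mark in phase 1, handle the unmarked in phase 2".
# NOT Lustre code: every name and data structure here is hypothetical.

def find_orphan_candidates(mdt_layouts, ost_objects):
    """Return the OST-objects that no MDT-object layout refers to.

    mdt_layouts: dict mapping an MDT-object name to the list of
                 (ost_index, object_id) pairs in its layout.
    ost_objects: dict mapping ost_index to the set of object_ids
                 that actually exist on that OST.
    """
    # Phase 1: walk every MDT-object and mark the OST-objects it references.
    marked = set()
    for pairs in mdt_layouts.values():
        for ost_index, object_id in pairs:
            if object_id in ost_objects.get(ost_index, set()):
                marked.add((ost_index, object_id))

    # Phase 2: anything that exists on an OST but was never marked
    # is an orphan candidate.
    candidates = []
    for ost_index, object_ids in ost_objects.items():
        for object_id in object_ids:
            if (ost_index, object_id) not in marked:
                candidates.append((ost_index, object_id))
    return sorted(candidates)


mdt_layouts = {
    "file_a": [(0, 101), (1, 201)],
    "file_b": [(2, 301)],
}
ost_objects = {
    0: {101, 102},  # 102 is referenced by no layout
    1: {201},
    2: {301, 302},  # 302 is referenced by no layout
}
orphans = find_orphan_candidates(mdt_layouts, ost_objects)
print(orphans)
```

In this toy data set, objects 102 and 302 are the ones left unmarked after the first pass, so they are the only candidates the second pass has to look at.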
> 
> 
>>> Thanks,
>>> - Dong
>>> 
>>> 
>>>> On May 1, 2018, at 9:49 PM, Yong, Fan <fan.yong at intel.com> wrote:
>>>> 
>>>> Inline comments.
>>>> 
>>>> --
>>>> Cheers,
>>>> Nasf
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: lustre-discuss
>>>>> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of 代栋
>>>>> Sent: Wednesday, May 2, 2018 5:36 AM
>>>>> To: lustre-discuss at lists.lustre.org
>>>>> Subject: [lustre-discuss] Is there a way to have faster lustre file
>>>>> system checker (lfsck)?
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I am still new to Lustre, so please let me know if I should send
>>>>> this message to devel-list.
>>>>> 
>>>>> This week, I tried to run LFSCK over a very small cluster
>>>>> configuration (1 MDT and 3 OSTs).  In this Lustre, I used about
>>>>> 300K inodes.  It took me about 80 mins to finish an LFSCK run.
>>>>> And, more importantly, while LFSCK is running, the CPU utilization
>>>>> on both the MDT and the OSTs is 100%, taken by the lfsck thread.
>>>> 
>>>> Which version of Lustre are you running, and what LFSCK command line did you use?
>>>> 
>>>> 
>>>>> I understand that lfsck is operating in an online mode, so it is
>>>>> slow.  But I am wondering whether there is any way to accelerate
>>>>> this, especially if I am allowed to run it offline, for example,
>>>>> during weekly maintenance.
>>>> 
>>>> Your slowness is abnormal and is not related to online mode. LFSCK can
>>>> NOT be run in offline mode.
>>>> 
>>>> 
>>>>> 
>>>>> After checking the lfsck kernel logs, I noticed that in the phase 2
>>>>> scanning on the OSTs, there is a 30-second interval between queries
>>>>> to the MDTs.  I am wondering whether there is any reason for this
>>>>> 30-second interval, and whether lfsck on the OSTs would be faster if
>>>>> we removed it?
>>>> 
>>>> Normally, the master engine on the MDT will notify the LFSCK engine on
>>>> the OST when the first phase is done. But we can NOT guarantee that the
>>>> LFSCK engine on the MDT is always alive during the LFSCK (maybe because
>>>> of some failure, network trouble, node crash, and so on), so in the 2nd
>>>> phase scanning, if the LFSCK engine on the OST does not receive the
>>>> notification from the MDT, it needs to query the LFSCK status (on the
>>>> MDT) periodically. If the MDT finished the 1st phase scanning earlier
>>>> than the OST, then there will be no such query. Anyway, such queries are
>>>> NOT the reason for your slow LFSCK.
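
The "notify, otherwise poll" behaviour described above can be sketched roughly as follows. This is a simplified illustration and not the LFSCK source: `wait_for_mdt_phase1`, the `notified` event, and the `query_mdt_done` callback are hypothetical names, and the 30-second default is only taken from the interval mentioned in this thread.

```python
import threading

# Toy model of the OST-side wait in phase 2. NOT Lustre code:
# all names here are hypothetical and chosen for illustration only.

def wait_for_mdt_phase1(notified, query_mdt_done, interval=30.0):
    """Wait until the MDT's first phase is known to be done.

    notified:       threading.Event set when the MDT's notification arrives.
    query_mdt_done: callable that asks the MDT for its status and returns
                    True once its first phase has finished.
    Returns the number of status queries that were issued.
    """
    queries = 0
    # Event.wait() returns True as soon as the notification has arrived,
    # and False if `interval` seconds pass without one.
    while not notified.wait(timeout=interval):
        # No notification in time (MDT restarted, message lost, ...):
        # fall back to asking the MDT directly.
        queries += 1
        if query_mdt_done():
            break
    return queries


# Case 1: the notification already arrived, so no query is needed.
already = threading.Event()
already.set()
queries_when_notified = wait_for_mdt_phase1(already, lambda: False)

# Case 2: the notification never arrives; the third query reports "done".
answers = iter([False, False, True])
lost = threading.Event()
queries_when_lost = wait_for_mdt_phase1(lost, lambda: next(answers), interval=0.01)
print(queries_when_notified, queries_when_lost)
```

The polling path is only a fallback: when the notification is already there, the loop body never runs, which matches the point that these queries are not what makes the scan slow.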
>>>> 
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> - Dong
>>>>> _______________________________________________
>>>>> lustre-discuss mailing list
>>>>> lustre-discuss at lists.lustre.org
>>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
