[lustre-discuss] Is there a way to have faster lustre file system checker (lfsck)?

Yong, Fan fan.yong at intel.com
Wed May 2 23:33:08 PDT 2018


Inline comments.

--
Cheers,
Nasf
From: lustre-discuss [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of 代栋
Sent: Thursday, May 3, 2018 1:22 AM
To: Yong, Fan <fan.yong at intel.com>
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] Is there a way to have faster lustre file system checker (lfsck)?

Many thanks. I will try to run the namespace check again and see the results. BTW, is the rate control mechanism enabled by default? Can I disable it?
[Nasf] The rate control is disabled by default. You can check the current rate-control setting via “lctl get_param mdd.*.lfsck_speed_limit” on the MDT and/or “lctl get_param obdfilter.*.lfsck_speed_limit” on the OSTs.

I might be asking a stupid question, but how are MDT-objects and OST-objects paired with each other during phase-1 scanning? That is, does the OST send object metadata to the MDT, or does the MDT send object metadata to the OST for pairing? Could you point me to the source code, so that I can look for more details?
[Nasf] The LFSCK engine on the MDT reads, from the OSTs, the PFID EA of every OST-object known to the MDT, then verifies them. Please check lfsck_layout_assistant_handler_p1 for details.
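To make the direction of the data flow concrete, here is a minimal Python sketch of the pull model described above, under simplifying assumptions: FIDs are plain strings, each OST is reduced to a dict from OST-object FID to the parent FID held in its PFID EA, and the names `MdtObject` and `verify_phase1` are hypothetical. It is not the logic of lfsck_layout_assistant_handler_p1, which batches RPCs and handles many more cases; it only shows the MDT fetching each known OST-object's PFID and checking that it points back.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MdtObject:
    """Hypothetical MDT-object: its own FID plus the OST-object FIDs in its layout."""
    fid: str
    layout: list[str] = field(default_factory=list)


def verify_phase1(mdt_objects, ost_pfid):
    """Pull model: the MDT side reads each known OST-object's PFID and checks it.

    ost_pfid: hypothetical stand-in for the OSTs, mapping
              OST-object FID -> parent FID stored in that object's PFID EA.
    Returns (consistent, dangling, mismatched) lists of pairs/triples.
    """
    consistent, dangling, mismatched = [], [], []
    for obj in mdt_objects:
        for ost_fid in obj.layout:
            pfid = ost_pfid.get(ost_fid)  # "read the PFID EA from the OST"
            if pfid is None:
                # The layout references an OST-object that does not exist.
                dangling.append((obj.fid, ost_fid))
            elif pfid != obj.fid:
                # The OST-object claims a different parent.
                mismatched.append((obj.fid, ost_fid, pfid))
            else:
                # The pair agrees in both directions.
                consistent.append((obj.fid, ost_fid))
    return consistent, dangling, mismatched
```

For example, with `ost_pfid = {"o1": "m1", "o2": "m1", "o3": "m9"}` and MDT-objects `m1 -> [o1, o2]` and `m2 -> [o3, o4]`, both of `m1`'s stripes are consistent, `o3` is reported as mismatched (it names `m9`), and `o4` is reported as dangling.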

thanks,
- Dong

On May 2, 2018, at 3:43 AM, Yong, Fan <fan.yong at intel.com> wrote:

Inline comments.

--
Cheers,
Nasf


-----Original Message-----
From: 代栋 [mailto:daidongly at gmail.com]
Sent: Wednesday, May 2, 2018 4:06 PM
To: Yong, Fan <fan.yong at intel.com>
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] Is there a way to have faster lustre file system checker (lfsck)?

Sorry, I misread “abnormal”. Is there anything I can check to help diagnose the
slowness?

Thanks,
- Dong


On May 2, 2018, at 2:53 AM, 代栋 <daidongly at gmail.com> wrote:

Thanks very much for your reply.

I used Lustre 2.9.0 and ran “lctl lfsck_start -M lustre-MDT0000 -A -t all -r” to
start LFSCK.


You can first try "lctl lfsck_start -M lustre-MDT0000 -A -t namespace -r" to check the namespace LFSCK speed. Since you have only one MDT, it should be much faster. If it is, then check "lctl lfsck_start -M lustre-MDT0000 -A -t layout -r".


Could you tell me more about the slowness? I mean, scanning around 300K
inodes should not take that long (80 mins). These files were just created
by a script after a fresh build of Lustre (no complex metadata operations
at all).


I do not know what caused such slowness; there may be many factors. Have you set any fail_loc? If not, you may need to enable LFSCK debugging on both the MDT and the OSTs, then collect and analyze the Lustre debug logs.



Got it, so the 30-sec interval is just for checking the status of the MDT.
Another question: for layout checking, does lfsck need to compare the metadata
stored on the MDT (in the LayoutEA) with the metadata stored on the OSTs (the FID
in LMA? not very sure) to find orphan objects? When are these metadata gathered
in one place for checking? I am asking because I previously thought the periodic
queries from the OSTs to the MDT were doing this job.


In short, all the MDT-object/OST-object pairs are marked during the first-phase scanning. So if some OST-objects are left unmarked, they may be orphans, which will be handled during the second-phase scanning.
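A toy model of this mark-then-sweep idea, assuming plain-string FIDs and a hypothetical function name (`find_orphan_candidates`), might look like the Python below. It is a sketch of the concept only; where and how the real LFSCK engine records the marks is not modeled here.

```python
def find_orphan_candidates(mdt_layouts, ost_objects):
    """Hypothetical two-phase orphan detection for a single OST.

    mdt_layouts: dict mapping MDT-object FID -> list of OST-object FIDs
                 referenced by that object's layout.
    ost_objects: iterable of every OST-object FID present on the OST.
    Returns the sorted list of OST-object FIDs that no layout references.
    """
    # Phase 1: mark every OST-object that some MDT-object's layout points to.
    marked = set()
    for ost_fids in mdt_layouts.values():
        marked.update(ost_fids)

    # Phase 2: anything on the OST that was never marked may be an orphan.
    return sorted(fid for fid in ost_objects if fid not in marked)
```

For example, `find_orphan_candidates({"m1": ["o1", "o2"], "m2": ["o3"]}, ["o1", "o2", "o3", "o4", "o5"])` returns `["o4", "o5"]`. Note that this is why the second phase cannot finish until the first phase is complete: an object is only provably unmarked once every layout has been scanned.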



Thanks,
- Dong



On May 1, 2018, at 9:49 PM, Yong, Fan <fan.yong at intel.com> wrote:

Inline comments.

--
Cheers,
Nasf



-----Original Message-----
From: lustre-discuss [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of 代栋
Sent: Wednesday, May 2, 2018 5:36 AM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Is there a way to have faster lustre file system checker (lfsck)?

Hi all,

I am still new to Lustre, so please let me know if I should send
this message to the devel list instead.

This week, I tried to run LFSCK on a very small cluster
configuration (1 MDT and 3 OSTs). This Lustre file system uses about
300K inodes. It took about 80 mins to finish an LFSCK run. More
importantly, while LFSCK is running, CPU utilization on both the MDT
and the OSTs is 100%, taken by the lfsck thread.
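For scale, taking the approximate figures above at face value, the average scan rate works out to:

```latex
\frac{300\,000\ \text{inodes}}{80\ \text{min} \times 60\ \text{s/min}}
  = \frac{300\,000\ \text{inodes}}{4\,800\ \text{s}}
  = 62.5\ \text{inodes/s}
```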


Which version of Lustre are you using, and what LFSCK command line did you use?



I understand that LFSCK operates in online mode, so it is
slow. But I am wondering whether there is any way to accelerate it,
especially if I am allowed to run it offline, for example during weekly
maintenance.


Your slowness is abnormal; it is not related to running online. LFSCK can NOT
be run in offline mode.





After checking the lfsck kernel logs, I noticed that during the phase-2
scanning on the OSTs, there is a 30-second interval between queries to
the MDT. Is there any reason for this 30-second interval, and would
lfsck on the OSTs be faster if we removed it?


Normally, the master engine on the MDT notifies the LFSCK engine on the
OST when the first phase is done. But we can NOT guarantee that the LFSCK
engine on the MDT stays alive for the whole LFSCK run (for example, because of
a failure, network trouble, or a node crash). So in the second-phase scanning,
if the LFSCK engine on the OST has not received the notification from the MDT,
it needs to query the LFSCK status on the MDT periodically. If the MDT finished
the first-phase scanning earlier than the OST, there will be no such query.
In any case, this query is NOT the reason for your slow LFSCK.
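The wait-or-query behavior described above can be sketched as a small discrete-time simulation in Python. The 30-second interval comes from the log observation quoted above; the function name `ost_phase2_wait`, its parameters, and the rule that the OST learns the MDT status exactly at a query boundary are hypothetical simplifications, not the actual LFSCK code. No real time passes: the loop only advances a simulated clock.

```python
QUERY_INTERVAL = 30.0  # seconds, as observed in the poster's OST logs


def ost_phase2_wait(ost_done, mdt_done, notified, horizon=3600.0):
    """Simulate how an OST learns that the MDT finished LFSCK phase 1.

    ost_done: simulated time at which the OST starts waiting in phase 2.
    mdt_done: time the MDT finishes phase 1, or None if it never does
              (for example, the MDT crashed).
    notified: True if the MDT's "phase 1 done" notification reaches the OST.
    Returns (time_learned or None, number_of_status_queries_sent).
    """
    if mdt_done is not None and mdt_done <= ost_done:
        # The MDT finished first, so the OST never needs to poll.
        return ost_done, 0

    queries = 0
    t = ost_done
    while t + QUERY_INTERVAL <= horizon:
        deadline = t + QUERY_INTERVAL
        if notified and mdt_done is not None and mdt_done <= deadline:
            # The notification arrived before this wait timed out.
            return mdt_done, queries
        # Timed out without a notification: ask the MDT for its status.
        t = deadline
        queries += 1
        if mdt_done is not None and mdt_done <= t:
            return t, queries
    # Gave up at the simulation horizon without learning anything.
    return None, queries
```

For example, with a lost notification and an MDT that finishes phase 1 at t=175 s while the OST started waiting at t=100 s, `ost_phase2_wait(100, 175, False)` reports three status queries and learns the result at t=190 s. The polling only adds latency on the order of one interval after the MDT finishes; it does not slow the scanning itself, which matches the point made above.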





Thanks,
- Dong
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


