[lustre-discuss] df shows 32x the expected filesystem size / LFSCK duration?

Otto, Frank f.otto at ucl.ac.uk
Mon Jul 3 09:50:48 PDT 2023


Hello,

We are running Lustre 2.12.5 on ZFS 0.8.3, with the MDT backed by a ZFS pool of 8 mirror vdevs.  Recently two drives failed in the same mirror vdev, from which we were able to mostly recover (details on the zfs-discuss list [1]).  We have brought the Lustre filesystem back up and mounted it read-only on a single client.  Immediately after mounting, checking the filesystem space with "df" shows the correct/expected size of the filesystem, but running it again shows the total/used/available space inflated by a factor of almost exactly 32:


# mount /lustreTEST; sleep 1; df /lustreTEST
Filesystem                                            1K-blocks         Used    Available Use% Mounted on
10.128.104.102 at o2ib:10.128.104.101 at o2ib:/lustrefs 1131965562880 527126408192 604839130112  47% /lustreTEST

# df /lustreTEST
Filesystem                                             1K-blocks           Used      Available Use% Mounted on
10.128.104.102 at o2ib:10.128.104.101 at o2ib:/lustrefs 36222898339840 16868045127680 19354852425728  47% /lustreTEST
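
A quick check with bc confirms the ratio is essentially exactly 32 in all three columns, e.g. for the 1K-blocks column:

# echo "scale=3; 36222898339840/1131965562880" | bc
32.000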


For comparison, "lfs df" shows the expected size (1.1PB):

# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
lustrefs-MDT0000_UUID  2236427008  1183398912  1053026048  53% /lustreTEST[MDT:0]
lustrefs-OST0000_UUID 94330528768 42854200320 51476326400  46% /lustreTEST[OST:0]
lustrefs-OST0001_UUID 94329859072 46121248768 48208608256  49% /lustreTEST[OST:1]
lustrefs-OST0002_UUID 94330471424 44617906176 49712563200  48% /lustreTEST[OST:2]
lustrefs-OST0003_UUID 94330831872 42602903552 51727926272  46% /lustreTEST[OST:3]
lustrefs-OST0004_UUID 94330233856 42467627008 51862604800  46% /lustreTEST[OST:4]
lustrefs-OST0005_UUID 94330768384 47377309696 46953456640  51% /lustreTEST[OST:5]
lustrefs-OST0006_UUID 94330462208 42833333248 51497126912  46% /lustreTEST[OST:6]
lustrefs-OST0007_UUID 94330291200 42838326272 51491962880  46% /lustreTEST[OST:7]
lustrefs-OST0008_UUID 94330214400 45438008320 48892204032  49% /lustreTEST[OST:8]
lustrefs-OST0009_UUID 94330670080 44649268224 49681399808  48% /lustreTEST[OST:9]
lustrefs-OST000a_UUID 94330542080 43913293824 50417246208  47% /lustreTEST[OST:10]
lustrefs-OST000b_UUID 94330704896 41412985856 52917716992  44% /lustreTEST[OST:11]

filesystem_summary:  1131965578240 527126411264 604839142400  47% /lustreTEST


We are wondering why the df output is so inflated, and how worried we should be about it. For now we've refrained from putting the fs back into production.

Anyway, we are running LFSCK now.  Is there a way to estimate how long it will take?  I can see progress in the "checked" and "current_position" counters reported by "lctl get_param osd-zfs.lustrefs-MDT*.oi_scrub", "mdd.lustrefs-MDT*.lfsck_layout" and "mdd.lustrefs-MDT*.lfsck_namespace", but how can we estimate what the 100% values of those counters are?  Is it related to how full the MDT is?  We seem to have about 264M inodes in use.
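
In case it helps: the rough estimate we are making ourselves is to compare the "checked" counter of the OI scrub against the number of in-use MDT inodes from "lfs df -i", on the (possibly wrong) assumption that the scrub has to walk roughly every in-use object.  A sketch of that, field names may differ between releases:

# Rough progress estimate (assumes the scrub walks roughly all in-use
# MDT objects; counter/field names may vary between Lustre releases)
USED=$(lfs df -i /lustreTEST | awk '/MDT0000/ {print $3}')
CHECKED=$(lctl get_param -n osd-zfs.lustrefs-MDT0000.oi_scrub | awk '/checked:/ {print $2}')
echo "scale=1; 100 * $CHECKED / $USED" | bc

Is that a reasonable way to think about it?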

Many thanks,
Frank


[1] https://zfsonlinux.topicbox.com/groups/zfs-discuss/T28e59deb8ff2c26d/two-failed-drives-in-mirror-vdev




