[lustre-discuss] MGTMDT device getting full

Dilger, Andreas andreas.dilger at intel.com
Sun Oct 25 02:05:03 PDT 2015


On 2015/10/21, 05:40, "lustre-discuss on behalf of Torsten Harenberg"
<lustre-discuss-bounces at lists.lustre.org on behalf of
torsten.harenberg at cern.ch> wrote:

>On 19.10.15 at 10:00, Torsten Harenberg wrote:
>
>> [root at lustre1 MGTMDT]# ls -lh oi.16
>> -rw-r--r-- 1 root root 554G Aug 13  2013 oi.16
>
>Coming back to this: I found this mail from Andreas:
>
>https://lists.01.org/pipermail/hpdd-discuss/2014-October/001352.html
>
>"Multiple OI files are created for new filesystems by default.  You can
>also cause OI Scrub to rebuild the OI file if the current OI file is
>corrupted or missing."
>
>"You can mount the MDT locally as type ldiskfs, rename the oi.16 file,
>then mount the MDT as type lustre and OI scrub should begin
>automatically at
>mount (see "lctl get_param .
>"
>
>
>Does that mean that the following steps
>
>- go into downtime
>- back up the MDT
>- mount the MDT as ldiskfs
>- rename (or move) oi.16
>- umount the MDT (ldiskfs)
>- mount all devices back as lustre
>- .. wait :)
>
>would be a valid procedure to reduce the space used on the MDT?
>
>I triggered an OI scrub with
>
>lctl > lfsck_start -M lustre-MDT0000 -r
>Started LFSCK on the device lustre-MDT0000.
>
>and this finished:
>
>name: OI_scrub
>magic: 0x4c5fd252
>oi_files: 1
>status: completed
>flags:
>param:
>time_since_last_completed: 5122 seconds
>time_since_latest_start: 12059 seconds
>time_since_last_checkpoint: 5122 seconds
>latest_start_position: 12
>last_checkpoint_position: 536870913
>first_failure_position: N/A
>checked: 69458966
>updated: 0
>failed: 0
>prior_updated: 0
>noscrub: 190241
>igif: 1719
>success_count: 2
>run_time: 6937 seconds
>average_speed: 10012 objects/sec
>real-time_speed: N/A
>current_position: N/A
>lf_scanned: 0
>lf_reparied: 0
>lf_failed: 0
>
>but not much space was freed.
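
For reference, the rename procedure asked about in the quoted message could
be written down roughly as follows. This is only a dry-run sketch in Python:
it builds and prints the commands and does not execute anything. The device
path, both mount points, the ".bak" suffix and the function name
oi_rename_plan are hypothetical placeholders, and the downtime and backup
steps are left out because they are site-specific.

```python
import shlex


def oi_rename_plan(mdt_dev, ldiskfs_mnt, lustre_mnt, suffix=".bak"):
    """Return the commands for the rename procedure as strings (nothing is run).

    All arguments are placeholders chosen by the caller; take the MDT out of
    service and back it up before doing any of this for real.
    """
    old_oi = f"{ldiskfs_mnt}/oi.16"
    steps = [
        # Mount the MDT locally as plain ldiskfs so oi.16 is visible as a file.
        ["mount", "-t", "ldiskfs", mdt_dev, ldiskfs_mnt],
        # Rename (not delete) the old OI file so it can be put back if needed.
        ["mv", old_oi, old_oi + suffix],
        ["umount", ldiskfs_mnt],
        # Mount as lustre again; OI scrub is expected to start at mount.
        ["mount", "-t", "lustre", mdt_dev, lustre_mnt],
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    # Hypothetical device and mount points, for illustration only.
    for line in oi_rename_plan("/dev/MDTDEV", "/mnt/mdt-ldiskfs", "/mnt/mdt"):
        print(line)
```

Printing the plan first makes it easy to review each step before typing it
on the MDS by hand.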

If you just renamed the oi.16 file (which is reasonable for backup
purposes) then no space would be freed on the MDT, or free space might
even shrink because new OI files had to be created.  At least you should
be able to see that the new OI files are in total smaller than the old
one.  You could also move the old oi.16 file off the MDT to an external
filesystem, but that would take longer, both for the initial fix and
if you needed to go back for some reason.
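
One way to check this is to compare the space used by the renamed file with
the total used by the new OI files while the MDT is mounted as ldiskfs. The
following Python sketch does that under some assumptions: the old file was
renamed to "oi.16.bak" (a hypothetical name), the new OI files are named
"oi.16." followed by a number, and the helper names are made up for this
example. Note that "ls -lh" shows the apparent size, which can differ from
the allocated space if a file is sparse, so both measures are provided.

```python
import os


def apparent_bytes(path):
    """File length as shown by 'ls -l'."""
    return os.stat(path).st_size


def allocated_bytes(path):
    """Space actually allocated on disk; st_blocks counts 512-byte units (Unix)."""
    return os.stat(path).st_blocks * 512


def compare_oi(mdt_dir, old_name="oi.16.bak", new_prefix="oi.16.",
               measure=allocated_bytes):
    """Return (old_size, total_new_size) for OI files found in mdt_dir.

    old_name and new_prefix are assumptions about how the files are named;
    adjust them to what is really present on the MDT.
    """
    old = measure(os.path.join(mdt_dir, old_name))
    new = 0
    for name in os.listdir(mdt_dir):
        suffix = name[len(new_prefix):]
        # Only count names such as oi.16.0, oi.16.1, ...; this skips the backup.
        if name.startswith(new_prefix) and suffix.isdigit():
            new += measure(os.path.join(mdt_dir, name))
    return old, new


if __name__ == "__main__":
    # Hypothetical ldiskfs mount point of the MDT.
    old, new = compare_oi("/mnt/mdt-ldiskfs")
    print(f"old OI file: {old} bytes, new OI files in total: {new} bytes")
```

Space on the MDT is only released once the old file is removed or moved to
another filesystem.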

If you hadn't done anything to the original oi.16 file, then no space
savings would be expected from just running LFSCK.
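
The status output quoted above fits this: the scrub completed with
"updated: 0", so nothing was rewritten. A small Python sketch for reading
such "key: value" output is shown here; the helper names parse_status and
first_int are made up for this example, and SAMPLE is only an excerpt of the
quoted output.

```python
def parse_status(text):
    """Parse 'key: value' lines (optionally quoted with '>') into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.lstrip(">").strip()
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def first_int(value):
    """Return the leading integer of a value such as '6937 seconds'."""
    return int(value.split()[0])


# Excerpt of the OI_scrub status quoted in this thread.
SAMPLE = """\
status: completed
checked: 69458966
updated: 0
failed: 0
run_time: 6937 seconds
average_speed: 10012 objects/sec
"""

if __name__ == "__main__":
    st = parse_status(SAMPLE)
    checked = first_int(st["checked"])
    run_time = first_int(st["run_time"])
    print("status:", st["status"])
    print("objects updated:", first_int(st["updated"]))
    # Integer division of checked by run_time, for comparison with average_speed.
    print("checked // run_time:", checked // run_time)
```

For the numbers in this thread, checked divided by run_time (rounded down)
agrees with the reported average_speed of 10012 objects/sec.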

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division



