[lustre-discuss] Replacing ldiskfs MDT with larger disk

Jesse Stroik jesse.stroik at ssec.wisc.edu
Mon Aug 5 09:49:31 PDT 2019


Ah, never mind. It appears that this can be done if 'lfs migrate -m' is 
used directly instead of the lfs_migrate script.
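
For the archives, a sketch of the command in question (the path and MDT 
index are illustrative, not our actual layout):

    # migrate an existing directory tree to MDT0001; this moves the
    # metadata only, while file data stays in place on the OSTs
    lfs migrate -m 1 /mnt/lustre/projects/data

    # confirm which MDT the directory now lives on
    lfs getdirstripe /mnt/lustre/projects/data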

Best,
Jesse

On 8/5/19 11:26 AM, Jesse Stroik wrote:
> 
> On 7/31/19 6:27 PM, Andreas Dilger wrote:
>> Just to clarify, when I referred to "file level backup/restore", I was 
>> referring to the MDT ldiskfs filesystem, not the whole Lustre 
>> filesystem (which would be _much_ too large for most sites). The 
>> various backup/restore methods are documented in the Lustre Operations 
>> Manual.
> 
> 
> Yes - I sometimes copy file systems, and I typically do so as cluster 
> jobs so I can adjust the rate of the copy. But that requires having 
> spare petabytes available ;)
> 
> I created an MDT with --index=1 for DNE and ran into an issue. I had 
> assumed lfs setdirstripe works like lfs setstripe on existing 
> directories, so that newly created files would be assigned to the new 
> MDT.
> 
> However, setdirstripe is an alias for lfs mkdir, so I cannot change 
> the MDT setting on existing directories. I planned to change the MDT 
> setting on the directories and use lfs_migrate in the background to 
> effect the migration so it would be transparent to the end users.
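> 
> For illustration, the behaviour I ran into (the paths here are 
> hypothetical):
> 
>     # works: creates a NEW directory on MDT0001
>     lfs mkdir -i 1 /mnt/lustre/newdir
> 
>     # fails with "File exists": setdirstripe cannot retarget an
>     # existing directory
>     lfs setdirstripe -i 1 /mnt/lustre/existingdir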
> 
> Is there a better way to migrate users to the new MDT than recreating 
> all of the directories?
> 
> Jesse
> 
>> Cheers, Andreas
>>
>>> On Jul 31, 2019, at 15:10, Jesse Stroik <jesse.stroik at ssec.wisc.edu> 
>>> wrote:
>>>
>>> This is excellent information, Andreas.
>>>
>>> Presently we do file-level backups to the live file system, and they 
>>> take over 24 hours, so they're done continuously. For that timeframe 
>>> to work, we'd need to be able to back up the old MDT and restore it 
>>> to the new MDT with the file system online.
>>>
>>> Given that resizing the file system will proportionately increase the 
>>> inodes (I didn't realize that), dd to a logical volume may be a 
>>> reasonable option for us. The dd would be fast enough that we could 
>>> weather the downtime.
>>>
>>> PFL and FLR aren't features they're planning for this file system, 
>>> and it may be replaced next year, so I suspect they'll opt for the 
>>> DNE method.
>>>
>>> Thanks again,
>>> Jesse Stroik
>>>
>>> On 7/31/19 3:11 PM, Andreas Dilger wrote:
>>>> Normally the easy answer would be that a "dd" copy of the MDT device 
>>>> from your HDDs to a larger SSD LUN, followed by resize2fs to 
>>>> increase the filesystem size, would also increase the number of 
>>>> inodes proportionately to the LUN size.
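>>>> 
>>>> As a sketch (device names are placeholders; the MDT must be stopped 
>>>> and unmounted first):
>>>> 
>>>>     # block-level copy of the old MDT device onto the larger SSD LUN
>>>>     dd if=/dev/old_mdt of=/dev/new_ssd_mdt bs=4M
>>>> 
>>>>     # check, then grow ldiskfs to fill the new LUN; the inode count
>>>>     # grows in proportion to the added space
>>>>     e2fsck -f /dev/new_ssd_mdt
>>>>     resize2fs /dev/new_ssd_mdt
>>>> 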
>>>> However, since you are *not* using a 1024-byte inode size, only a 
>>>> 512-byte inode size + 512 bytes of space for other things (i.e. a 
>>>> 1024 bytes-per-inode ratio), I'd suggest a file-level MDT 
>>>> backup/restore to a newly-formatted MDT, because newer features like 
>>>> PFL and FLR need more space in the inode itself. The benefit of this 
>>>> approach is that you keep a full backup of the MDT on the HDDs in 
>>>> case of problems.  Note that after the backup/restore the LFSCK OI 
>>>> Scrub will run for some time (maybe an hour or two, depending on 
>>>> size), which will result in a slowdown. That would likely be 
>>>> compensated for by the faster SSD storage.
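>>>> 
>>>> A condensed sketch of the file-level method from the Operations 
>>>> Manual (devices and paths are placeholders; see the manual for the 
>>>> full, version-specific procedure):
>>>> 
>>>>     # with the MDS stopped, mount the old MDT as plain ldiskfs
>>>>     mount -t ldiskfs /dev/old_mdt /mnt/mdt
>>>>     cd /mnt/mdt
>>>> 
>>>>     # save all extended attributes, then the (sparse) file data
>>>>     getfattr -R -d -m '.*' -e hex -P . > /tmp/mdt_ea.bak
>>>>     tar czf /tmp/mdt_backup.tgz --sparse .
>>>> 
>>>>     # restore onto the newly formatted MDT
>>>>     mount -t ldiskfs /dev/new_mdt /mnt/new_mdt
>>>>     cd /mnt/new_mdt
>>>>     tar xzpf /tmp/mdt_backup.tgz --sparse
>>>>     setfattr --restore=/tmp/mdt_ea.bak
>>>> 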
>>>> If you go the DNE route and migrate some of the namespace to the 
>>>> new MDT, you definitely still need to keep MDT0000.  However, you 
>>>> could combine these approaches and still copy MDT0000 to new flash 
>>>> storage instead of keeping the HDDs around forever.  I'd again 
>>>> recommend a file-level MDT backup/restore to a newly-formatted MDT 
>>>> to get the newer format options.
>>>> Cheers, Andreas
>>>>> On Jul 31, 2019, at 13:50, Jesse Stroik 
>>>>> <jesse.stroik at ssec.wisc.edu> wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> One of our Lustre file systems outgrew its MDT and the original 
>>>>> scope of its operation. This one is still running ldiskfs on the 
>>>>> MDT. Here's our setup and restrictions:
>>>>>
>>>>> - CentOS 6 / Lustre 2.8
>>>>> - ldiskfs MDT
>>>>> - minimal downtime allowed, but the FS can be read-only for a while.
>>>>>
>>>>> The MDT itself, set up with -i 1024, needs both more space and more 
>>>>> available inodes. Its purpose has changed in scope, and we'd now 
>>>>> like the performance benefits of getting off of spinning media as 
>>>>> well.
>>>>>
>>>>> We need a new file system instead of expanding the existing 
>>>>> ldiskfs because we need more inodes.
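>>>>> 
>>>>> (For illustration, the sort of reformat we'd be looking at; the 
>>>>> fsname, index, MGS NID, and inode options below are placeholders:)
>>>>> 
>>>>>     mkfs.lustre --mdt --fsname=ourfs --index=0 \
>>>>>         --mgsnode=mgs@tcp \
>>>>>         --mkfsoptions="-I 1024 -i 2560" /dev/new_mdt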
>>>>>
>>>>> I think my options are (1) a file-level backup and recovery, or a 
>>>>> direct copy, onto the new file system, or (2) add a new MDT to the 
>>>>> system, assign all directories under the root to it, and then 
>>>>> lfs_migrate everything on the file system thereafter.
>>>>>
>>>>> Is there a disadvantage to the DNE approach other than the fact 
>>>>> that we have to keep the original spinning-disk MDT around to 
>>>>> service the root of the FS?
>>>>>
>>>>> If we had to do option 1, we'd want to remount the current MDT 
>>>>> read-only and continue using it while we were preparing the new 
>>>>> MDT. When I searched, I couldn't find anything definitive about 
>>>>> ensuring no changes to an ldiskfs MDT during operation, and I don't 
>>>>> want to assume I can simply remount it read-only.
>>>>>
>>>>> Thanks,
>>>>> Jesse Stroik
>>
>> Cheers, Andreas
>> -- 
>> Andreas Dilger
>> Principal Lustre Architect
>> Whamcloud
