[lustre-discuss] How to remove an OST completely
    Angelos Ching 
    angelosching at clustertech.com
       
    Thu Mar  4 20:15:19 PST 2021
    
    
  
Hi TH,
I think you'll have to set max_create_count=20000 after step 7, unless
you unmount and remount your MDT.
For step 4, I used conf_param instead of set_param during my drill. It
might be more resilient if your MDT runs on an HA pair: set_param only
changes the runtime value, so after a failover the MDS might try to
reactivate the inactive OST.
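Angelos's two suggestions can be sketched as shell functions. This is a minimal sketch under stated assumptions: the file system is chome, the MGS runs on the MDT server (conf_param has to be run on the MGS), and the persistent form uses the fsname-OSTnnnn.osc.active syntax from the Lustre manual rather than the osc.* device path. Reactivating with active=1 is an addition that should only be needed if the OST was deactivated persistently. The helper names (ost_name, run, deactivate_ost, restore_ost) are hypothetical, and by default run only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch only: commands are printed unless DRY_RUN=0 is set.
FSNAME=${FSNAME:-chome}   # assumption: file system name used in this thread
DRY_RUN=${DRY_RUN:-1}

# OST names carry the index as 4 hex digits, e.g. 8 -> OST0008, 16 -> OST0010.
ost_name() {
    printf '%s-OST%04x' "$FSNAME" "$1"
}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Alternative to step 4: persistent deactivation stored by the MGS,
# so an MDS failover should not bring the OST back.
deactivate_ost() {
    run lctl conf_param "$(ost_name "$1").osc.active=0"
}

# After step 7: reactivate persistently and let the MDT allocate
# objects on the OST again (20000 is the value Angelos suggests).
restore_ost() {
    run lctl conf_param "$(ost_name "$1").osc.active=1"
    run lctl set_param "osc.$(ost_name "$1")-osc-MDT0000.max_create_count=20000"
}
```

For example, call deactivate_ost 8 in place of step 4, then set DRY_RUN=0 and call restore_ost 8 once the new OST is mounted.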
Regards,
Angelos
On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> Dear Hans,
>
> Thank you very much. Replacing the OST is new to me and very
> useful. We will try it next time.
>
> So, according to the manual, to replace the OST we probably need
> to:
>
> 1. Lock the old OST (e.g., chome-OST0008) so that no new objects are
>     allocated on it (run on the MDT server):
>
>     lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
>
> 2. List the files that have objects on the old OST (e.g., chome-OST0008)
>     (run on the client):
>
>     lfs find --obd chome-OST0008_UUID /home > /tmp/OST0008.txt
>
> 3. Migrate the files listed in /tmp/OST0008.txt off the old OST
>     (run on the client).
>
> 4. Deactivate the old OST temporarily (run on the MDT server):
>
>     lctl set_param osc.chome-OST0008-osc-MDT0000.active=0
>
>     (Note: should use "set_param" instead of "conf_param")
>
> 5. Unmount the old OST partition (run on the old OST server).
>
> 6. Prepare the new OST as a replacement by running mkfs.lustre with the
>     --replace option and the old OST index (e.g., 0x8)
>     (run on the new OST server):
>
>     mkfs.lustre --ost --fsname=chome --mgsnode=XXXXXX --index=0x8 --replace <device_name>
>
> 7. Mount the new OST (run on the new OST server).
>
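Step 3 is not spelled out above. One way to do steps 2 and 3 on a client, similar to the example in the Lustre manual's OST removal section, is to feed the file list to the lfs_migrate script that ships with Lustre. This sketch assumes the client mount is /home and that lfs_migrate -y reads file names from standard input when none are given on the command line; the function names are hypothetical. Because step 1 set max_create_count=0, the migrated files should get their new objects on other OSTs.

```shell
#!/bin/sh
# Client-side sketch of steps 2-3 (assumed mount point and OST UUID).
OST_UUID=${OST_UUID:-chome-OST0008_UUID}
MNT=${MNT:-/home}

# Step 2: list files that still have objects on the OST.
list_ost_files() {
    lfs find --obd "$OST_UUID" "$MNT"
}

# Step 3: migrate every file named in the list file given as $1
# (assumes lfs_migrate reads file names from stdin).
migrate_list() {
    lfs_migrate -y < "$1"
}

# Before step 4: count the files still on the OST; expect 0.
remaining_files() {
    list_ost_files | wc -l | tr -d ' '
}
```

For example: list_ost_files > /tmp/OST0008.txt, then migrate_list /tmp/OST0008.txt, and finally check that remaining_files prints 0 before deactivating the OST.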
>
> Best Regards,
>
> T.H.Hsieh
>
>
> On Thu, Mar 04, 2021 at 04:59:54PM +0100, Hans Henrik Happe via lustre-discuss wrote:
>> Hi,
>>
>> The manual describes this:
>>
>> https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
>>
>> There is a note saying that the OST will still be there, but that it
>> can be replaced.
>>
>> I hope you also migrated your data off the OST; otherwise it has been
>> lost.
>>
>> Cheers,
>> Hans Henrik
>>
>> On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
>>> Dear All,
>>>
>>> Here is a question about how to remove an OST completely without
>>> restarting the Lustre file system. Our Lustre version is 2.12.6.
>>>
>>> We did the following steps to remove the OST:
>>>
>>> 1. Lock the OST (e.g., chome-OST0008) so that no new objects are
>>>     allocated on it (run on the MDT server):
>>>
>>>     lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
>>>
>>> 2. List the files that have objects on the target OST (e.g., chome-OST0008)
>>>     (run on the client):
>>>
>>>     lfs find --obd chome-OST0008_UUID /home
>>>
>>> 3. Remove (deactivate) the OST (run on the MDT server):
>>>
>>>     lctl conf_param osc.chome-OST0008-osc-MDT0000.active=0
>>>
>>> 4. Unmount the OST partition (run on the OST server).
>>>
>>> After that, the total size of the Lustre file system decreased, and
>>> everything looks fine. However, without restarting (i.e., rebooting the
>>> Lustre MDT / OST servers), the removed OST still seems to exist. For
>>> example, on the MDT:
>>>
>>> # lctl get_param osc.*.active
>>> osc.chome-OST0000-osc-MDT0000.active=1
>>> osc.chome-OST0001-osc-MDT0000.active=1
>>> osc.chome-OST0002-osc-MDT0000.active=1
>>> osc.chome-OST0003-osc-MDT0000.active=1
>>> osc.chome-OST0008-osc-MDT0000.active=0
>>> osc.chome-OST0010-osc-MDT0000.active=1
>>> osc.chome-OST0011-osc-MDT0000.active=1
>>> osc.chome-OST0012-osc-MDT0000.active=1
>>> osc.chome-OST0013-osc-MDT0000.active=1
>>> osc.chome-OST0014-osc-MDT0000.active=1
>>>
>>> We still see chome-OST0008, and the MDT's dmesg shows many errors like:
>>>
>>> LustreError: 4313:0:(osp_object.c:594:osp_attr_get()) chome-OST0008-osc-MDT0000:osp_attr_get update error [0x100080000:0x10a54c:0x0]: rc = -108
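The inactive entries in a listing like the one above can be picked out with a small filter. This is a sketch (inactive_osts is a hypothetical helper name) that assumes the name=value format shown and reads the lctl get_param output on standard input. For reference, rc = -108 in the error above is -ESHUTDOWN on Linux ("Cannot send after transport endpoint shutdown").

```shell
#!/bin/sh
# Print the OST name of every OSC whose "active" value is 0.
# Input lines look like: osc.chome-OST0008-osc-MDT0000.active=0
inactive_osts() {
    awk -F= 'NF == 2 && $2 == 0 {
        name = $1
        sub(/^osc\./, "", name)    # drop the leading "osc."
        sub(/-osc-.*/, "", name)   # drop "-osc-MDT0000.active"
        print name
    }'
}
```

Run on the MDT, lctl get_param osc.*.active | inactive_osts would print chome-OST0008 for the listing above.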
>>>
>>> In addition, we ran LFSCK on the MDT server:
>>>
>>> 	lctl lfsck_start -A
>>>
>>> Even after LFSCK had finished its work on the MDT and all OSTs, checking
>>> the layout LFSCK on the MDT server with:
>>>
>>> 	lctl get_param mdd.*.lfsck_layout
>>>
>>> shows that the status is not complete:
>>>
>>> mdd.chome-MDT0000.lfsck_layout=
>>> name: lfsck_layout
>>> magic: 0xb1732fed
>>> version: 2
>>> status: partial
>>> flags: incomplete
>>> param: all_targets
>>> last_completed_time: 1614762495
>>> time_since_last_completed: 4325 seconds
>>> ....
>>>
>>> We suspect that the "incomplete" flag might be due to the already removed
>>> chome-OST0008.
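To check just the relevant fields of that output (for example, to confirm the partial/incomplete result, or while waiting for LFSCK to finish), a small filter can pull one "key: value" field out of the lfsck_layout text. This is a sketch; lfsck_field is a hypothetical helper name, and it assumes the "key: value" layout shown above.

```shell
#!/bin/sh
# Print the value of one "key: value" line from lfsck_layout output on stdin.
# $1 is the key, e.g. status or flags; only the first match is printed.
lfsck_field() {
    awk -v key="$1" -F': ' '$1 == key { print $2; exit }'
}
```

For example, lctl get_param -n mdd.chome-MDT0000.lfsck_layout | lfsck_field status would print partial for the output above, and lfsck_field flags would print incomplete.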
>>>
>>> Is there any way to completely remove chome-OST0008 from the Lustre
>>> file system? The OST device has already been reformatted for other
>>> use.
>>>
>>> Thanks very much.
>>>
>>>
>>> T.H.Hsieh
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org