[lustre-discuss] How to eliminate zombie OSTs
Horn, Chris
chris.horn at hpe.com
Wed Aug 9 10:07:57 PDT 2023
The error message is saying that ‘-P’ is not a valid option to the conf_param command. You may be thinking of lctl set_param -P …
Did you follow the documented procedure for removing an OST from the filesystem when you “adjust[ed] the configuration”?
https://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lustremaint.remove_ost
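For reference, the documented procedure boils down to something like the following sketch. This assumes a filesystem named "lustre" and OST0015 as the target; check the exact parameter names against the manual for your Lustre version before running anything:

```
# On the MDS: stop new object allocations on the OST being removed
lctl set_param osp.lustre-OST0015*.max_create_count=0

# Migrate any remaining file data off the OST (e.g. with lfs find / lfs migrate)

# On the MGS: permanently deactivate the OST for the MDS and clients.
# Note conf_param takes no -P flag; it is permanent by itself:
lctl conf_param lustre-OST0015.osc.active=0
# Newer releases support the equivalent:
lctl set_param -P osp.lustre-OST0015-osc-MDT0000.active=0
```

Running `lctl --device N deactivate` on the MDS only deactivates the device temporarily; it does not survive a remount, which is why the conf_param/set_param -P form is needed for a permanent removal.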
Chris Horn
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Alejandro Sierra via lustre-discuss <lustre-discuss at lists.lustre.org>
Date: Wednesday, August 9, 2023 at 11:55 AM
To: Jeff Johnson <jeff.johnson at aeoncomputing.com>
Cc: lustre-discuss <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] How to eliminate zombie OSTs
Yes, it is.
On Wed, Aug 9, 2023 at 10:49, Jeff Johnson
(jeff.johnson at aeoncomputing.com) wrote:
>
> Alejandro,
>
> Is your MGS located on the same node as your primary MDT? (combined MGS/MDT node)
>
> --Jeff
>
> On Wed, Aug 9, 2023 at 9:46 AM Alejandro Sierra via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>>
>> Hello,
>>
>> In 2018 we deployed a Lustre 2.10.5 system with 20 OSTs on two OSSes
>> with 4 jboxes, each box holding 24 disks of 12 TB, for a total of
>> nearly 1 PB. Over that time we have had power failures and failed RAID
>> controller cards, all of which forced us to adjust the configuration. After
>> the last failure, the system keeps sending error messages about OSTs
>> that no longer exist in the system. On the MDS I run
>>
>> # lctl dl
>>
>> and I get the 20 currently active OSTs
>>
>> oss01.lanot.unam.mx - OST00 /dev/disk/by-label/lustre-OST0000
>> oss01.lanot.unam.mx - OST01 /dev/disk/by-label/lustre-OST0001
>> oss01.lanot.unam.mx - OST02 /dev/disk/by-label/lustre-OST0002
>> oss01.lanot.unam.mx - OST03 /dev/disk/by-label/lustre-OST0003
>> oss01.lanot.unam.mx - OST04 /dev/disk/by-label/lustre-OST0004
>> oss01.lanot.unam.mx - OST05 /dev/disk/by-label/lustre-OST0005
>> oss01.lanot.unam.mx - OST06 /dev/disk/by-label/lustre-OST0006
>> oss01.lanot.unam.mx - OST07 /dev/disk/by-label/lustre-OST0007
>> oss01.lanot.unam.mx - OST08 /dev/disk/by-label/lustre-OST0008
>> oss01.lanot.unam.mx - OST09 /dev/disk/by-label/lustre-OST0009
>> oss02.lanot.unam.mx - OST15 /dev/disk/by-label/lustre-OST000f
>> oss02.lanot.unam.mx - OST16 /dev/disk/by-label/lustre-OST0010
>> oss02.lanot.unam.mx - OST17 /dev/disk/by-label/lustre-OST0011
>> oss02.lanot.unam.mx - OST18 /dev/disk/by-label/lustre-OST0012
>> oss02.lanot.unam.mx - OST19 /dev/disk/by-label/lustre-OST0013
>> oss02.lanot.unam.mx - OST25 /dev/disk/by-label/lustre-OST0019
>> oss02.lanot.unam.mx - OST26 /dev/disk/by-label/lustre-OST001a
>> oss02.lanot.unam.mx - OST27 /dev/disk/by-label/lustre-OST001b
>> oss02.lanot.unam.mx - OST28 /dev/disk/by-label/lustre-OST001c
>> oss02.lanot.unam.mx - OST29 /dev/disk/by-label/lustre-OST001d
>>
>> but I also get 5 that are not currently active and in fact no longer exist
>>
>> 28 IN osp lustre-OST0014-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>> 29 UP osp lustre-OST0015-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>> 30 UP osp lustre-OST0016-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>> 31 UP osp lustre-OST0017-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>> 32 UP osp lustre-OST0018-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>>
>> When I try to eliminate them with
>>
>> lctl conf_param -P osp.lustre-OST0015-osc-MDT0000.active=0
>>
>> I get the error
>>
>> conf_param: invalid option -- 'P'
>> set a permanent config parameter.
>> This command must be run on the MGS node
>> usage: conf_param [-d] <target.keyword=val>
>> -d Remove the permanent setting.
>>
>> If I do
>>
>> lctl --device 28 deactivate
>>
>> I don't get an error, but nothing changes
>>
>> What can I do?
>>
>> Thank you in advance for any help.
>>
>> --
>> Alejandro Aguilar Sierra
>> LANOT, ICAyCC, UNAM
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
>
> --
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> jeff.johnson at aeoncomputing.com
> http://www.aeoncomputing.com
> t: 858-412-3810 x1001 f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>
> High-Performance Computing / Lustre Filesystems / Scale-out Storage