<div dir="ltr">Alejandro,<div><br></div><div>Is your MGS located on the same node as your primary MDT? (combined MGS/MDT node)</div><div><br></div><div>--Jeff</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 9, 2023 at 9:46 AM Alejandro Sierra via lustre-discuss <<a href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello,<br>
<br>
In 2018 we deployed a Lustre 2.10.5 system with 20 OSTs on two OSSs<br>
with 4 jboxes, each box with 24 disks of 12 TB each, for a total of<br>
nearly 1 PB. Since then we have had power failures and failed RAID<br>
controller cards, all of which forced us to adjust the configuration. After<br>
the last failure, the system keeps sending error messages about OSTs<br>
that are no longer in the system. On the MDS I run<br>
<br>
# lctl dl<br>
<br>
and I get the 20 currently active OSTs<br>
<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST00   /dev/disk/by-label/lustre-OST0000<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST01   /dev/disk/by-label/lustre-OST0001<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST02   /dev/disk/by-label/lustre-OST0002<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST03   /dev/disk/by-label/lustre-OST0003<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST04   /dev/disk/by-label/lustre-OST0004<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST05   /dev/disk/by-label/lustre-OST0005<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST06   /dev/disk/by-label/lustre-OST0006<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST07   /dev/disk/by-label/lustre-OST0007<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST08   /dev/disk/by-label/lustre-OST0008<br>
<a href="http://oss01.lanot.unam.mx" rel="noreferrer" target="_blank">oss01.lanot.unam.mx</a>     -       OST09   /dev/disk/by-label/lustre-OST0009<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST15   /dev/disk/by-label/lustre-OST000f<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST16   /dev/disk/by-label/lustre-OST0010<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST17   /dev/disk/by-label/lustre-OST0011<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST18   /dev/disk/by-label/lustre-OST0012<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST19   /dev/disk/by-label/lustre-OST0013<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST25   /dev/disk/by-label/lustre-OST0019<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST26   /dev/disk/by-label/lustre-OST001a<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST27   /dev/disk/by-label/lustre-OST001b<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST28   /dev/disk/by-label/lustre-OST001c<br>
<a href="http://oss02.lanot.unam.mx" rel="noreferrer" target="_blank">oss02.lanot.unam.mx</a>     -       OST29   /dev/disk/by-label/lustre-OST001d<br>
<br>
but I also get 5 that are not currently active and in fact do not exist:<br>
<br>
 28 IN osp lustre-OST0014-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4<br>
 29 UP osp lustre-OST0015-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4<br>
 30 UP osp lustre-OST0016-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4<br>
 31 UP osp lustre-OST0017-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4<br>
 32 UP osp lustre-OST0018-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4<br>
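<br>
Note that the index in these device names is hexadecimal, as the labels in the listing above show (OST15 is lustre-OST000f). A minimal sketch of the conversion, using the five device names from this output; the helper name `ost_index_dec` is hypothetical:<br>

```shell
# Convert the four hex digits of an OST index (e.g. 0014) to decimal.
# ost_index_dec is a hypothetical helper, not part of Lustre.
ost_index_dec() {
    printf '%d\n' "0x$1"
}

# The five extra devices reported on the MDS.
for idx in 0014 0015 0016 0017 0018; do
    echo "lustre-OST${idx} is OST index $(ost_index_dec "$idx")"
done
```

Read this way, the five extra devices are indices 20 through 24, the range missing from the listing between OST19 and OST25.<br>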
<br>
When I try to eliminate them with<br>
<br>
lctl conf_param -P osp.lustre-OST0015-osc-MDT0000.active=0<br>
<br>
I get the error<br>
<br>
conf_param: invalid option -- 'P'<br>
set a permanent config parameter.<br>
This command must be run on the MGS node<br>
usage: conf_param [-d] <target.keyword=val><br>
  -d  Remove the permanent setting.<br>
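<br>
The `-P` flag belongs to `lctl set_param`, not `lctl conf_param`, which is why the option is rejected here. A minimal sketch, assuming the commands are run on the MGS node and that the `fsname-OSTxxxx.osc.active` form of `conf_param` applies to this Lustre version; the helper `stale_ost_cmds` is hypothetical and only prints the commands so they can be reviewed before anything is run:<br>

```shell
# Print (do not execute) one conf_param command per stale OST index.
# stale_ost_cmds is a hypothetical helper; the first argument is the
# filesystem name, the rest are four-digit hex OST indices.
stale_ost_cmds() {
    fsname=$1
    shift
    for idx in "$@"; do
        # Assumed conf_param form: <fsname>-OST<index>.osc.active=0
        echo "lctl conf_param ${fsname}-OST${idx}.osc.active=0"
    done
}

# Indices taken from the extra osp devices in the lctl dl output.
stale_ost_cmds lustre 0014 0015 0016 0017 0018
```

Whether deactivating these entries is the right fix depends on how the OSTs were removed, so the printed commands are a starting point to check against the documentation for the installed version.<br>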
<br>
If I do<br>
<br>
lctl --device 28 deactivate<br>
<br>
I don't get an error, but nothing changes.<br>
<br>
What can I do?<br>
<br>
Thank you in advance for any help.<br>
<br>
--<br>
Alejandro Aguilar Sierra<br>
LANOT, ICAyCC, UNAM<br>
</blockquote></div><br clear="all"><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr">------------------------------<br>Jeff Johnson<br>Co-Founder<br>Aeon Computing<br><br><a href="mailto:jeff.johnson@aeoncomputing.com" target="_blank">jeff.johnson@aeoncomputing.com</a><br><a href="http://www.aeoncomputing.com" target="_blank">www.aeoncomputing.com</a><br>t: 858-412-3810 x1001   f: 858-412-3845<br>m: 619-204-9061<br><br>4170 Morena Boulevard, Suite C - San Diego, CA 92117<div><br></div><div>High-Performance Computing / Lustre Filesystems / Scale-out Storage</div></div></div></div></div>