[lustre-discuss] Unable to mount lustre filesystem after failed OST addition

BALVERS Martin Martin.BALVERS@danone.com
Wed Aug 27 03:01:16 PDT 2025


Hi Lustre community,

I think I messed up my Lustre configuration while setting up my new filesystem. The server version is 2.15.1.
While adding OSTs to my new Lustre filesystem, I ran into an issue when adding OST0005. After formatting the zpool, I could not mount the OST; it failed with the message below.

[root@oss6 ~]# mount -t lustre -v lustre/ost1 /mnt/ost1
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = lustre/ost1
arg[5] = /mnt/ost1
source = lustre/ost1 (lustre/ost1), target = /mnt/ost1
options = rw
checking for existing Lustre data: found
Writing lustre/ost1 properties
  lustre:mgsnode=192.168.6.1@tcp
  lustre:version=1
  lustre:flags=2
  lustre:index=5
  lustre:fsname=lustre
  lustre:svname=lustre-OST0005
mounting device lustre/ost1 at /mnt/ost1, flags=0x1000000 options=osd=osd-zfs,mgsnode=192.168.6.1@tcp,update,param=mgsnode=192.168.6.1@tcp,svname=lustre-OST0005,device=lustre/ost1
mount.lustre: mount -t lustre lustre/ost1 at /mnt/ost1 failed: Input/output error retries left: 0
mount.lustre: mount lustre/ost1 at /mnt/ost1 failed: Input/output error
Is the MGS running?

I could not add this particular server, although I added OST0006 without any problem.
At this point I reconnected the clients and started using the Lustre filesystem again.

After reinstalling the OS on the server and replacing the network card and cable, I was still unable to add the OST.
It turned out to be a misconfiguration on the switch: jumbo frames were not enabled on the port used for that server.
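For what it's worth, the symptom was that small packets got through while large ones were silently dropped. A simple check like the following (the target IP is just an example from my setup) shows whether jumbo frames actually survive the path:

```shell
# 8972 bytes of payload + 28 bytes of ICMP/IP headers = one 9000-byte frame.
# -M do sets the Don't Fragment bit, so the ping only succeeds if every hop
# on the path passes jumbo frames intact.
ping -M do -s 8972 -c 3 192.168.6.7
```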
After fixing that, I still could not add this server as OST0005, because I got a message that OST0005 already existed.
Reformatting with the --replace option and index 5 did not work either. I was able to add this server with a new index, so it is now registered as OST0007 with index 7.
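For completeness, the reformat I attempted was roughly the following (the exact dataset name is from memory and may differ slightly):

```shell
# Recreate the OST on the existing zpool, reusing index 5; --replace is
# supposed to tell the MGS that this target replaces the previously
# registered OST0005 rather than registering a brand-new one.
mkfs.lustre --ost --backfstype=zfs --fsname=lustre --index=5 \
    --replace --mgsnode=192.168.6.1@tcp lustre/ost1
```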

At this point I thought everything was fine. All the clients could see all OSTs, and all servers were being used.

When I needed to reboot a client, it was no longer able to mount the Lustre filesystem. I got the following error.

# mount -v -t lustre 192.168.6.1@tcp:/lustre /lustre
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = 192.168.6.1@tcp:/lustre
arg[5] = /lustre
source = 192.168.6.1@tcp:/lustre (192.168.6.1@tcp:/lustre), target = /lustre
options(2/4096) = rw
mounting device 192.168.6.1@tcp:/lustre at /lustre, flags=0x1000000 options=device=192.168.6.1@tcp:/lustre
mount.lustre: mount -t lustre 192.168.6.1@tcp:/lustre at /lustre failed: Invalid argument retries left: 0
mount.lustre: mount 192.168.6.1@tcp:/lustre at /lustre failed: Invalid argument
This may have multiple causes.
Is 'lustre' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info.

The logs show:
Aug 27 11:36:09 trinityx kernel: LustreError: 110360:0:(obd_config.c:1557:class_process_config()) no device for: lustre-OST0005-osc-ffff921d73617800
Aug 27 11:36:09 trinityx kernel: LustreError: 110360:0:(obd_config.c:2029:class_config_llog_handler()) MGC192.168.6.1@tcp: cfg command failed: rc = -22
Aug 27 11:36:09 trinityx kernel: Lustre:    cmd=cf00f 0:lustre-OST0005-osc  1:osc.active=0
Aug 27 11:36:09 trinityx kernel: LustreError: MGC192.168.6.1@tcp: Configuration from log lustre-client failed from MGS -22. Check client and MGS are on compatible version.
Aug 27 11:36:09 trinityx kernel: Lustre: Unmounted lustre-client
Aug 27 11:36:09 trinityx kernel: LustreError: 110343:0:(super25.c:188:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -22

I tried disabling OST0005 on the MDS, but that also gave an error in the logs.

# lctl conf_param lustre-OST0005.osc.active=0

[Aug26 17:07] Lustre: Permanently deactivating lustre-OST0005
[  +0.001950] Lustre: Modifying parameter lustre-OST0005-osc.osc.active in log lustre-client
[  +0.001193] Lustre: Skipped 1 previous similar message
[  +7.429669] LustreError: 4158100:0:(obd_config.c:1499:class_process_config()) no device for: lustre-OST0005-osc-MDT0000
[  +0.001253] LustreError: 4158100:0:(obd_config.c:2001:class_config_llog_handler()) MGC192.168.6.1@tcp: cfg command failed: rc = -22
[  +0.000829] Lustre:    cmd=cf00f 0:lustre-OST0005-osc-MDT0000  1:osc.active=0
[  +0.001275] LustreError: 60246:0:(mgc_request.c:612:do_requeue()) failed processing log: -22
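One thing I was wondering about, but have not dared to try, is removing the stale OST0005 records from the client log directly on the MGS, along these lines (record index 74 taken from the llog_print output below):

```shell
# List the client configuration log to find the indices of the stale
# lustre-OST0005-osc records.
lctl --device MGS llog_print lustre-client
# Cancel a single record by its index, e.g. the osc.active=0 entry.
lctl --device MGS llog_cancel lustre-client --log_idx=74
```

Would that be safe to do on a live filesystem?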

I am now in the situation where all currently connected clients can use the filesystem, but as soon as one of them reboots, it cannot reconnect.
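I have read that a writeconf would regenerate the configuration logs from scratch, but if I understand correctly that means stopping the whole filesystem, which I would very much like to avoid. Roughly (dataset names here are examples):

```shell
# With all clients unmounted and all targets stopped:
tunefs.lustre --writeconf lustre/mdt   # on the MDS
tunefs.lustre --writeconf lustre/ost1  # on every OSS, for each OST
# Then remount MGS/MDT first, OSTs next, clients last; each target
# re-registers with the MGS and the logs are rebuilt.
```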

Is there a way to fix this, preferably without taking everything offline?

Thanks in advance,
Martin Balvers

Additional info:
[root@mds ~]# lctl dl
  0 UP osd-zfs MGS-osd MGS-osd_UUID 4
  1 UP mgs MGS MGS 64
  2 UP mgc MGC192.168.6.1@tcp ab5db231-f023-4acb-8896-0cfb93c5ed25 4
  3 UP osd-zfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 13
  4 UP mds MDS MDS_uuid 2
  5 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 3
  6 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 94
  7 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 3
  8 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 3
  9 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 10 UP osp lustre-OST0002-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 11 UP osp lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 12 UP osp lustre-OST0003-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 13 UP osp lustre-OST0004-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 14 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 4
 15 UP osp lustre-OST0006-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 16 UP osp lustre-OST0007-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
[root@mds ~]# lctl --device MGS llog_print lustre-client
- { index: 2, event: attach, device: lustre-clilov, type: lov, UUID: lustre-clilov_UUID }
- { index: 3, event: setup, device: lustre-clilov, UUID:  }
- { index: 6, event: attach, device: lustre-clilmv, type: lmv, UUID: lustre-clilmv_UUID }
- { index: 7, event: setup, device: lustre-clilmv, UUID:  }
- { index: 10, event: add_uuid, nid: 192.168.6.1@tcp(0x20000c0a80601), node: 192.168.6.1@tcp }
- { index: 11, event: attach, device: lustre-MDT0000-mdc, type: mdc, UUID: lustre-clilmv_UUID }
- { index: 12, event: setup, device: lustre-MDT0000-mdc, UUID: lustre-MDT0000_UUID, node: 192.168.6.1@tcp }
- { index: 13, event: add_mdc, device: lustre-clilmv, mdt: lustre-MDT0000_UUID, index: 0, gen: 1, UUID: lustre-MDT0000-mdc_UUID }
- { index: 16, event: new_profile, name: lustre-client, lov: lustre-clilov, lmv: lustre-clilmv }
- { index: 19, event: add_uuid, nid: 192.168.6.2@tcp(0x20000c0a80602), node: 192.168.6.2@tcp }
- { index: 20, event: attach, device: lustre-OST0000-osc, type: osc, UUID: lustre-clilov_UUID }
- { index: 21, event: setup, device: lustre-OST0000-osc, UUID: lustre-OST0000_UUID, node: 192.168.6.2@tcp }
- { index: 22, event: add_osc, device: lustre-clilov, ost: lustre-OST0000_UUID, index: 0, gen: 1 }
- { index: 25, event: add_uuid, nid: 192.168.6.4@tcp(0x20000c0a80604), node: 192.168.6.4@tcp }
- { index: 26, event: attach, device: lustre-OST0002-osc, type: osc, UUID: lustre-clilov_UUID }
- { index: 27, event: setup, device: lustre-OST0002-osc, UUID: lustre-OST0002_UUID, node: 192.168.6.4@tcp }
- { index: 28, event: add_osc, device: lustre-clilov, ost: lustre-OST0002_UUID, index: 2, gen: 1 }
- { index: 31, event: add_uuid, nid: 192.168.6.3@tcp(0x20000c0a80603), node: 192.168.6.3@tcp }
- { index: 32, event: attach, device: lustre-OST0001-osc, type: osc, UUID: lustre-clilov_UUID }
- { index: 33, event: setup, device: lustre-OST0001-osc, UUID: lustre-OST0001_UUID, node: 192.168.6.3@tcp }
- { index: 34, event: add_osc, device: lustre-clilov, ost: lustre-OST0001_UUID, index: 1, gen: 1 }
- { index: 37, event: add_uuid, nid: 192.168.6.5@tcp(0x20000c0a80605), node: 192.168.6.5@tcp }
- { index: 38, event: attach, device: lustre-OST0003-osc, type: osc, UUID: lustre-clilov_UUID }
- { index: 39, event: setup, device: lustre-OST0003-osc, UUID: lustre-OST0003_UUID, node: 192.168.6.5@tcp }
- { index: 40, event: add_osc, device: lustre-clilov, ost: lustre-OST0003_UUID, index: 3, gen: 1 }
- { index: 46, event: add_uuid, nid: 192.168.6.6@tcp(0x20000c0a80606), node: 192.168.6.6@tcp }
- { index: 47, event: attach, device: lustre-OST0004-osc, type: osc, UUID: lustre-clilov_UUID }
- { index: 48, event: setup, device: lustre-OST0004-osc, UUID: lustre-OST0004_UUID, node: 192.168.6.6@tcp }
- { index: 49, event: add_osc, device: lustre-clilov, ost: lustre-OST0004_UUID, index: 4, gen: 1 }
- { index: 53, event: conf_param, device: lustre-OST0003-osc, parameter: osc.active=1 }
- { index: 56, event: add_uuid, nid: 192.168.6.8@tcp(0x20000c0a80608), node: 192.168.6.8@tcp }
- { index: 57, event: attach, device: lustre-OST0006-osc, type: osc, UUID: lustre-clilov_UUID }
- { index: 58, event: setup, device: lustre-OST0006-osc, UUID: lustre-OST0006_UUID, node: 192.168.6.8@tcp }
- { index: 59, event: add_osc, device: lustre-clilov, ost: lustre-OST0006_UUID, index: 6, gen: 1 }
- { index: 68, event: add_uuid, nid: 192.168.6.7@tcp(0x20000c0a80607), node: 192.168.6.7@tcp }
- { index: 69, event: attach, device: lustre-OST0007-osc, type: osc, UUID: lustre-clilov_UUID }
- { index: 70, event: setup, device: lustre-OST0007-osc, UUID: lustre-OST0007_UUID, node: 192.168.6.7@tcp }
- { index: 71, event: add_osc, device: lustre-clilov, ost: lustre-OST0007_UUID, index: 7, gen: 1 }
- { index: 74, event: conf_param, device: lustre-OST0005-osc, parameter: osc.active=0 }


This e-mail and any files transmitted with it are confidential and intended solely for the use of the individual to whom it is addressed. If you have received this email in error please send it back to the person that sent it to you. Any views or opinions presented are solely those of its author and do not necessarily represent those of DANONE or any of its subsidiary companies. Unauthorized publication, use, dissemination, forwarding, printing or copying of this email and its associated attachments is strictly prohibited.