[Lustre-discuss] Cannot mount MDS: Lustre: Denying initial registration attempt from nid 10.201.62.11@o2ib, specified as failover

Adrian Ulrich adrian at blinkenlights.ch
Fri Nov 19 07:50:22 PST 2010


Hi,


Our MDS refuses to start after we tried to enable quotas.


What we did:
 # umount /lustre/mds
 # tunefs.lustre --param mdt.quota_type=ug /dev/md10 (as described in http://wiki.lustre.org/manual/LustreManual18_HTML/ConfiguringQuotas.html)
 # sync
 # mount -t lustre /dev/md10 /lustre/mds
---> at this point, the MDS crashed <---

Now the MDS refuses to start up:

Lustre: OBD class driver, http://www.lustre.org/
Lustre:     Lustre Version: 1.8.4
Lustre:     Build Version: 1.8.4-20100726215630-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
Lustre: Listener bound to ib0:10.201.62.11:987:mlx4_0
Lustre: Register global MR array, MR size: 0xffffffffffffffff, array size: 1
Lustre: Added LNI 10.201.62.11@o2ib [8/64/0/180]
Lustre: Added LNI 10.201.30.11@tcp [8/256/0/180]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
init dynlocks cache
ldiskfs created from ext3-2.6-rhel5
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on md10, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: MGS MGS started
Lustre: MGC10.201.62.11@o2ib: Reactivating import
Lustre: Denying initial registration attempt from nid 10.201.62.11@o2ib, specified as failover
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect (no target)
LustreError: 6440:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19)  req@ffff81021986a000 x1352839800570911/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect (no target)
LustreError: Skipped 1 previous similar message
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19)  req@ffff81021986ac00 x1352839303546603/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181453 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6441:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 1 previous similar message
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect (no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19)  req@ffff8101ee758400 x1352840769468288/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181454 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6459:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 17 previous similar messages
LustreError: 6423:0:(mgs_handler.c:671:mgs_handle()) MGS handle cmd=253 rc=-99
LustreError: 11-0: an error occurred while communicating with 0@lo. The mgs_target_reg operation failed with -99
LustreError: 6177:0:(obd_mount.c:1097:server_start_targets()) Required registration failed for lustre1-MDT0000: -99
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect (no target)
LustreError: Skipped 17 previous similar messages
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19)  req@ffff8101ea921800 x1352839510145001/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181455 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6451:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 18 previous similar messages
LustreError: 6177:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -99
LustreError: 6177:0:(obd_mount.c:1438:server_put_super()) no obd lustre1-MDT0000
LustreError: 6177:0:(obd_mount.c:147:server_deregister_mount()) lustre1-MDT0000 not registered
Lustre: MGS has stopped.
LustreError: 137-5: UUID 'lustre1-MDT0000_UUID' is not available  for connect (no target)
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19)  req@ffff8101ec658000 x1352839459803293/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1290181457 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 6464:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 50 previous similar messages
LustreError: Skipped 58 previous similar messages
Lustre: server umount lustre1-MDT0000 complete
LustreError: 6177:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount  (-99)


Removing the quota parameter again with
 # tunefs.lustre --erase-params --param="failover.node=10.201.62.11@o2ib,10.201.30.11@tcp failover.node=10.201.62.12@o2ib,10.201.30.12@tcp mdt.group_upcall=/usr/sbin/l_getgroups" /dev/md10

did not help.


So what exactly does 'Lustre: Denying initial registration attempt from nid 10.201.62.11@o2ib, specified as failover' mean?
This node *is* 10.201.62.11, and tunefs.lustre shows:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre1-MDT0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x45
              (MDT MGS update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=10.201.62.11@o2ib,10.201.30.11@tcp failover.node=10.201.62.12@o2ib,10.201.30.12@tcp mdt.group_upcall=/usr/sbin/l_getgroups
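
To make the question concrete, here is a minimal shell sketch of the comparison I have in mind: it takes the Parameters line printed by tunefs.lustre (with NIDs written in the usual ip@network form) and checks whether this node's own NID shows up among the failover.node entries. That such a match is what makes the MGS deny the registration is only an assumption on my part; nothing in the log above confirms it. The variable names (params, local_nid, failover_nids, result) are made up for illustration.

```shell
# Parameters line copied from the tunefs.lustre output above.
params='failover.node=10.201.62.11@o2ib,10.201.30.11@tcp failover.node=10.201.62.12@o2ib,10.201.30.12@tcp mdt.group_upcall=/usr/sbin/l_getgroups'

# NID of the node that is trying to register (taken from the log message).
local_nid='10.201.62.11@o2ib'

# One parameter per line, keep only failover.node values,
# then split each comma-separated NID list into one NID per line.
failover_nids=$(printf '%s\n' "$params" \
    | tr ' ' '\n' \
    | sed -n 's/^failover\.node=//p' \
    | tr ',' '\n')

# Exact, whole-line, fixed-string match against the local NID.
if printf '%s\n' "$failover_nids" | grep -qxF "$local_nid"; then
    result="local NID $local_nid is listed as a failover NID"
else
    result="local NID $local_nid is not listed as a failover NID"
fi
echo "$result"
```

With the values above, the local NID is found in the first failover.node entry, which at least matches the wording of the message.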




Regards,
 Adrian


