[Lustre-discuss] OST crash recovery problem
Heiko Schroeter
schroete at iup.physik.uni-bremen.de
Tue Aug 19 00:01:57 PDT 2008
Hello,
Replying to myself: no, we couldn't get Lustre up again and had to reinstall
from scratch.
:-(
Keeping our fingers crossed now that we are running the production system ...
What bugs us is this part of the message on the MDS:
Aug 13 11:18:54 sadosrd20 LustreError: 15c-8: MGC192.168.16.122@tcp: The
configuration from log 'scia-OST0004' failed (-17). This may be the result of
communication errors between this node and the MGS, a bad configuration, or
other errors. See the syslog for more information.
Unfortunately, there is no further information in the syslog.
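For reference, the -17 in this message is a negated Linux errno value, as kernel code conventionally returns them. A minimal Python sketch for decoding such a code, assuming it runs on a Linux host (the helper name describe_kernel_rc is made up for illustration):

```python
import errno
import os


def describe_kernel_rc(rc):
    """Return (symbolic name, message) for a negative kernel-style return code."""
    code = abs(rc)
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_kernel_rc(-17)
print(name, "-", message)
```

On Linux, errno 17 is EEXIST, which matches the "file exists" wording reported by mount.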
Regards
Heiko
Hello again,
Any idea what can be done in such a case?
Regards
Heiko
Hello,
After a crash (hardware failure) of an OST with two Lustre partitions, one
partition (/dev/sdb) cannot be remounted after the restart.
The second partition (/dev/sdc) mounts fine.
What needs to be done in such a case?
I tried moving the mount point because of the "file exists" message, but that
did not help.
Any pointers welcome.
Heiko
OST messages after mount command:
mount -t lustre /dev/sdb /mnt/data/ost3
<snip>
Aug 13 11:18:53 sadosrd20 kjournald starting. Commit interval 5 seconds
Aug 13 11:18:53 sadosrd20 LDISKFS FS on sdb, internal journal
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data
mode.
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(genops.c:246:class_newdev())
Device scia-OST0004 already exists, won't add
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:
(obd_config.c:180:class_attach()) Cannot create device scia-OST0004 of type
obdfilter : -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:
(obd_config.c:1070:class_config_llog_handler()) Err -17 on cfg command:
Aug 13 11:18:54 sadosrd20 Lustre: cmd=cf001 0:scia-OST0004 1:obdfilter
2:scia-OST0004_UUID
Aug 13 11:18:54 sadosrd20 LustreError: 15c-8: MGC192.168.16.122@tcp: The
configuration from log 'scia-OST0004' failed (-17). This may be the result of
communication errors between this node and the MGS, a bad configuration, or
other errors. See the syslog for more information.
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:
(obd_mount.c:1091:server_start_targets()) failed to start server
scia-OST0004: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:
(obd_mount.c:1597:server_fill_super()) Unable to start targets: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:
(obd_mount.c:1382:server_put_super()) no obd scia-OST0004
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits,
1 2^N hits, 0 breaks, 0 lost
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 generated and it took 7512
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 256 preallocated, 0 discarded
Aug 13 11:18:55 sadosrd20 Lustre: server umount scia-OST0004 complete
Aug 13 11:18:55 sadosrd20 LustreError: 7247:0:
(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-17)
<snap>
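The LustreError entries in the excerpt above each name a source file, line number, and function. The following is only a sketch, assuming the mail-wrapped syslog format shown here, of how those call sites could be pulled out in order to see where the -17 first appears. The sample text is copied from three entries of the excerpt, and the helper names are made up for illustration.

```python
import re

# Three entries copied from the excerpt above, including the mail line wrapping.
SAMPLE = """\
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(genops.c:246:class_newdev())
Device scia-OST0004 already exists, won't add
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:
(obd_config.c:180:class_attach()) Cannot create device scia-OST0004 of type
obdfilter : -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:
(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-17)
"""

# A new syslog entry starts with "Mon DD HH:MM:SS"; other lines are wrapped text.
TIMESTAMP = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d\d:\d\d:\d\d ")
# "pid:0:(file.c:line:function()) message"
CALL_SITE = re.compile(r"LustreError: \d+:\d+:\s*\(([\w.]+):(\d+):(\w+)\(\)\)\s*(.*)")


def join_wrapped(log_text):
    """Merge wrapped continuation lines back into one string per syslog entry."""
    entries = []
    for line in log_text.splitlines():
        if TIMESTAMP.match(line) or not entries:
            entries.append(line.strip())
        else:
            entries[-1] += " " + line.strip()
    return entries


def lustre_call_sites(log_text):
    """Return (source file, line, function, message) for each matching entry."""
    results = []
    for entry in join_wrapped(log_text):
        match = CALL_SITE.search(entry)
        if match:
            source, line, func, message = match.groups()
            results.append((source, int(line), func, message))
    return results


for site in lustre_call_sites(SAMPLE):
    print(site)
```

Read in order, the first entry is the earliest failure: class_newdev() reports that a device named scia-OST0004 already exists, and the later -17 lines pass that error up through the mount path.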
OST parameters:
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost \
  --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdb
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost \
  --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdc
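As a side note on the --mkfsoptions values, which are handed to mke2fs: -b is the block size in bytes, -E stride is given in filesystem blocks, and -i is the bytes-per-inode ratio. A small worked calculation, offered as a sketch only, shows what they amount to. The 2 TiB device size is purely an assumption for illustration (the real size of /dev/sdb and /dev/sdc is not stated), and mke2fs rounds inode counts per block group, so real figures will differ somewhat.

```python
BLOCK_SIZE = 4096            # -b 4096: filesystem block size in bytes
STRIDE_BLOCKS = 16           # -E stride=16: stride in filesystem blocks
BYTES_PER_INODE = 2097152    # -i 2097152: one inode per this many bytes

# Stride expressed in bytes: 16 blocks of 4096 bytes each.
stride_bytes = STRIDE_BLOCKS * BLOCK_SIZE

# Bytes-per-inode ratio expressed in MiB.
mib_per_inode = BYTES_PER_INODE / (1024 ** 2)


def approx_inode_count(device_bytes):
    """Rough inode count implied by the -i ratio, ignoring mke2fs rounding."""
    return device_bytes // BYTES_PER_INODE


# Hypothetical device size, NOT taken from the original post.
HYPOTHETICAL_DEVICE_BYTES = 2 * 1024 ** 4  # 2 TiB

print("stride:", stride_bytes // 1024, "KiB")
print("one inode per", mib_per_inode, "MiB")
print("approx. inodes on a 2 TiB device:", approx_inode_count(HYPOTHETICAL_DEVICE_BYTES))
```

The stride works out to 64 KiB, and one inode per 2 MiB keeps the number of objects each OST can hold fairly low, which suits large files.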
MDS parameters:
mkfs.lustre --fsname=scia --mdt --mgs --failnode=mds2 /dev/drbd0
For your information, here is the OST output for the working partition after mounting:
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 255): journal_recover: JBD:
recovery, exit status 0, recovered transactions 72449 to 74105
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 257): journal_recover: JBD:
Replayed 7548 and revoked 0/0 blocks
Aug 13 11:27:00 sadosrd20 kjournald starting. Commit interval 5 seconds
Aug 13 11:27:00 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: recovery complete.
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data
mode.
Aug 13 11:27:01 sadosrd20 kjournald starting. Commit interval 5 seconds
Aug 13 11:27:01 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data
mode.
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:1732:filter_common_setup())
scia-OST0005: recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:
(filter.c:744:filter_init_server_data()) scia-OST0005: recovery support OFF
Aug 13 11:27:01 sadosrd20 Lustre: OST scia-OST0005 now serving dev
(scia-OST0005/ca6d322c-65d4-968c-4f25-3f37937678a8) with recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: Server scia-OST0005 on device /dev/sdc has
started
Aug 13 11:27:06 sadosrd20 Lustre: scia-OST0005: received MDS connection from
192.168.16.122@tcp
Aug 13 11:27:06 sadosrd20 Lustre: 6414:0:
(filter.c:2774:filter_destroy_precreated()) scia-OST0005: deleting orphan
objects from 3073 to 3180
-------------------------------------------------------
--
-----------------------------------------------------------------------
Dipl.-Ing. Heiko Schröter
Institute of Environmental Physics (IUP) phone: ++49-(0)421-218-4080
Institute of Remote Sensing (IFE) fax: ++49-(0)421-218-4555
University of Bremen (FB1)
P.O. Box 330440 email: schroete at iup.physik.uni-bremen.de
Otto-Hahn-Allee 1
28359 Bremen
Germany
-----------------------------------------------------------------------