[Lustre-discuss] OST crash recovery problem
Heiko Schroeter
schroete at iup.physik.uni-bremen.de
Wed Aug 13 23:40:05 PDT 2008
Hello again,
Any idea what can be done in such a case?
Regards
Heiko
Hello,
After a crash (hardware failure) of an OST server with two Lustre partitions,
one partition (/dev/sdb) cannot be remounted after the restart.
The second partition (/dev/sdc) mounts fine.
What needs to be done in such a case?
I tried moving the mountpoint because of the "file exists" message, but that
did not help.
Any pointers welcome.
Heiko
OST messages after the mount command:
mount -t lustre /dev/sdb /mnt/data/ost3
<snip>
Aug 13 11:18:53 sadosrd20 kjournald starting. Commit interval 5 seconds
Aug 13 11:18:53 sadosrd20 LDISKFS FS on sdb, internal journal
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(genops.c:246:class_newdev()) Device scia-OST0004 already exists, won't add
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:180:class_attach()) Cannot create device scia-OST0004 of type obdfilter : -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:1070:class_config_llog_handler()) Err -17 on cfg command:
Aug 13 11:18:54 sadosrd20 Lustre: cmd=cf001 0:scia-OST0004 1:obdfilter 2:scia-OST0004_UUID
Aug 13 11:18:54 sadosrd20 LustreError: 15c-8: MGC192.168.16.122@tcp: The configuration from log 'scia-OST0004' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1091:server_start_targets()) failed to start server scia-OST0004: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1382:server_put_super()) no obd scia-OST0004
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 generated and it took 7512
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 256 preallocated, 0 discarded
Aug 13 11:18:55 sadosrd20 Lustre: server umount scia-OST0004 complete
Aug 13 11:18:55 sadosrd20 LustreError: 7247:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-17)
<snap>
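Judging from the log, every failing line carries -17, the negated Linux errno EEXIST ("File exists"). The first call to fail is class_newdev(), which refuses because an obd device named scia-OST0004 already appears to be registered, while the ldiskfs mount itself succeeded; so the "file exists" seems to refer to that device name rather than to the mountpoint. The POSIX shell sketch below is a hypothetical helper (explain_lustre_mount_error is not a Lustre tool) that pulls the error code and the conflicting device name out of a saved log like the excerpt above. It assumes each relevant message is on a single line and maps only code 17 to a name.

```shell
#!/bin/sh
# Hypothetical helper, not part of Lustre: summarize why a Lustre target
# refused to mount, given a saved kernel/syslog file.
# Lustre reports negated Linux errno values; 17 is EEXIST ("File exists").
explain_lustre_mount_error() {
    log_file=$1
    # Take the last "Unable to mount (-N)" code reported by lustre_fill_super.
    code=$(sed -n 's/.*Unable to mount (-\([0-9][0-9]*\)).*/\1/p' "$log_file" | tail -n 1)
    [ -n "$code" ] || { echo "no mount failure found"; return 1; }
    case $code in
        17) name="EEXIST (File exists)" ;;
        *)  name="errno $code" ;;   # other codes are printed as raw numbers
    esac
    echo "mount failed with -$code: $name"
    # List each device name that class_newdev refused to create twice.
    sed -n 's/.*Device \([^ ]*\) already exists.*/\1/p' "$log_file" |
        sort -u |
        while read -r dev; do
            echo "obd device already registered: $dev"
        done
}
```

For example, `explain_lustre_mount_error /var/log/messages` (the log path is an assumption and varies by distribution) would report the -17/EEXIST failure and scia-OST0004 for the excerpt above.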
OST parameters:
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdb
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdc
MDS parameters:
mkfs.lustre --fsname=scia --mdt --mgs --failnode=mds2 /dev/drbd0
For reference, here is the OST output for the healthy partition (/dev/sdc) after mounting:
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 72449 to 74105
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 7548 and revoked 0/0 blocks
Aug 13 11:27:00 sadosrd20 kjournald starting. Commit interval 5 seconds
Aug 13 11:27:00 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: recovery complete.
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:27:01 sadosrd20 kjournald starting. Commit interval 5 seconds
Aug 13 11:27:01 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:1732:filter_common_setup()) scia-OST0005: recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:744:filter_init_server_data()) scia-OST0005: recovery support OFF
Aug 13 11:27:01 sadosrd20 Lustre: OST scia-OST0005 now serving dev (scia-OST0005/ca6d322c-65d4-968c-4f25-3f37937678a8) with recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: Server scia-OST0005 on device /dev/sdc has started
Aug 13 11:27:06 sadosrd20 Lustre: scia-OST0005: received MDS connection from 192.168.16.122@tcp
Aug 13 11:27:06 sadosrd20 Lustre: 6414:0:(filter.c:2774:filter_destroy_precreated()) scia-OST0005: deleting orphan objects from 3073 to 3180