[Lustre-discuss] OST crash recovery problem

Heiko Schroeter schroete at iup.physik.uni-bremen.de
Wed Aug 13 02:28:42 PDT 2008


Hello,

after a crash (hardware failure) of an OST server with two Lustre partitions, one
partition (/dev/sdb) cannot be remounted after the restart.
The second partition (/dev/sdc) mounts fine.

What needs to be done in such a case?
I tried to move the mount point because of the "File exists" message, but that
did not help.

Any pointers welcome.
Heiko


OST messages after the mount command:
mount -t lustre /dev/sdb /mnt/data/ost3
<snip>
Aug 13 11:18:53 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:18:53 sadosrd20 LDISKFS FS on sdb, internal journal
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(genops.c:246:class_newdev()) Device scia-OST0004 already exists, won't add
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:180:class_attach()) Cannot create device scia-OST0004 of type obdfilter : -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:1070:class_config_llog_handler()) Err -17 on cfg command:
Aug 13 11:18:54 sadosrd20 Lustre:    cmd=cf001 0:scia-OST0004  1:obdfilter  2:scia-OST0004_UUID
Aug 13 11:18:54 sadosrd20 LustreError: 15c-8: MGC192.168.16.122@tcp: The configuration from log 'scia-OST0004' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1091:server_start_targets()) failed to start server scia-OST0004: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1382:server_put_super()) no obd scia-OST0004
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 generated and it took 7512
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 256 preallocated, 0 discarded
Aug 13 11:18:55 sadosrd20 Lustre: server umount scia-OST0004 complete
Aug 13 11:18:55 sadosrd20 LustreError: 7247:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount  (-17)
<snap>
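A note for anyone decoding the log: Lustre reports kernel errors as negative errno values, so -17 in the lines above is EEXIST ("File exists"), which matches the "Device scia-OST0004 already exists, won't add" message. The minimal Python sketch below pulls the first such value out of a log line and names it. It assumes Linux errno numbering, and the helper name decode_lustre_errno is hypothetical (not part of any Lustre tool).

```python
import errno
import os
import re

# Hypothetical helper: find the first negative errno (e.g. "-17") in a
# Lustre syslog line and return its symbolic name and message.
# Assumes Linux errno numbering (EEXIST == 17).
# The lookbehind skips hyphens inside words such as "scia-OST0004".
_NEG_ERRNO = re.compile(r"(?<![\w.])-(\d+)\b")

def decode_lustre_errno(line):
    m = _NEG_ERRNO.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

if __name__ == "__main__":
    print(decode_lustre_errno("lustre_fill_super()) Unable to mount  (-17)"))
```

On a Linux host this prints the EEXIST name together with the C library's message for errno 17.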

OST parameters:
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdb
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdc

MDS parameters:
mkfs.lustre --fsname=scia --mdt --mgs --failnode=mds2 /dev/drbd0


For your information, here is the OST output for the working partition after mounting:
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 72449 to 74105
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 7548 and revoked 0/0 blocks
Aug 13 11:27:00 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:27:00 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: recovery complete.
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:27:01 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:27:01 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:1732:filter_common_setup()) scia-OST0005: recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:744:filter_init_server_data()) scia-OST0005: recovery support OFF
Aug 13 11:27:01 sadosrd20 Lustre: OST scia-OST0005 now serving dev (scia-OST0005/ca6d322c-65d4-968c-4f25-3f37937678a8) with recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: Server scia-OST0005 on device /dev/sdc has started
Aug 13 11:27:06 sadosrd20 Lustre: scia-OST0005: received MDS connection from 192.168.16.122@tcp
Aug 13 11:27:06 sadosrd20 Lustre: 6414:0:(filter.c:2774:filter_destroy_precreated()) scia-OST0005: deleting orphan objects from 3073 to 3180