[Lustre-discuss] OST crash recovery problem

Heiko Schroeter schroete at iup.physik.uni-bremen.de
Tue Aug 19 00:01:57 PDT 2008


Hello,

Replying to myself: no, we couldn't get Lustre up again and had to reinstall 
from scratch.
:-(
Keeping our fingers crossed now that we are running the production system....

What bugs us is this part of the message on the MDS:

Aug 13 11:18:54 sadosrd20 LustreError: 15c-8: MGC192.168.16.122@tcp: The configuration from log 'scia-OST0004' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.

Unfortunately, there is no further information in the syslog.

Regards
Heiko
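The -17 in these messages is a negative Linux errno value: 17 is EEXIST ("File exists"), which matches the "Device scia-OST0004 already exists, won't add" line in the OST log of the original message further down. As a minimal sketch (the helper names `decode_rc` and `return_codes` are hypothetical, not part of Lustre), Python's standard `errno` module and `os.strerror` can decode such codes while reading the syslog:

```python
import errno
import os
import re

# Lustre prints failures as negative Linux errno values, e.g. "failed (-17)".
# Match a minus sign followed by digits that is not glued to a word or to
# another dash, so UUID fragments such as "c-65d4" are ignored.
_RC_RE = re.compile(r"(?<![\w-])-(\d+)\b")


def decode_rc(rc: int) -> str:
    """Turn a (negative) Lustre return code into 'NAME (message)'."""
    code = abs(rc)
    name = errno.errorcode.get(code, f"errno {code}")
    return f"{name} ({os.strerror(code)})"


def return_codes(line: str) -> list:
    """Return every negative return code found in one log line."""
    return [-int(digits) for digits in _RC_RE.findall(line)]


if __name__ == "__main__":
    msg = "The configuration from log 'scia-OST0004' failed (-17)."
    for rc in return_codes(msg):
        print(rc, decode_rc(rc))  # prints: -17 EEXIST (File exists) on Linux
```

So the MGC message is not reporting a network problem here; it is passing on the EEXIST raised when the obdfilter device was created.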







Hello again,

Any idea what can be done in such a case?

Regards
Heiko


Hello,

after a crash (hardware failure) of an OST server with two Lustre partitions, 
one partition (/dev/sdb) cannot be remounted after the restart.
The second partition (/dev/sdc) mounts fine.

What needs to be done in such a case?
I tried moving the mount point because of the "File exists" (-17) message, but 
that did not help.

Any pointers welcome.
Heiko


OST messages after mount command:
mount -t lustre /dev/sdb /mnt/data/ost3
<snip>
Aug 13 11:18:53 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:18:53 sadosrd20 LDISKFS FS on sdb, internal journal
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(genops.c:246:class_newdev()) Device scia-OST0004 already exists, won't add
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:180:class_attach()) Cannot create device scia-OST0004 of type obdfilter : -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:1070:class_config_llog_handler()) Err -17 on cfg command:
Aug 13 11:18:54 sadosrd20 Lustre:    cmd=cf001 0:scia-OST0004  1:obdfilter  2:scia-OST0004_UUID
Aug 13 11:18:54 sadosrd20 LustreError: 15c-8: MGC192.168.16.122@tcp: The configuration from log 'scia-OST0004' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1091:server_start_targets()) failed to start server scia-OST0004: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1382:server_put_super()) no obd scia-OST0004
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 generated and it took 7512
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 256 preallocated, 0 discarded
Aug 13 11:18:55 sadosrd20 Lustre: server umount scia-OST0004 complete
Aug 13 11:18:55 sadosrd20 LustreError: 7247:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount  (-17)
<snap>
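Syslog records pasted into mail are easily wrapped across several lines, which hides the order of events. Read in order, the earliest LustreError with a source location is class_newdev() refusing to register scia-OST0004 because a device with that name already exists; the later -17 messages from class_attach(), class_config_llog_handler(), the MGC and server_start_targets() appear to propagate that same error. A minimal sketch for finding that first error, assuming standard "Mon DD HH:MM:SS host" syslog timestamps and the pid:cpu:(file.c:line:function()) prefix seen above; `unwrap` and `first_error` are hypothetical helpers:

```python
import re
import sys

# A syslog record starts with "Mon DD HH:MM:SS host "; any other line is
# treated as a continuation that a mail client wrapped onto a new line.
_TS_RE = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ ")
# Lustre debug prefix as seen in the log: pid:cpu:(file.c:line:function())
_LOC_RE = re.compile(r"\((\w+\.c):(\d+):(\w+)\(\)\)")


def unwrap(lines):
    """Rejoin syslog records that were wrapped across several lines."""
    records = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if _TS_RE.match(line) or not records:
            records.append(line)
        elif records[-1].endswith(":") and line.startswith("("):
            # "7247:0:" + "(obd_config.c:...)" belong together without a space.
            records[-1] += line
        else:
            records[-1] += " " + line
    return records


def first_error(records):
    """Return (file, line, function, message) of the first located LustreError."""
    for rec in records:
        if "LustreError" not in rec:
            continue
        m = _LOC_RE.search(rec)
        if m:
            return m.group(1), int(m.group(2)), m.group(3), rec[m.end():].strip()
    return None


if __name__ == "__main__":
    # Usage: python first_error.py < pasted_syslog.txt
    print(first_error(unwrap(sys.stdin)))
```

Looking at the first error rather than the last one matters here: the final "Unable to mount (-17)" only says that mounting failed, while class_newdev() says why.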

OST parameter:
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdb
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdc
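The `--mkfsoptions` string is handed to mke2fs when the ldiskfs backing filesystem is created: `-b 4096` selects 4 KiB blocks, `-i 2097152` creates one inode per 2 MiB of space, and `-E stride=16` gives the RAID chunk size in filesystem blocks. As a worked sketch only, assuming a hypothetical 1 TiB device (the post does not give the size of /dev/sdb or /dev/sdc), the resulting geometry is roughly:

```python
# Worked example for the OST --mkfsoptions above.
# ASSUMPTION: a hypothetical 1 TiB device; the post does not state the
# real size of /dev/sdb or /dev/sdc.
DEVICE_BYTES = 1 << 40

BLOCK_SIZE = 4096          # -b 4096: 4 KiB filesystem blocks
BYTES_PER_INODE = 2097152  # -i 2097152: one inode per 2 MiB of space
STRIDE_BLOCKS = 16         # -E stride=16: RAID chunk size in blocks


def mkfs_geometry(device_bytes: int) -> dict:
    """Approximate block count, inode count and RAID chunk size.

    mke2fs rounds the real inode count to whole block groups, so the
    inode figure is an estimate, not the exact value mke2fs will use.
    """
    return {
        "blocks": device_bytes // BLOCK_SIZE,
        "inodes": device_bytes // BYTES_PER_INODE,
        "chunk_bytes": STRIDE_BLOCKS * BLOCK_SIZE,
    }


if __name__ == "__main__":
    print(mkfs_geometry(DEVICE_BYTES))
```

Because every OST object is stored as a file in the ldiskfs filesystem, the inode count also bounds how many objects each OST can hold; the 64 KiB chunk size (16 x 4 KiB) should match the RAID configuration underneath.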

MDS parameter:
mkfs.lustre --fsname=scia --mdt --mgs --failnode=mds2 /dev/drbd0


Just for your info, the OST output of the OK partition after mounting:
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 72449 to 74105
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 7548 and revoked 0/0 blocks
Aug 13 11:27:00 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:27:00 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: recovery complete.
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:27:01 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:27:01 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:1732:filter_common_setup()) scia-OST0005: recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:744:filter_init_server_data()) scia-OST0005: recovery support OFF
Aug 13 11:27:01 sadosrd20 Lustre: OST scia-OST0005 now serving dev (scia-OST0005/ca6d322c-65d4-968c-4f25-3f37937678a8) with recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: Server scia-OST0005 on device /dev/sdc has started
Aug 13 11:27:06 sadosrd20 Lustre: scia-OST0005: received MDS connection from 192.168.16.122@tcp
Aug 13 11:27:06 sadosrd20 Lustre: 6414:0:(filter.c:2774:filter_destroy_precreated()) scia-OST0005: deleting orphan objects from 3073 to 3180
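For comparison with the failed mount, the healthy partition logs "Server scia-OST0005 on device /dev/sdc has started", while /dev/sdb ends with "failed to start server scia-OST0004: -17". A minimal sketch (the function `mount_outcomes` is hypothetical) that pulls the per-target result out of a pasted, possibly mail-wrapped log by joining the lines before matching:

```python
import re
import sys

# Per-target outcome lines as they appear in the two mount logs.
_OUTCOME_RE = re.compile(
    r"Server (?P<ok>\S+) on device (?P<dev>\S+) has started"
    r"|failed to start server (?P<bad>\S+): (?P<rc>-\d+)"
)


def mount_outcomes(lines):
    """Map each OST name to its last logged mount result.

    Non-empty lines are joined with single spaces first, so records that a
    mail client wrapped across lines still match.
    """
    text = " ".join(s for s in (line.strip() for line in lines) if s)
    outcomes = {}
    for m in _OUTCOME_RE.finditer(text):
        if m.group("ok"):
            outcomes[m.group("ok")] = f"started on {m.group('dev')}"
        else:
            outcomes[m.group("bad")] = f"failed ({m.group('rc')})"
    return outcomes


if __name__ == "__main__":
    # Usage: python mount_outcomes.py < pasted_syslog.txt
    print(mount_outcomes(sys.stdin))
```

Later matches overwrite earlier ones, so a target that failed and was then mounted successfully is reported as started.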
 

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

-------------------------------------------------------

-- 
-----------------------------------------------------------------------
Dipl.-Ing. Heiko Schröter
Institute of Environmental Physics (IUP)    phone: ++49-(0)421-218-4080
Institute of Remote Sensing (IFE)           fax:   ++49-(0)421-218-4555
University of Bremen (FB1)
P.O. Box 330440               email:  schroete at iup.physik.uni-bremen.de
Otto-Hahn-Allee 1           
28359 Bremen                
Germany
-----------------------------------------------------------------------


