[Lustre-discuss] OST crash with group descriptors corrupted

thhsieh thhsieh at piano.rcas.sinica.edu.tw
Mon Mar 9 04:39:02 PDT 2009


Dear All,

We have an emergency with our Lustre filesystem.

We installed lustre-1.6.6 with Linux kernel 2.6.22.19 on all the
MGS, MDT, and OST servers and on the clients. They ran very well. But today
we encountered a hardware problem in the disk array (one of the hard disks
in the RAID 6 array failed), and soon afterwards the Lustre
filesystem crashed, too.

After we replaced the bad hard disk with a new one, the disk array
appeared to rebuild the RAID 6 data onto the new disk correctly, and the file
server can apparently access the partitions of that disk array again.
But the OST partition on that disk array still cannot be mounted:

root@wd2:~# mount -t ldiskfs /dev/sdb1 /mnt/mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

The dmesg message shows:

[ 3314.530762] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Block bitmap for group 11152 not in group (block 3407085568)!
[ 3314.531701] LDISKFS-fs: group descriptors corrupted!

If I run "./tunefs.lustre --writeconf /dev/sdb1", I get:

Reading CONFIGS/mountdata

   Read previous values:
Target:     cwork2-OST0000
Index:      0
Lustre FS:  cwork2
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.10.50@tcp


   Permanent disk data:
Target:     cwork2-OST0000
Index:      0
Lustre FS:  cwork2
Mount type: ldiskfs
Flags:      0x102
              (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.10.50@tcp

tunefs.lustre: Unable to mount /dev/sdb1: Invalid argument

tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)


The command "dumpe2fs /dev/sdb1" gives:

Filesystem volume name:   cwork2-OST0000
Last mounted on:          <not available>
Filesystem UUID:          4f4323df-73a5-4e93-9a2d-2c2b9a6c3c60
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extents sparse_super large_file
Filesystem flags:         signed directory hash 
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              101335040
Block count:              405336007
Reserved block count:     20266800
Free blocks:              164142148
Free inodes:              119852810
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      927
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Oct 16 15:29:21 2008

.....

  Block bitmap at 2820539742 (+2415232350), Inode bitmap at 2820539691 (+2415232299)
  Inode table at 2820539857-2820540368 (+2415232465)
  40232 free blocks, 20387 free inodes, 0 directories

dumpe2fs: /dev/sdb1: error reading bitmaps: Can't read an block bitmap
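
The numbers above can be cross-checked with a little arithmetic. The sketch
below is only a rough sanity check, and it assumes the plain ext3-style layout
(no flex_bg) in which each group's block bitmap must lie inside that group's
own block range. The group number 12369 is not printed in the output; it is
inferred from the offsets shown (2820539742 - 2415232350 = 405307392 =
12369 * 32768). The function names are made up for illustration.

```python
# Values copied from the dumpe2fs and dmesg output above.
BLOCKS_PER_GROUP = 32768
BLOCK_COUNT = 405336007
FIRST_BLOCK = 0


def group_count():
    """Number of block groups, rounding up for a partial last group."""
    return -(-(BLOCK_COUNT - FIRST_BLOCK) // BLOCKS_PER_GROUP)


def group_range(group):
    """Return the (first, last) block numbers that belong to a group."""
    first = FIRST_BLOCK + group * BLOCKS_PER_GROUP
    last = min(first + BLOCKS_PER_GROUP, BLOCK_COUNT) - 1
    return first, last


def bitmap_in_group(group, bitmap_block):
    """True if the bitmap block lies inside the group's own block range."""
    first, last = group_range(group)
    return first <= bitmap_block <= last


if __name__ == "__main__":
    # (group, reported block bitmap location)
    # 11152 comes from dmesg; 12369 is inferred from the dumpe2fs offsets.
    reported = [(11152, 3407085568), (12369, 2820539742)]
    print("groups on device:", group_count())
    for group, bitmap in reported:
        first, last = group_range(group)
        print("group", group, "covers blocks", first, "to", last)
        print("  bitmap at", bitmap,
              "in group:", bitmap_in_group(group, bitmap),
              "| beyond end of filesystem:", bitmap >= BLOCK_COUNT)
```

Under these assumptions both reported bitmap locations fall not just outside
their groups but past the last block of the filesystem (405336006), which
suggests the descriptor contents themselves are garbage rather than slightly
off. The reported free inode count (119852810) also exceeds the total inode
count (101335040).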


It seems that the backend ext3 file system is still there, but has
errors.

Could anyone suggest a way to recover the OST partition? Can I use
e2fsck to fix the problems on it?

The MGS and MDT seem to be OK, because they are not on the disk array.


Thank you very much for your kind help.


Best Regards,

T.H.Hsieh


