[Lustre-discuss] MDT crash : group descriptors corrupted!

Norimichi SUZUKI norimichi.suzuki at hpc-technologies.co.jp
Mon May 2 03:44:57 PDT 2011


Hi,

Because of a network problem, our MDS crashed.
Since then I can't mount the MDT (/dev/mapper/mpath1p1).

[root@mds1 ~]# mount -t lustre /dev/mapper/mpath1p1 /mds
mount.lustre: mount /dev/mapper/mpath1p1 at /mds failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.

The syslog shows the following:

May 2 19:16:39 mds1 kernel: LDISKFS-fs (dm-1): ldiskfs_check_descriptors: Checksum for group 0 failed (20132!=16032)
May 2 19:16:39 mds1 kernel: LDISKFS-fs (dm-1): group descriptors corrupted!
May 2 19:16:39 mds1 multipathd: dm-1: umount map (uevent)
May 2 19:16:39 mds1 kernel: LustreError: 12513:0:(obd_mount.c:1292:server_kernel_mount()) premount /dev/mapper/mpath1p1:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19. Is the ldiskfs module available?
May 2 19:16:39 mds1 kernel: LustreError: 12513:0:(obd_mount.c:1618:server_fill_super()) Unable to mount device /dev/mapper/mpath1p1: -22
May 2 19:16:39 mds1 kernel: LustreError: 12513:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-22)

I've seen similar cases on this list, so I tried e2fsck.

[root@mds1 log]# e2fsck -fp /dev/mapper/mpath1p1
e2fsck: MMP: fsck being run while trying to open /dev/mapper/mpath1p1
lustre-MDT0000:
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 32768 <device>

[root@mds1 log]# e2fsck -b 32768 /dev/mapper/mpath1p1
e2fsck 1.41.12.2.ora3 (23-Feb-2011)
e2fsck: Bad magic number in super-block while trying to open /dev/mapper/mpath1p1
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 4294967294 <device>
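
For reference, the block passed to -b above is only the first of several backup superblock locations. The following is a minimal sketch, not taken from e2fsprogs, that lists the candidates. It assumes the usual sparse_super layout (backups in group 1 and in groups that are powers of 3, 5 and 7) and uses the block count, blocks per group and first block reported by debugfs later in this post. The function name is hypothetical.

```python
def backup_superblock_blocks(block_count, blocks_per_group, first_block=0):
    """Return candidate backup superblock block numbers.

    Assumption: sparse_super layout, i.e. backups live in group 1 and in
    every group whose number is a power of 3, 5 or 7.
    """
    # Number of block groups, rounded up.
    group_count = -(-(block_count - first_block) // blocks_per_group)

    groups = set()
    if group_count > 1:
        groups.add(1)
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base

    # Each backup sits in the first block of its group.
    return [first_block + g * blocks_per_group for g in sorted(groups)]


# Values taken from the debugfs "stats" output in this post.
blocks = backup_superblock_blocks(block_count=1463838711,
                                  blocks_per_group=32768,
                                  first_block=0)
for b in blocks:
    print(b)
```

Each printed number is a possible argument for e2fsck -b. Whether any of them holds an intact copy on this device is not something the sketch can tell.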

But when I run e2fsck with the -n option, I get many messages like these:

Group descriptor 44662 checksum is invalid. IGNORED.
Group descriptor 44663 checksum is invalid. IGNORED.
Group descriptor 44664 checksum is invalid. IGNORED.
・
・

Our environment:
OS : CentOS v5.5 x64
[root@mds1 log]# rpm -qa | grep lustre
kernel-debuginfo-common-2.6.18-194.3.1.el5_lustre.1.8.4
lustre-ldiskfs-3.1.3-2.6.18_194.3.1.el5_lustre.1.8.4
lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4
kernel-2.6.18-194.3.1.el5_lustre.1.8.4
lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
kernel-debuginfo-2.6.18-194.3.1.el5_lustre.1.8.4

I also updated e2fsprogs:

e2fsprogs-1.41.12.2.ora3-0redhat.x86_64.rpm
e2fsprogs-devel-1.41.12.2.ora3-0redhat.x86_64.rpm

[root@mds1 ~]# debugfs /dev/mapper/mpath1p1
debugfs 1.41.12.2.ora3 (23-Feb-2011)
debugfs: ls
2 (12) . 2 (12) .. 11 (20) lost+found 238583809 (16) CONFIGS
415629313 (12) ROOT 1006206977 (16) PENDING 1148682241 (12) LOGS
763035649 (16) OBJECTS 12 (20) last_rcvd 13 (20) lov_objid
14 (20) health_check 15 (3920) CATALOGS
debugfs: stats
Filesystem volume name: lustre-MDT0000
Last mounted on: /
Filesystem UUID: 6d67c826-440d-41c2-8548-ed510c008db4
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype mmp sparse_super large_file uninit_bg
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: not clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1463844864
Block count: 1463838711
Reserved block count: 73191935
Free blocks: 1280643105
Free inodes: 1463843707
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 674
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 32768
Inode blocks per group: 4096
Filesystem created: Wed Oct 27 20:35:54 2010
Last mount time: Fri Oct 29 18:36:44 2010
Last write time: Mon May 2 19:23:20 2011
Mount count: 6
Maximum mount count: 26
Last checked: Wed Oct 27 20:35:54 2010
Check interval: 15552000 (6 months)
Next check after: Mon Apr 25 20:35:54 2011
Lifetime writes: 698 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 512
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: a6d9f773-352f-415c-8f30-3b63f3d4d2f7
Journal backup: inode blocks
MMP block number: 5129
MMP update interval: 1
Directories: 8
Group 0: block bitmap at 1025, inode bitmap at 1026, inode table at 1027
27639 free blocks, 32753 free inodes, 2 used directories, 0 unused inodes
[Checksum 0x3ea0]
Group 1: block bitmap at 33793, inode bitmap at 33794, inode table at 33795
27645 free blocks, 32768 free inodes, 0 used directories, 0 unused inodes
[Checksum 0x8e73]
Group 2: block bitmap at 65536, inode bitmap at 65537, inode table at 65538
28670 free blocks, 32768 free inodes, 0 used directories, 0 unused inodes
[Checksum 0xc7a1]
・
・
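
The two numbers in the kernel's "Checksum for group 0 failed (20132!=16032)" message can be compared with the debugfs output above by converting them to hexadecimal. This is a minimal sketch with a hypothetical function name. Reading the first number as the checksum the kernel computed and the second as the one stored on disk is an assumption based on the usual ext4 message order.

```python
def compare_checksums(computed, stored, debugfs_value):
    """Show both kernel values in hex and check them against debugfs."""
    return {
        "computed_hex": hex(computed),
        "stored_hex": hex(stored),
        # True if debugfs prints the same value the kernel read from disk.
        "stored_matches_debugfs": stored == debugfs_value,
        # Bits that differ between the computed and stored checksums.
        "xor_hex": hex(computed ^ stored),
    }


# 20132 and 16032 come from the syslog line; 0x3ea0 is the group 0
# checksum printed by debugfs "stats".
result = compare_checksums(computed=20132, stored=16032, debugfs_value=0x3EA0)
for key, value in result.items():
    print(f"{key}: {value}")
```

The second value, 16032, is 0x3ea0, the same checksum debugfs prints for group 0.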

Can anyone please give me advice?

Thanks in advance.

Norimichi Suzuki
