[Lustre-discuss] After Upgrade 1.6.7.2->1.8.8, OSTs won't mount

Roger Sersted rs1 at aps.anl.gov
Thu Sep 27 15:17:50 PDT 2012


I upgraded from Lustre 1.6.7.2 to 1.8.8 by swapping out all 6 OSSes and 
replacing them with new hardware.  The MDT was moved to another system using the 
backup/restore procedure in the Lustre 1.8 manual (tar with setfattr and getfattr).

	Old:
	CentOS 5.5 x86_64
	Lustre 1.6.7.2 (from Sun)

	New:
	CentOS 5.8 x86_64
	Lustre 1.8.8 (from Whamcloud)

The MDS mounts the MDT (combined MGT) just fine.  However, the OSSes are having 
problems.  I ran e2fsck on two different OSSes (each with a different OST) and 
one e2fsck corrected a few errors, the other was clean.  But neither OSS can 
mount its OST.

The errors indicated a problem with the FS journal (internal journal).  I tried 
to drop it with tune2fs -O ^has_journal /dev/sdp, but it would run forever.  On 
a normal FS it should take just a few seconds.  An strace showed it was 
continually seeking through the FS, e.g.:

lseek(3, 61471719424, SEEK_SET)         = 61471719424
read(3, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 4096) = 4096
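
To put the offset in that trace into perspective, here is a minimal sketch that converts the lseek offset into a block number and a position in GiB. The helper names are made up for illustration, and the 4096-byte block size is an assumption taken from the read() length in the trace, not from the superblock.

```shell
# Convert a byte offset (as shown by lseek in strace) to a block number.
# The default block size of 4096 is an assumption based on the read() length.
offset_to_block() {
    offset=$1
    block_size=${2:-4096}
    echo $(( offset / block_size ))
}

# Express the same offset in GiB.
offset_to_gib() {
    # awk handles the fractional part; shell arithmetic is integer-only
    awk -v o="$1" 'BEGIN { printf "%.2f\n", o / (1024 * 1024 * 1024) }'
}

offset_to_block 61471719424 4096   # prints 15007744
offset_to_gib 61471719424          # prints 57.25
```

Under that assumption, the read shown lands at block 15007744, about 57.25 GiB into the device.  The actual block size appears on the "Block size:" line of dumpe2fs -h.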

I've rebooted the machine and confirmed that all the Lustre modules were 
loaded.
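
A check like the one described above could be scripted roughly as follows.  This is only a sketch: missing_modules is a hypothetical helper, it reads lsmod-style text on stdin, and the module names passed to it (ldiskfs, obdfilter, ost) are an assumption about what a 1.8 OSS needs rather than a definitive list.

```shell
# Report which of the named kernel modules are absent from lsmod-style input.
# Reads the module listing on stdin so it can be tried without a Lustre node.
missing_modules() {
    listing=$(cat)
    missing=""
    for mod in "$@"; do
        # lsmod prints the module name in the first column
        if ! printf '%s\n' "$listing" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            missing="$missing $mod"
        fi
    done
    # Trim the leading space; an empty line means everything was found
    printf '%s\n' "${missing# }"
}

# On a real server (module list is an assumption):
#   lsmod | missing_modules ldiskfs obdfilter ost
```

An empty result means every named module showed up in the listing.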

I also downgraded e2fsprogs from 1.42.3.wc3 to 1.41.90.wc3, thinking 
I had hit an obscure bug, but nothing changed.

After re-reading some of the messages: do I need to convert these ext3/ldiskfs 
filesystems to ext4, e.g. with tune2fs -O extents,uninit_bg,dir_index /dev/sdp?

Thanks,

Roger S.

Here are some command results:

================================================
[root@apslstr07 ~]# tunefs.lustre --verbose --writeconf /dev/sdp
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

    Read previous values:
Target:     lustre1-OST0003
Index:      3
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x2
               (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.16.1.110@tcp mgsnode=172.16.1.111@tcp failover.node=172.16.1.108@tcp


    Permanent disk data:
Target:     lustre1-OST0003
Index:      3
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x102
               (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.16.1.110@tcp mgsnode=172.16.1.111@tcp failover.node=172.16.1.108@tcp

tunefs.lustre: Unable to mount /dev/sdp: Invalid argument

tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)

=============================================================
[root@apslstr07 ~]# mount -v -t ldiskfs /dev/sdp /lustre
mount: wrong fs type, bad option, bad superblock on /dev/sdp,
        missing codepage or other error
        In some cases useful info is found in syslog - try
        dmesg | tail  or so

from /var/log/messages:

Sep 27 17:00:10 apslstr07 kernel: LDISKFS-fs (sdp): no journal found
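
Since the kernel reports "no journal found", one read-only thing that could be checked is whether the superblock still advertises the has_journal feature.  The sketch below parses the "Filesystem features:" line of dumpe2fs -h output; has_feature is a hypothetical helper written for this example, and it takes its input on stdin so it can be tried on saved output.

```shell
# Succeeds if the named feature appears in the "Filesystem features:" line
# of dumpe2fs -h style output supplied on stdin.
has_feature() {
    awk -v f="$1" '
        /^Filesystem features:/ {
            # fields 1 and 2 are the label; features start at field 3
            for (i = 3; i <= NF; i++) if ($i == f) found = 1
        }
        END { exit !found }
    '
}

# On a real server (read-only query of the superblock):
#   dumpe2fs -h /dev/sdp 2>/dev/null | has_feature has_journal \
#       && echo "has_journal flag is set"
```

The same helper works for other feature names, such as extents or uninit_bg.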

=============================================================
[root@apslstr07 log]# mount -v -t lustre /dev/sdp /lustre
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/sdp
arg[5] = /lustre
source = /dev/sdp (/dev/sdp), target = /lustre
options = rw
mounting device /dev/sdp at /lustre, flags=0 options=device=/dev/sdp
mount.lustre: mount /dev/sdp at /lustre failed: Invalid argument retries left: 0
mount.lustre: mount /dev/sdp at /lustre failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.

from /var/log/messages:

Sep 27 17:01:13 apslstr07 kernel: Lustre: Build Version: jenkins-wc1--PRISTINE-2.6.18-308.4.1.el5_lustre
Sep 27 17:01:13 apslstr07 kernel: Lustre: Listener bound to ib0:172.17.1.107:987:mlx4_0
Sep 27 17:01:13 apslstr07 kernel: Lustre: Added LNI 172.17.1.107@o2ib [8/64/0/180]
Sep 27 17:01:13 apslstr07 kernel: Lustre: Added LNI 172.16.1.107@tcp [8/256/0/180]
Sep 27 17:01:13 apslstr07 kernel: Lustre: Accept secure, port 988
Sep 27 17:01:14 apslstr07 kernel: Lustre: Lustre Client File System; http://www.lustre.org/
Sep 27 17:01:14 apslstr07 kernel: LDISKFS-fs (sdp): no journal found
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:1307:server_kernel_mount()) premount /dev/sdp:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19.  Is the ldiskfs module available?
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:1633:server_fill_super()) Unable to mount device /dev/sdp: -22
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount  (-22)
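
For reading the numeric codes in these messages, here is a small sketch that maps them to standard Linux errno names.  errno_name is a hypothetical helper and its table covers only a handful of values.

```shell
# Map the negative return codes printed in the kernel log to errno names.
# Only a few common Linux values are listed; extend as needed.
errno_name() {
    code=${1#-}   # strip a leading minus sign
    case "$code" in
        2)  echo "ENOENT (No such file or directory)" ;;
        5)  echo "EIO (Input/output error)" ;;
        19) echo "ENODEV (No such device)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown ($code)" ;;
    esac
}

errno_name -22   # prints EINVAL (Invalid argument)
errno_name -19   # prints ENODEV (No such device)
```

So the ldiskfs attempt returned EINVAL, matching the "Invalid argument" from mount.lustre and tunefs.lustre, while the ldiskfs2 attempt returned ENODEV, which mount(2) documents as the filesystem type not being configured in the kernel.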



