[Lustre-discuss] After Upgrade 1.6.7.2->1.8.8, OSTs won't mount
Roger Sersted
rs1 at aps.anl.gov
Thu Sep 27 15:17:50 PDT 2012
I upgraded from Lustre 1.6.7.2 to 1.8.8 by swapping out all 6 OSSes and
replacing them with new hardware. The MDT was moved to another system using the
backup/restore procedure in the Lustre 1.8 manual (tar with setfattr and getfattr).
Old:
CentOS 5.5 x86_64
Lustre 1.6.7.2 (from Sun)
New:
CentOS 5.8 x86_64
Lustre 1.8.8 (from Whamcloud)
The MDS mounts the MDT (combined MGT) just fine. However, the OSSes are having
problems. I ran e2fsck on two different OSSes (each with a different OST); one
run corrected a few errors and the other was clean. Even so, neither OSS can
mount its OST.
The errors indicated a problem with the FS journal (internal journal). I tried
to drop it with tune2fs -O ^has_journal /dev/sdp, but it never finished. On
a normal FS it should take just a few seconds. An strace showed it was
continually seeking through the FS, e.g.:
lseek(3, 61471719424, SEEK_SET) = 61471719424
read(3, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 4096) = 4096
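For reference, the byte offset in the lseek call above can be turned into a filesystem block number. This is only a minimal sketch: the 4096-byte block size is an assumption inferred from the 4096-byte read in the strace output, and offset_to_block is a hypothetical helper name.

```shell
# offset_to_block OFFSET [BLOCK_SIZE]
# Convert a byte offset (as seen in an strace lseek call) into a block number.
# BLOCK_SIZE defaults to 4096, which is an assumption based on the read size
# in the strace output, not a value read from the superblock.
offset_to_block() {
    echo $(( $1 / ${2:-4096} ))
}

# Offset taken from the strace line above.
offset_to_block 61471719424   # prints 15007744
```

The actual block size can be confirmed from the "Block size:" line of dumpe2fs -h on the device.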
I rebooted the machine and confirmed that all the Lustre drivers were
loaded.
I also downgraded e2fsprogs from 1.42.3.wc3 to 1.41.90.wc3, thinking
I had hit an obscure bug, but nothing changed.
After re-reading some of the messages: do I need to convert these ext3/ldiskfs
filesystems to ext4, e.g. with tune2fs -O extents,uninit_bg,dir_index /dev/sdp?
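Before changing any feature flags, it may help to see which features the superblock currently reports. A minimal sketch, assuming the header printed by dumpe2fs -h contains a "Filesystem features:" line; has_feature is a hypothetical helper, not part of e2fsprogs.

```shell
# has_feature NAME
# Reads `dumpe2fs -h` output on stdin and succeeds (exit 0) if NAME appears
# on the "Filesystem features:" line, otherwise fails (exit 1).
has_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            # Fields 1 and 2 are the label; feature names start at field 3.
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On a real system it would be fed from the device, for example: dumpe2fs -h /dev/sdp 2>/dev/null | has_feature has_journal && echo "has_journal is set". The same check with extents, uninit_bg or dir_index shows whether those features are already enabled.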
Thanks,
Roger S.
Here are some command results:
================================================
[root@apslstr07 ~]# tunefs.lustre --verbose --writeconf /dev/sdp
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: lustre1-OST0003
Index: 3
Lustre FS: lustre1
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.16.1.110@tcp mgsnode=172.16.1.111@tcp failover.node=172.16.1.108@tcp
Permanent disk data:
Target: lustre1-OST0003
Index: 3
Lustre FS: lustre1
Mount type: ldiskfs
Flags: 0x102
(OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.16.1.110@tcp mgsnode=172.16.1.111@tcp failover.node=172.16.1.108@tcp
tunefs.lustre: Unable to mount /dev/sdp: Invalid argument
tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)
=============================================================
[root@apslstr07 ~]# mount -v -t ldiskfs /dev/sdp /lustre
mount: wrong fs type, bad option, bad superblock on /dev/sdp,
missing codepage or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
from /var/log/messages:
Sep 27 17:00:10 apslstr07 kernel: LDISKFS-fs (sdp): no journal found
=============================================================
[root@apslstr07 log]# mount -v -t lustre /dev/sdp /lustre
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/sdp
arg[5] = /lustre
source = /dev/sdp (/dev/sdp), target = /lustre
options = rw
mounting device /dev/sdp at /lustre, flags=0 options=device=/dev/sdp
mount.lustre: mount /dev/sdp at /lustre failed: Invalid argument retries left: 0
mount.lustre: mount /dev/sdp at /lustre failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
from /var/log/messages:
Sep 27 17:01:13 apslstr07 kernel: Lustre: Build Version: jenkins-wc1--PRISTINE-2.6.18-308.4.1.el5_lustre
Sep 27 17:01:13 apslstr07 kernel: Lustre: Listener bound to ib0:172.17.1.107:987:mlx4_0
Sep 27 17:01:13 apslstr07 kernel: Lustre: Added LNI 172.17.1.107@o2ib [8/64/0/180]
Sep 27 17:01:13 apslstr07 kernel: Lustre: Added LNI 172.16.1.107@tcp [8/256/0/180]
Sep 27 17:01:13 apslstr07 kernel: Lustre: Accept secure, port 988
Sep 27 17:01:14 apslstr07 kernel: Lustre: Lustre Client File System; http://www.lustre.org/
Sep 27 17:01:14 apslstr07 kernel: LDISKFS-fs (sdp): no journal found
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:1307:server_kernel_mount()) premount /dev/sdp:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19. Is the ldiskfs module available?
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:1633:server_fill_super()) Unable to mount device /dev/sdp: -22
Sep 27 17:01:14 apslstr07 kernel: LustreError: 14854:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-22)
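The negative numbers in the kernel messages above are Linux errno values. A minimal sketch of a lookup for the two codes that appear here (on Linux, 22 is EINVAL and 19 is ENODEV); errno_name is a hypothetical helper and deliberately covers only these two codes.

```shell
# errno_name CODE
# Map the errno values seen in the log above to their symbolic names.
# Only the two codes from this log are handled; anything else is "unknown".
errno_name() {
    # Strip a leading minus sign so both "-22" and "22" are accepted.
    case "${1#-}" in
        19) echo "ENODEV (No such device)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown" ;;
    esac
}

errno_name -22   # ldiskfs premount and final mount failure
errno_name -19   # ldiskfs2 premount failure
```

The -22 result matches the "Invalid argument" text that mount.lustre and tunefs.lustre print in the command output above.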