[Lustre-discuss] Remounting OSTs on other servers

Marcus Schull c.schull at imb.uq.edu.au
Tue Jun 2 01:24:09 PDT 2009


Lustre Admins,

We are currently in the process of upgrading our Red Hat-based Lustre 1.6.7.1
setup.  Previously we had one Lustre server which acted as the MGS, MDS and
OSS for a number of unpatched Red Hat Lustre clients.  We have 3 distinct
Lustre filesystems - each with an MDT partition/LUN and a number of OST
partitions/LUNs - all accessed over a fibre-channel SAN.

We now have 3 extra nodes, and are planning to stripe the OSTs of the
various filesystems over these 3 OSSs, retaining the 3 MDTs on the
original server, which will become purely an MGS/MDS.
We are not yet ready to implement automatic failover mechanisms, but we wish
to be able to manually fail OSTs and MDTs over between servers in the event
of server failure, or even for server maintenance.
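My understanding from the manual is that each target can carry the NID of its
failover partner, recorded at format time with --failnode or added later with
tunefs.lustre.  A minimal sketch - the device path is hypothetical, and I am
assuming lustre-oss2 as the failover partner for lustre-oss1:

  # Format an OST with a failover partner recorded:
  mkfs.lustre --fsname=test2 --ost --mgsnode=lustre-mds1@tcp \
      --failnode=lustre-oss2@tcp /dev/mapper/test2-ost0

  # Or add the failover NID to an existing, unmounted OST:
  tunefs.lustre --failnode=lustre-oss2@tcp /dev/mapper/test2-ost0

Whether this alone is sufficient for the manual moves described below is part
of my question.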

In preparation for this, I have been testing with a new filesystem - but am
unable to mount an OST on an arbitrary server node if it has previously been
mounted on another node.
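The move itself is nothing more exotic than unmounting the target on one OSS
and mounting the same LUN on another - roughly the following, with
hypothetical device paths and mount points:

  # On lustre-oss1:
  umount /mnt/test2-ost0

  # On lustre-oss2 (the same LUN is visible over the SAN):
  mount -t lustre /dev/mapper/test2-ost0 /mnt/test2-ost0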

For example - assuming our Lustre MDS is called lustre-mds1 and our 3 OSSs
are lustre-oss1, lustre-oss2 and lustre-oss3 - I can create the new
filesystem with an MDT mounted on lustre-mds1 and the OSTs mounted on
lustre-oss1, and a client can successfully mount the filesystem.  When I
unmount the client and the OSTs, and remount the OSTs on lustre-oss2, the
client can mount the filesystem but cannot access the files, and no
metadata is available (see below).  There are errors on the MDS as well
(see below):

[Client]
[root@node8 ~]# ls -l /mnt/lustre/test
total 0
?--------- ? ? ? ?            ? testfile-node8
[root@node8 ~]#


[MGS/MDS /var/log/messages]:
Jun  2 17:14:14 lustre1 kernel: LustreError:
6481:0:(socklnd_cb.c:2156:ksocknal_recv_hello()) Error -104 reading HELLO
from 130.102.xxx.xxx
Jun  2 17:14:14 lustre1 kernel: LustreError:
6481:0:(socklnd_cb.c:2156:ksocknal_recv_hello()) Skipped 17 previous similar
messages
Jun  2 17:15:29 lustre1 kernel: LustreError: 11b-b: Connection to
130.102.xxx.xxx@tcp at host 130.102.xxx.xxx on port 988 was reset: is it
running a compatible version of Lustre and is 130.102.xxx.xxx@tcp one of its
NIDs?
Jun  2 17:15:29 lustre1 kernel: LustreError: Skipped 17 previous similar
messages

(Where the IP was that of the OSS currently mounting the OSTs.  All servers
and clients are running 64-bit RHEL5 with Lustre 1.6.7.1.)
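For reference, the test filesystem was created along these lines (a sketch
only - the device paths are hypothetical, and the MGS was already running
from its own LUN on lustre-mds1):

  # On lustre-mds1:
  mkfs.lustre --fsname=test2 --mdt --mgsnode=lustre-mds1@tcp /dev/mapper/test2-mdt
  mount -t lustre /dev/mapper/test2-mdt /mnt/test2-mdt

  # On lustre-oss1:
  mkfs.lustre --fsname=test2 --ost --mgsnode=lustre-mds1@tcp /dev/mapper/test2-ost0
  mount -t lustre /dev/mapper/test2-ost0 /mnt/test2-ost0

  # On the client:
  mount -t lustre lustre-mds1@tcp:/test2 /mnt/lustre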

The same occurs when the OSTs are mounted on lustre-oss3.

However, when I remount the OSTs on lustre-oss1, the client can suddenly see
the files again:


[root@node8 ~]# ls -l /mnt/lustre/test
total 4
-rw-r--r-- 1 root root 6 Jun  2 16:29 testfile-node8
[root@node8 ~]#



It seems that if I perform a tunefs.lustre --writeconf on both the MDT
and the OSTs of the filesystem, then I can remount the OSTs on a new
server and the client can see them.  Of course, I cannot later mount them on
another server unless I perform the tunefs.lustre --writeconf again.
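The sequence I used is the usual regenerate-config one, as I understand it
from the manual - everything unmounted first, writeconf on each target, then
remount in order (device paths hypothetical):

  # With clients, OSTs and the MDT all unmounted:
  tunefs.lustre --writeconf /dev/mapper/test2-mdt    # on lustre-mds1
  tunefs.lustre --writeconf /dev/mapper/test2-ost0   # on the new OSS

  # Then remount: the MDT first, the OSTs (now on, say, lustre-oss2),
  # and finally the clients.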

This behaviour with tunefs.lustre --writeconf was not always consistent, and
on one occasion the filesystem became unmountable anywhere (no matter which
OSS mounted it, the clients failed to complete the mount) - but when the
filesystem was recreated (i.e. mkfs.lustre --reformat . . . ) the behaviour I
described above is again reproducible (for now).  Note that I could not
reboot the lustre-mds (MDS) server or restart various services on it, as it
is currently in production.


Is this as it should be, or is there a better way to fail LUNs over between
servers - preferably while keeping the filesystem mounted and available to
clients?


** I did see sections 4.2.9 - 4.2.11 in the Lustre manual (May 2009) - but I
am not changing any server NIDs or MGS locations (in fact the MGS is on its
own LUN), and I found that merely running the writeconf on the MDT LUN
resulted in errors when mounting the OSTs on the OSSs (see below).  I was
hoping there was a way to move the OST LUNs dynamically, without unmounting
all clients and servers:

Jun  2 16:58:02 lustre1 kernel: LustreError: 13b-9: test2-OST0000 claims to
have registered, but this MGS does not know about it, preventing
registration.
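As noted above, running the writeconf on the OST LUNs as well - not just on
the MDT - clears this registration error, e.g. (hypothetical device path):

  tunefs.lustre --writeconf /dev/mapper/test2-ost0   # on the OSS, OST unmounted

but that of course brings back the requirement to unmount everything first.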




Thanks in advance for any advice you may have, or pointers to documentation
I have overlooked.

Regards,


Marcus.

Marcus Schull
Systems Administrator
IMB, University of Queensland.
