[Lustre-discuss] Problem re-mounting Lustre on another node
Andreas Dilger
adilger at sun.com
Wed Oct 14 17:50:30 PDT 2009
On 14-Oct-09, at 01:08, Michael Schwartzkopff wrote:
> we have a Lustre 1.8 cluster with openais and pacemaker as the cluster
> manager. When I migrate one Lustre resource from one node to another
> node I get an error. Stopping Lustre on one node is no problem, but
> the node where Lustre should start says:
>
> Oct 14 09:54:28 sososd6 kernel: kjournald starting. Commit interval 5 seconds
> Oct 14 09:54:28 sososd6 kernel: LDISKFS FS on dm-4, internal journal
> Oct 14 09:54:28 sososd6 kernel: LDISKFS-fs: recovery complete.
> Oct 14 09:54:28 sososd6 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Oct 14 09:54:28 sososd6 multipathd: dm-4: umount map (uevent)
> Oct 14 09:54:39 sososd6 kernel: kjournald starting. Commit interval 5 seconds
> Oct 14 09:54:39 sososd6 kernel: LDISKFS FS on dm-4, internal journal
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: file extents enabled
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: mballoc enabled
> Oct 14 09:54:39 sososd6 kernel: Lustre: MGC134.171.16.190@tcp: Reactivating import
> Oct 14 09:54:45 sososd6 kernel: LustreError: 137-5: UUID 'segfs-OST0000_UUID' is not available for connect (no target)
This is likely driven by some client trying to connect to OST0000, but
I don't see anything in the above logs indicating that OST0000 has
actually started up yet. It should have something like:

RECOVERY: service myth-OST0000, 3 recoverable clients, last_rcvd 17180097556
Lustre: OST myth-OST0000 now serving dev (myth-OST0000/81a23803-0711-a534-441a-f5ee34e094a8), but will be in recovery for at least 5:00, or until 3 clients reconnect.
Lustre: Server myth-OST0000 on device /dev/mapper/vgmyth-lvmythost0 has started
> These logs continue until the cluster software times out and the
> resource tells me about the error. Any help understanding these logs?
> Thanks.
Are you sure you are mounting the OSTs with type "lustre" instead of
"ldiskfs"? I see the above Lustre messages on my system a few seconds
after the LDISKFS messages are printed.
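For illustration, a minimal sketch of the distinction (the device path
and mount point here are assumptions, not taken from your logs):

```shell
# Mounting with type "lustre" starts the OST service after the ldiskfs
# backing filesystem comes up; type "ldiskfs" only mounts the backing
# filesystem and never registers the target, which would leave clients
# getting "not available for connect (no target)".
mount -t lustre /dev/mapper/segfs-ost0 /mnt/segfs-ost0

# In a pacemaker Filesystem resource the same choice is the fstype
# parameter, e.g. (resource name and paths are hypothetical):
#   primitive segfs-ost0 ocf:heartbeat:Filesystem \
#     params device=/dev/mapper/segfs-ost0 \
#            directory=/mnt/segfs-ost0 fstype=lustre
```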
If you are using MMP (which you should be, on an automated failover
config), it will add 10-20s of delay to the ldiskfs mount.
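If you want to confirm whether MMP is enabled on a target, something
like the following should show it (again, the device path is just an
example):

```shell
# "mmp" appears in the feature list of the backing ldiskfs/ext filesystem
# when multiple mount protection is enabled.
dumpe2fs -h /dev/mapper/segfs-ost0 2>/dev/null | grep -i 'features'

# The MMP update interval (which drives the extra mount delay) is also
# reported in the superblock dump.
dumpe2fs -h /dev/mapper/segfs-ost0 2>/dev/null | grep -i 'mmp'
```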
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.