[Lustre-discuss] Problem re-mounting Lustre on an other node

Andreas Dilger adilger at sun.com
Wed Oct 14 17:50:30 PDT 2009


On 14-Oct-09, at 01:08, Michael Schwartzkopff wrote:
> we have a Lustre 1.8 cluster with openais and pacemaker as the cluster
> manager. When I migrate one Lustre resource from one node to another node
> I get an error. Stopping Lustre on one node is no problem, but the node
> where Lustre should start says:
>
> Oct 14 09:54:28 sososd6 kernel: kjournald starting.  Commit interval 5 seconds
> Oct 14 09:54:28 sososd6 kernel: LDISKFS FS on dm-4, internal journal
> Oct 14 09:54:28 sososd6 kernel: LDISKFS-fs: recovery complete.
> Oct 14 09:54:28 sososd6 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Oct 14 09:54:28 sososd6 multipathd: dm-4: umount map (uevent)
> Oct 14 09:54:39 sososd6 kernel: kjournald starting.  Commit interval 5 seconds
> Oct 14 09:54:39 sososd6 kernel: LDISKFS FS on dm-4, internal journal
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: file extents enabled
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: mballoc enabled
> Oct 14 09:54:39 sososd6 kernel: Lustre: MGC134.171.16.190@tcp: Reactivating import
> Oct 14 09:54:45 sososd6 kernel: LustreError: 137-5: UUID 'segfs-OST0000_UUID' is not available for connect (no target)

This is likely driven by some client trying to connect to OST0000, but I
don't see anything in the above logs indicating that OST0000 has actually
started up yet.  It should have logged something like:

RECOVERY: service myth-OST0000, 3 recoverable clients, last_rcvd 17180097556
Lustre: OST myth-OST0000 now serving dev (myth-OST0000/81a23803-0711-a534-441a-f5ee34e094a8), but will be in recovery for at least 5:00, or until 3 clients reconnect.
Lustre: Server myth-OST0000 on device /dev/mapper/vgmyth-lvmythost0 has started

> These logs continue until the cluster software times out and the resource
> tells me about the error. Any help understanding these logs? Thanks.


Are you sure you are mounting the OSTs with type "lustre" instead of
"ldiskfs"?  I see the above Lustre messages on my system a few seconds
after the LDISKFS messages are printed.
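
For comparison, a failover mount on the standby OSS would look roughly like
the following (the device path and mount point here are only illustrative,
not taken from your configuration):

# mount the OST as a Lustre target, not as a plain ldiskfs filesystem
mount -t lustre /dev/mapper/segfs-ost0000 /mnt/lustre/segfs-ost0000

Mounting the same device with "-t ldiskfs" only brings up the local
filesystem and journal, without starting the OST service, which would match
the "no target" errors above.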

If you are using MMP (which you should be, on an automated failover
configuration) it will add 10-20s of delay to the ldiskfs mount.
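
As a sanity check, something like the following shows whether MMP is
enabled, and if you are using the ocf:heartbeat:Filesystem agent, the start
timeout should allow for that extra delay (device paths, mount point, and
timeout values below are only examples, not your actual resource
definitions):

# check whether the multiple-mount-protection (MMP) feature is enabled
dumpe2fs -h /dev/mapper/segfs-ost0000 2>/dev/null | grep -i mmp

# allow for the MMP delay in the cluster manager's start timeout, e.g. with crm:
# crm configure primitive segfs-ost0000 ocf:heartbeat:Filesystem \
#   params device="/dev/mapper/segfs-ost0000" \
#          directory="/mnt/lustre/segfs-ost0000" fstype="lustre" \
#   op start timeout="300s" op stop timeout="300s"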

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
