[lustre-discuss] ocf:lustre:Lustre resources not happy when ocf:heartbeat:Filesystem ones are

Scott Wood woodystrash at hotmail.com
Tue Mar 6 19:40:37 PST 2018


Hi folks,


I've just upgraded a 2.7.0 cluster to 2.10.3 and thought I'd take advantage of the new HA resource agents.  Sadly, I find that the resource agent successfully mounts the OSDs, but then the resource stops (leaving the OSDs mounted).  Here's an example case, the management target (MGT):


Created with the following:

# pcs resource create MGT ocf:lustre:Lustre target=/dev/disk/by-label/MGS mountpoint=/mnt/MGT; pcs constraint location MGT prefers hpctestmds1=100
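
(In case it helps anyone reproduce this: the agent's actions can also be exercised by hand, outside Pacemaker, either by exporting the OCF parameters and calling the script directly or via ocf-tester from the resource-agents package.  The /usr/lib/ocf/resource.d/lustre/Lustre path below is my assumption based on the usual OCF layout for the ocf:lustre provider; adjust if your install puts it elsewhere.)

# export OCF_ROOT=/usr/lib/ocf
# export OCF_RESKEY_target=/dev/disk/by-label/MGS OCF_RESKEY_mountpoint=/mnt/MGT
# /usr/lib/ocf/resource.d/lustre/Lustre start; echo "start rc=$?"
# /usr/lib/ocf/resource.d/lustre/Lustre monitor; echo "monitor rc=$?"
# ocf-tester -n MGT -o target=/dev/disk/by-label/MGS -o mountpoint=/mnt/MGT /usr/lib/ocf/resource.d/lustre/Lustre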


This results in the following, leaving the resource stopped but the MGT mounted:

Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32115]: ERROR: /dev/disk/by-label/MGS is not mounted
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]:   notice: Result of probe operation for MGT on hpctestmds1: 7 (not running)
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32128]: INFO: Starting to mount /dev/disk/by-label/MGS
Mar 07 13:28:22 hpctestmds1.our.domain kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Mar 07 13:28:22 hpctestmds1.our.domain kernel: Lustre: MGS: Connection restored to 9eb39832-a281-1088-d816-410b918b5813 (at 0@lo)
Mar 07 13:28:22 hpctestmds1.our.domain kernel: Lustre: Skipped 6 previous similar messages
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32173]: INFO: /dev/disk/by-label/MGS mounted successfully
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]:   notice: Result of start operation for MGT on hpctestmds1: 0 (ok)
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32189]: ERROR: /dev/disk/by-label/MGS is not mounted
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]:   notice: Result of stop operation for MGT on hpctestmds1: 0 (ok)
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32207]: INFO: Starting to mount /dev/disk/by-label/MGS
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32215]: ERROR:  mount failed
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32221]: ERROR: /dev/disk/by-label/MGS can not be mounted with this error: 1
Mar 07 13:28:23 hpctestmds1.our.domain lrmd[11456]:   notice: MGT_start_0:32200:stderr [ mount.lustre: according to /etc/mtab /dev/sde is already mounted on /mnt/MGT ]
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]:   notice: Result of start operation for MGT on hpctestmds1: 1 (unknown error)
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]:   notice: hpctestmds1-MGT_start_0:558 [ mount.lustre: according to /etc/mtab /dev/sde is already mounted on /mnt/MGT\n ]
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]:   notice: Result of stop operation for MGT on hpctestmds1: 0 (ok)
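
What strikes me is that the monitor reports "/dev/disk/by-label/MGS is not mounted" immediately after a successful mount, while mount.lustre/mtab see the filesystem under the resolved device /dev/sde.  That makes me wonder whether the agent is matching the literal target string against the mount table rather than the resolved device.  The mismatch is easy to see on the node with plain util-linux (nothing Lustre-specific):

# readlink -f /dev/disk/by-label/MGS
# grep /mnt/MGT /proc/mounts
# findmnt /mnt/MGT

readlink resolves the label symlink to /dev/sde, /proc/mounts records the mount under /dev/sde rather than the by-label path, and findmnt confirms the mountpoint is in use.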

I then delete the resource, unmount the MGT, and make a new resource with the old ocf:heartbeat:Filesystem agent, setting the options to match the defaults from the ocf:lustre:Lustre agent, as follows:

# pcs resource create MGT Filesystem device=/dev/disk/by-label/MGS directory=/mnt/MGT fstype="lustre" meta op monitor interval="20" timeout="300" op start interval="0" timeout="300" op stop interval="0" timeout="300"; pcs constraint location MGT prefers hpctestmds1=100
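
(For anyone who wants to double-check my translation of the defaults: the Lustre agent's metadata lists its parameters and suggested operation timeouts; again, the script path is my assumption from the usual OCF layout.)

# pcs resource describe ocf:lustre:Lustre
# /usr/lib/ocf/resource.d/lustre/Lustre meta-data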


The Filesystem resource is much happier: it stays "Started" in Pacemaker and the mount persists.  From journalctl:

Mar 07 13:35:07 hpctestmds1.our.domain crmd[11459]:   notice: Result of probe operation for MGT on hpctestmds1: 7 (not running)
Mar 07 13:35:07 hpctestmds1.our.domain Filesystem(MGT)[744]: INFO: Running start for /dev/disk/by-label/MGS on /mnt/MGT
Mar 07 13:35:07 hpctestmds1.our.domain kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Mar 07 13:35:07 hpctestmds1.our.domain kernel: Lustre: MGS: Connection restored to 9eb39832-a281-1088-d816-410b918b5813 (at 0@lo)
Mar 07 13:35:07 hpctestmds1.our.domain crmd[11459]:   notice: Result of start operation for MGT on hpctestmds1: 0 (ok)
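
Checking from the node side tells the same story as the logs: pcs reports the resource Started and the target stays mounted.

# pcs status resources
# findmnt /mnt/MGT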

Has anyone experienced similar results? Any tips?

Cheers
CanWood

