[lustre-discuss] ocf:lustre:Lustre resources not happy when ocf:heartbeat:Filesystem ones are
Scott Wood
woodystrash at hotmail.com
Tue Mar 6 19:40:37 PST 2018
Hi folks,
I've just upgraded a 2.7.0 cluster to 2.10.3 and thought I'd take advantage of the new HA resource agents. Sadly, I find that the resource agent mounts the OSDs successfully, but then the resource stops (leaving the OSDs mounted). Here's an example case, using the management OSD.
Created with the following:
# pcs resource create MGT ocf:lustre:Lustre target=/dev/disk/by-label/MGS mountpoint=/mnt/MGT; pcs constraint location MGT prefers hpctestmds1=100
Results in the following, leaving the resource stopped but the MGT mounted:
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32115]: ERROR: /dev/disk/by-label/MGS is not mounted
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]: notice: Result of probe operation for MGT on hpctestmds1: 7 (not running)
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32128]: INFO: Starting to mount /dev/disk/by-label/MGS
Mar 07 13:28:22 hpctestmds1.our.domain kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Mar 07 13:28:22 hpctestmds1.our.domain kernel: Lustre: MGS: Connection restored to 9eb39832-a281-1088-d816-410b918b5813 (at 0@lo)
Mar 07 13:28:22 hpctestmds1.our.domain kernel: Lustre: Skipped 6 previous similar messages
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32173]: INFO: /dev/disk/by-label/MGS mounted successfully
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]: notice: Result of start operation for MGT on hpctestmds1: 0 (ok)
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32189]: ERROR: /dev/disk/by-label/MGS is not mounted
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]: notice: Result of stop operation for MGT on hpctestmds1: 0 (ok)
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32207]: INFO: Starting to mount /dev/disk/by-label/MGS
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32215]: ERROR: mount failed
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32221]: ERROR: /dev/disk/by-label/MGS can not be mounted with this error: 1
Mar 07 13:28:23 hpctestmds1.our.domain lrmd[11456]: notice: MGT_start_0:32200:stderr [ mount.lustre: according to /etc/mtab /dev/sde is already mounted on /mnt/MGT ]
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]: notice: Result of start operation for MGT on hpctestmds1: 1 (unknown error)
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]: notice: hpctestmds1-MGT_start_0:558 [ mount.lustre: according to /etc/mtab /dev/sde is already mounted on /mnt/MGT\n ]
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]: notice: Result of stop operation for MGT on hpctestmds1: 0 (ok)
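For what it's worth, I can reproduce the failing monitor check outside Pacemaker by invoking the agent directly with the OCF environment variables it expects. The agent path and variable names below are from my install, so treat this as a sketch and adjust for yours:

```shell
# Run the ocf:lustre:Lustre agent's monitor action by hand, passing the
# same parameters Pacemaker would supply as OCF_RESKEY_* variables.
# Agent path is from my install; adjust if yours differs.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_target=/dev/disk/by-label/MGS
export OCF_RESKEY_mountpoint=/mnt/MGT
/usr/lib/ocf/resource.d/lustre/Lustre monitor
echo "monitor exit code: $?"   # per OCF convention: 0 = running, 7 = not running
```

Even with the device mounted on /mnt/MGT, the monitor action reports "not mounted", which matches the ERROR lines above and seems to be what's driving the stop.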
I then delete the resource, unmount the MGT, and make a new resource with the old ocf:heartbeat:Filesystem agent, setting the options to match the defaults from the ocf:lustre:Lustre agent, as follows:
# pcs resource create MGT Filesystem device=/dev/disk/by-label/MGS directory=/mnt/MGT fstype="lustre" meta op monitor interval="20" timeout="300" op start interval="0" timeout="300" op stop interval="0" timeout="300"; pcs constraint location MGT prefers hpctestmds1=100
This results in a happier resource start: the Pacemaker resource stays "Started" and the mount persists. From journalctl:
Mar 07 13:35:07 hpctestmds1.our.domain crmd[11459]: notice: Result of probe operation for MGT on hpctestmds1: 7 (not running)
Mar 07 13:35:07 hpctestmds1.our.domain Filesystem(MGT)[744]: INFO: Running start for /dev/disk/by-label/MGS on /mnt/MGT
Mar 07 13:35:07 hpctestmds1.our.domain kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Mar 07 13:35:07 hpctestmds1.our.domain kernel: Lustre: MGS: Connection restored to 9eb39832-a281-1088-d816-410b918b5813 (at 0@lo)
Mar 07 13:35:07 hpctestmds1.our.domain crmd[11459]: notice: Result of start operation for MGT on hpctestmds1: 0 (ok)
Has anyone experienced similar results? Any tips?
Cheers
CanWood