[lustre-discuss] Lustre and server upgrade

Colin Faber cfaber at gmail.com
Wed Nov 24 19:34:33 PST 2021


What does tune2fs report for /dev/sdb on the MDS?
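
For example, something along these lines (tune2fs -l is read-only, so it is
safe to run against the MDT device):

  # dump the ext4/ldiskfs superblock from the MDT LUN
  tune2fs -l /dev/sdb

  # the interesting fields are usually the volume name, feature flags and state
  tune2fs -l /dev/sdb | egrep -i 'volume name|features|state'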

(Also sorry, this somehow got lost in my inbox)

On Mon, Nov 22, 2021 at 8:57 AM STEPHENS, DEAN - US <dean.stephens at caci.com>
wrote:

> Colin and Andreas, to clarify some points for you, this is what I am
> seeing:
>
>
>
> rpm -qa | grep lustre
>
> kmod-lustre-2.12.6-1.el7.x86_64
>
> lustre-iokit-2.12.6-1.el7.x86_64
>
> lustre-tests-2.12.6-1.el7.x86_64
>
> kernel-devel-3.10.0-1160.2.el7_lustre.x86_64
>
> lustre-osd-ldiskfs-2.12.6-1.el7.x86_64
>
> kmod-lustre-osd-ldiskfs-2.12.6-1.el7.x86_64
>
> kmod-lustre-tests-2.12.6-1.el7.x86_64
>
> lustre-resource-agents-2.12.6-1.el7.x86_64
>
> kernel-3.10.0-1160.2.el7_lustre.x86_64
>
> lustre-2.12.6-1.el7.x86_64
>
>
>
> rpm -qa | grep e2fs
>
> e2fsprogs-libs-1.45.6.wc1-0.el7.x86_64
>
> e2fsprogs-1.45.6.wc1-0.el7.x86_64
>
>
>
> With all of that installed, and with llmount.sh and llmountcleanup.sh
> running and cleaning up successfully, I am still getting the errors:
>
> “Unable to mount /dev/sdb: Invalid argument”
>
> “tunefs.lustre: FATAL: failed to write local files” and “tunefs.lustre:
> exiting with 22 (Invalid argument)”
>
>
>
> When I use the command tunefs.lustre /dev/sdb (which is one of the lustre
> LUNs that is attached as a “disk” to the VM)
>
>
>
> Full output of the tunefs.lustre /dev/sdb command (as much as I can show
> anyway):
>
>
>
> tunefs.lustre /dev/sdb
>
> Checking for existing lustre data: found
>
> Reading CONFIGS/mountdata
>
>
>
>      Read previous values:
>
> Target:            <name>-OST0009
>
> Index:             9
>
> Lustre FS:         <name>
>
> Mount type:        ldiskfs
>
> Flags:             0x1002
>
>                    (OST no_primnode )
>
> Persistent mount opts: errors=remount-ro
>
> Parameters: mgsnode=<IP of the 1st MGS node>@tcp mgsnode=<IP of the 2nd
> MGS node>@tcp failover.node=<IP of the 1st OSS node>@tcp
> failover.node=<IP of the 2nd OSS node>@tcp
>
>
>
>      Permanent disk data:
>
> Target:            <name>-OST0009
>
> Index:             9
>
> Lustre FS:         <name>
>
> Mount type:        ldiskfs
>
> Flags:             0x1002
>
>                    (OST no_primnode )
>
> Persistent mount opts: errors=remount-ro
>
> Parameters: mgsnode=<IP of the 1st MGS node>@tcp mgsnode=<IP of the 2nd
> MGS node>@tcp failover.node=<IP of the 1st OSS node>@tcp
> failover.node=<IP of the 2nd OSS node>@tcp
>
>
>
> tunefs.lustre: Unable to mount /dev/sdb: Invalid argument
>
>
>
> tunefs.lustre: FATAL: failed to write local files
>
> tunefs.lustre: exiting with 22 (Invalid argument)
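>
> When tunefs.lustre fails like this, the real reason for the underlying
> ldiskfs mount error usually lands in the kernel log. A quick way to see it
> (read-only, assuming you can copy the relevant lines out by hand on the
> closed system):
>
>   # look for LDISKFS-fs / LustreError lines right after the failed run
>   dmesg | tail -n 50
>   journalctl -k --since "10 minutes ago"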
>
>
>
> Now, to be clear, the MDS nodes are not working correctly: I am not able
> to mount /dev/sdb on them, which is where the existing metadata is served
> from. Up to this point I have been concentrating on the OSS nodes, since
> that is where the Lustre data comes from. I have installed the Lustre
> kernel and the same software on the MDS nodes in the same way as on the
> OSS nodes. When I run tunefs.lustre /dev/sdb on the MDS nodes I get an
> error saying:
>
>
>
> Checking for existing lustre data: not found
>
>
>
> tunefs.lustre: FATAL: device /dev/sdb has not been formatted with
> mkfs.lustre
>
> tunefs.lustre: exiting with 19 (no such device)
>
>
>
> I am assuming that this is correct, as that attached LUN does not need to
> have Lustre data on it since it is the metadata server. Is there anything
> that I can or need to check on the MDS nodes to see what is running and
> working correctly?
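>
> A few read-only checks that can be run against the MDT LUN without changing
> anything (the device name here is just the one from this thread):
>
>   blkid /dev/sdb                     # does it carry an ext4/ldiskfs or ZFS signature at all?
>   file -s /dev/sdb                   # quick look at what is actually on the device
>   tunefs.lustre --dryrun /dev/sdb    # print the stored Lustre target config without writing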
>
>
>
> I know that this is a lot and I appreciate any help that you can give me
> to troubleshoot this.
>
>
> Dean
>
>
>
>
>
>
>
> *From:* STEPHENS, DEAN - US
> *Sent:* Monday, November 22, 2021 5:58 AM
> *To:* Andreas Dilger <adilger at whamcloud.com>
> *Cc:* Colin Faber <cfaber at gmail.com>; lustre-discuss at lists.lustre.org
> *Subject:* RE: [lustre-discuss] Lustre and server upgrade
>
>
>
> Thanks for the clarification. I am using llmount.sh to test the install of
> the OST and MDT, not to run in production. I hope to have more done today
> and will reach out to let you all know what I find.
>
>
>
> Dean
>
>
>
> *From:* Andreas Dilger <adilger at whamcloud.com>
> *Sent:* Friday, November 19, 2021 5:25 PM
> *To:* STEPHENS, DEAN - US <dean.stephens at caci.com>
> *Cc:* Colin Faber <cfaber at gmail.com>; lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre and server upgrade
>
>
>
> Dean,
>
> it should be emphasized that "llmount.sh" and "llmountcleanup.sh" are for
> quickly formatting and mounting *TEST* filesystems.  They only create a few
> small (400MB) loopback files in /tmp and format them as OSTs and MDTs.
> This should *NOT* be used on a production system, or you will be very sad
> when the files in /tmp disappear after the server is rebooted and/or they
> reformat your real filesystem devices.
>
>
>
> I mention this here because it isn't clear to me whether you are using
> them for testing, or trying to get a real filesystem mounted.
>
>
>
> Cheers, Andreas
>
>
>
> On Nov 19, 2021, at 13:25, STEPHENS, DEAN - US via lustre-discuss <
> lustre-discuss at lists.lustre.org> wrote:
>
>
>
> I also figured out how to clean up after the llmount.sh script is run.
> There is an llmountcleanup.sh script that will do that.
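>
> That is (using the same directory the tests were installed into):
>
>   sh /usr/lib64/lustre/tests/llmountcleanup.sh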
>
>
>
> Dean
>
>
>
> *From:* STEPHENS, DEAN - US
> *Sent:* Friday, November 19, 2021 1:08 PM
> *To:* Colin Faber <cfaber at gmail.com>
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* RE: [lustre-discuss] Lustre and server upgrade
>
>
>
> One more thing that I have noticed using the llmount.sh script: the
> directories that it created under /mnt have their permissions set to 000.
> The ones that I have configured under /mnt/lustre are set to 750.
>
>
>
> Is this something that needs to be fixed? These servers are configured via
> Puppet, and that is how the /mnt/lustre directories are created and their
> permissions set.
>
>
>
> Dean
>
>
>
>
>
> *From:* STEPHENS, DEAN - US
> *Sent:* Friday, November 19, 2021 7:14 AM
> *To:* Colin Faber <cfaber at gmail.com>
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* RE: [lustre-discuss] Lustre and server upgrade
>
>
>
> The other question that I have is how to clean up after llmount.sh has
> been run. If I do a df on the server I see that mds1, ost1, and ost2 are
> still mounted under /mnt. Do I need to manually umount them, since
> llmount.sh completed successfully?
>
>
>
> Also, I have not done anything to my MDS node, so some direction on what
> to do there would be helpful as well.
>
>
>
> Dean
>
>
>
> *From:* STEPHENS, DEAN - US
> *Sent:* Friday, November 19, 2021 7:00 AM
> *To:* Colin Faber <cfaber at gmail.com>
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* RE: [lustre-discuss] Lustre and server upgrade
>
>
>
> Thanks for the help yesterday; I was able to install the Lustre kernel
> and software on a VM, including the test RPM.
>
>
>
> This is what I did following these directions
> <https://wiki.lustre.org/Installing_the_Lustre_Software#Lustre_Servers_with_LDISKFS_OSD_Support>
> :
>
> Installed the Lustre kernel and kernel-devel (the other RPMs listed were
> not in my lustre-server repo)
>
> Rebooted the VM
>
> Installed kmod-lustre kmod-lustre-osd-ldiskfs lustre-osd-ldiskfs-mount
> lustre lustre-resource-agents lustre-tests
>
> Ran modprobe -v lustre (it did not show that it loaded kernel modules, as
> it has done in the past)
>
> Ran lustre_rmmod (got an error that module lustre is in use)
>
> Rebooted again
>
> Ran llmount.sh and it looked like it completed successfully
>
> Ran tunefs.lustre /dev/sdb (at the bottom of the output I am seeing
> tunefs.lustre: Unable to mount /dev/sdb: Invalid argument, then
> tunefs.lustre: FATAL: failed to write local files and tunefs.lustre:
> exiting with 22 (Invalid argument))
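>
> One extra sanity check that may be worth doing right after the reboot,
> before running tunefs.lustre, is making sure the whole server module stack
> loads (a rough sketch; nothing here touches the targets):
>
>   modprobe -v ldiskfs        # backing filesystem module
>   modprobe -v osd_ldiskfs    # server OSD layer, pulls in ldiskfs as a dependency
>   modprobe -v lustre         # client/stack modules
>   lsmod | egrep 'ldiskfs|osd|lustre'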
>
>
>
> Any idea what the “invalid argument” is talking about?
>
>
>
> Dean
>
>
>
> *From:* Colin Faber <cfaber at gmail.com>
> *Sent:* Thursday, November 18, 2021 3:34 PM
> *To:* STEPHENS, DEAN - US <dean.stephens at caci.com>
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre and server upgrade
>
>
>
> The VM will need a full install of all server packages, as well as the
> tests package to allow for this test.
>
>
>
> On Thu, Nov 18, 2021 at 2:26 PM STEPHENS, DEAN - US <
> dean.stephens at caci.com> wrote:
>
> I have not tried that but I can do that on a new VM that I can create. I
> assume that is all that I need is the lustre-tests RPM and associated
> dependencies and not the full blown lustre install?
>
>
>
> Dean
>
>
>
> *From:* Colin Faber <cfaber at gmail.com>
> *Sent:* Thursday, November 18, 2021 2:22 PM
> *To:* STEPHENS, DEAN - US <dean.stephens at caci.com>
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre and server upgrade
>
>
>
> So that indicates that your installation is incomplete, or something else
> is preventing lustre, ldiskfs, and possibly other modules from loading.
> Have you been able to reproduce this behavior (i.e. llmount.sh failing) on
> a fresh RHEL install with Lustre 2.12.7?
>
>
>
> -cf
>
>
>
>
>
> On Thu, Nov 18, 2021 at 2:20 PM STEPHENS, DEAN - US <
> dean.stephens at caci.com> wrote:
>
> Thanks for the direction. I found it and installed lustre-tests.x86_64;
> now I have llmount.sh, which defaulted to
> /usr/lib64/lustre/tests/llmount.sh, and when I ran it, it failed with:
>
>
>
> Stopping clients: <hostname> /mnt/lustre (opts: -f)
>
> Stopping clients: <hostname> /mnt/lustre2 (opts: -f)
>
> Loading modules from /usr/lib64/lustre/tests/..
>
> Detected 2 online CPUs by sysfs
>
> Force libcfs to create 2 CPU partitions
>
> Formatting mgs, mds, osts
>
> Format mds1: /tmp/lustre-mdt1
>
> mkfs.lustre: Unable to mount /dev/loop0: No such device (even though
> /dev/loop0 does exist)
> Is the ldiskfs module loaded?
>
>
>
> mkfs.lustre FATAL: failed to write local files
>
> mkfs.lustre: exiting with 19 (no such device)
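>
> The "No such device" here is the mount() call complaining that the ldiskfs
> filesystem type is not registered with the kernel, not about the loop
> device itself, so the question in the output is the one to chase. Roughly:
>
>   lsmod | grep ldiskfs    # empty output means the module is not loaded
>   modprobe -v ldiskfs     # try loading it by hand and note any error
>   dmesg | tail            # symbol or kernel-version mismatches show up here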
>
>
>
> *From:* Colin Faber <cfaber at gmail.com>
> *Sent:* Thursday, November 18, 2021 2:03 PM
> *To:* STEPHENS, DEAN - US <dean.stephens at caci.com>
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre and server upgrade
>
>
>
> This would be part of the lustre-tests RPM package and will install
> llmount.sh to /usr/lib/lustre/tests/llmount.sh I believe.
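>
> Something like this should confirm where it landed (the exact path can be
> checked from the package itself):
>
>   yum install lustre-tests
>   rpm -ql lustre-tests | grep llmount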
>
>
>
> On Thu, Nov 18, 2021 at 1:45 PM STEPHENS, DEAN - US <
> dean.stephens at caci.com> wrote:
>
> Not sure what you mean by “If you install the test suite”. I am not seeing
> an llmount.sh file on the server using “locate llmount.sh” at this point.
> What are the steps to install the test suite?
>
>
>
> Dean
>
>
>
> *From:* Colin Faber <cfaber at gmail.com>
> *Sent:* Thursday, November 18, 2021 1:34 PM
> *To:* STEPHENS, DEAN - US <dean.stephens at caci.com>
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre and server upgrade
>
>
>
> Hm... If you install the test suite, does llmount.sh succeed? This should
> set up a single-node cluster on whatever node you're running Lustre on,
> and I believe it will load modules as needed (IIRC). If this test
> succeeds, then you know that Lustre is installed correctly (or correctly
> enough); if not, I'd focus on the installation, as the target issue may be
> a red herring.
>
>
>
> -cf
>
>
>
>
>
> On Thu, Nov 18, 2021 at 1:01 PM STEPHENS, DEAN - US <
> dean.stephens at caci.com> wrote:
>
> Thanks for the fast reply.
>
> When I do the tunefs.lustre /dev/sdX command I get:
>
> Target: <name>-OST0009
>
> Index: 9
>
>
>
> Target: <name>-OST0008
>
> Index: 8
>
> I spot-checked some others and they seem to be good, with the exception
> of one. It shows:
>
>
>
> Target: <name>-OST000a
>
> Index: 10
>
>
>
> But since there are 11 LUNs attached (indexes 0 through 10), that makes
> sense to me.
>
>
>
> As far as the upgrade goes, it was a fresh install using the legacy
> targets; the OSS and MDS nodes are virtual machines with the LUN disks
> attached to them, so Red Hat sees them as /dev/sdX devices.
>
>
>
> When I loaded Lustre on the server I did a yum install lustre, and since
> we were pointed at the lustre-2.12 repo in our environment it picked up
> the following RPMs to install:
>
> lustre-resource-agents-2.12.6-1.el7.x86_64
>
> kmod-lustre-2.12.6-1.el7.x86_64
>
> kmod-zfs-3.10.0-1160.2.1.el7_lustre.x86_64-0.7.13-1.el7.x86_64
>
> kmod-lustre-osd-zfs-2.12.6-1.el7.x86_64
>
> lustre-2.12.6-1.el7.x86_64
>
> kmod-spl-3.10.0-1160.2.1.el7_lustre.x86_64-0.7.13-1.el7.x86_64
>
> lustre-osd-zfs-mount-2.12.6-1.el7.x86_64
>
> lustre-osd-ldiskfs-mount-2.12.6-1.el7.x86_64
>
>
>
> Dean
>
>
>
> *From:* Colin Faber <cfaber at gmail.com>
> *Sent:* Thursday, November 18, 2021 12:35 PM
> *To:* STEPHENS, DEAN - US <dean.stephens at caci.com>
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] Lustre and server upgrade
>
>
>
>
>
>
>
>
> Hi,
>
>
>
> I believe that sometime around 2.10 (someone correct me if I'm wrong) the
> index parameter became required and needs to be specified explicitly. On
> an existing system this should already be set, but can you check the
> parameters line reported by tunefs.lustre for correct index=N values
> across your storage nodes?
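>
> For example (--dryrun only prints the stored configuration and does not
> modify the target):
>
>   tunefs.lustre --dryrun /dev/sdb 2>&1 | egrep -i 'target|index|parameters'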
>
>
>
> Also, with your "upgrade", was this a fresh install utilizing legacy
> targets?
>
>
>
> The last thing I can think of: IIRC there were on-disk format changes
> between 2.5 and 2.12. These should be transparent to you, but it may be
> that some other issue is preventing a successful upgrade, though the
> missing module error really points to possible issues around how Lustre
> was installed and loaded on the system.
>
>
>
> Cheers!
>
>
>
> -cf
>
>
>
>
>
> On Thu, Nov 18, 2021 at 12:24 PM STEPHENS, DEAN - US via lustre-discuss <
> lustre-discuss at lists.lustre.org> wrote:
>
> I am by no means a Lustre expert and am seeking some help with our system.
> I am not able to post log files, as the servers are in a closed area with
> no access to the Internet.
>
>
>
> Here is a bit of history of our system:
>
> The OSS and MDS nodes were RHEL6, running the Lustre server kernel
> 2.6.32-431.23.3.el6_lustre.x86_64 and Lustre version 2.5.3; the client
> version was 2.10. That was in a working state.
>
> We upgraded the OSS and MDS nodes to RHEL7 and installed the Lustre 2.12
> server software and kernel.
>
> The 11 attached LUNs are showing up as /dev/sdb - /dev/sdl.
>
> Right now, on the OSS nodes, if I use the command tunefs.lustre /dev/sdb I
> get some data back saying that Lustre data has been found, but at the
> bottom of the output it shows “tunefs.lustre: Unable to mount /dev/sdb: No
> such device” and “Is the ldiskfs module available?”
>
> When I do a “modprobe -v lustre” I do not see ldiskfs.ko being loaded,
> even though there is an ldiskfs.ko file in the
> /lib/modules/3.10.0-1160.2.1.el7_lustre.x86_64/extra/lustre/fs directory. I
> am not sure how to get modprobe to load it.
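>
> ldiskfs is normally only pulled in as a dependency of the osd_ldiskfs
> module, not by “modprobe lustre”, so it is worth loading it directly; and
> if modprobe claims it cannot find a module whose .ko clearly exists under
> /lib/modules, regenerating the module dependency map sometimes helps. A
> sketch:
>
>   modprobe -v ldiskfs
>   modprobe -v osd_ldiskfs
>   lsmod | egrep 'ldiskfs|osd'
>
>   # only if modprobe says the module cannot be found:
>   depmod -a
>   modprobe -v ldiskfs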
>
> I used “insmod
> /lib/modules/3.10.0-1160.2.1.el7_lustre.x86_64/extra/lustre/fs/ldiskfs.ko”
> and re-ran the “tunefs.lustre /dev/sdb” command with the same result.
>
> If I use the same command on the MDS nodes I get “no Lustre data found”
> and “/dev/sdb has not been formatted with mkfs.lustre”. I am not sure
> whether that is expected here, as the MDS nodes do not really hold the
> Lustre data; they are the metadata servers.
>
> I tried to use the command “tunefs.lustre --mgs --erase_params
> --mgsnode=<IP address>@tcp --writeconf --dryrun /dev/sdb” and got the error
> “/dev/sdb has not been formatted with mkfs.lustre”.
>
>
>
> I need some help and guidance. I can provide whatever may be needed,
> though it will have to be typed out, as I am not able to get actual log
> files off the system.
>
>
>
> Dean Stephens
>
> CACI
>
> Linux System Admin
>
>
>
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
>
>
> Cheers, Andreas
>
> --
>
> Andreas Dilger
>
> Lustre Principal Architect
>
> Whamcloud
>

