[lustre-discuss] lustre client mount error "Is the MGS running?"

Osipenko, Oleg - MRP-APHIS Oleg.Osipenko at usda.gov
Mon May 1 14:31:21 PDT 2023


Hello everyone!

I have a Lustre client node that is external to our "bright-stack" and is not seen by the Bright Cluster Manager.

External Lustre client background info:
        root@aapksmansfi01 ~ # cat /etc/os-release
        NAME="Ubuntu"
        VERSION="20.04.6 LTS (Focal Fossa)"
        ID=ubuntu
        ID_LIKE=debian
        PRETTY_NAME="Ubuntu 20.04.6 LTS"
        VERSION_ID="20.04"
        HOME_URL=https://www.ubuntu.com/
        SUPPORT_URL=https://help.ubuntu.com/
        BUG_REPORT_URL=https://bugs.launchpad.net/ubuntu/
        PRIVACY_POLICY_URL=https://www.ubuntu.com/legal/terms-and-policies/privacy-policy
        VERSION_CODENAME=focal
        UBUNTU_CODENAME=focal
        root@aapksmansfi01 ~ # uname --all
        Linux aapksmansfi01.usda.net 5.4.0-148-generic #165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
        root@aapksmansfi01 ~ # ofed_info | head -n 1
        MLNX_OFED_LINUX-5.4-3.6.8.1 (OFED-5.4-3.6.8):
        root@aapksmansfi01 ~ # lctl lustre_build_version
        Lustre version: 2.14.0_21_gd4b9557
        root@aapksmansfi01 ~ # lctl list_nids
        10.149.255.253@o2ib
        10.149.255.253@tcp
        root@aapksmansfi01 ~ # lnetctl net show
        net:
            - net type: lo
              local NI(s):
                - nid: 0@lo
                  status: up
            - net type: o2ib
              local NI(s):
                - nid: 10.149.255.253@o2ib
                  status: up
                  interfaces:
                      0: ibs1f1
            - net type: tcp
              local NI(s):
                - nid: 10.149.255.253@tcp
                  status: up
                  interfaces:
                      0: ibs1f1
        root@aapksmansfi01 ~ # lnetctl net show -v
        net:
            - net type: lo
              local NI(s):
                - nid: 0@lo
                  status: up
                  statistics:
                      send_count: 0
                      recv_count: 0
                      drop_count: 0
                  tunables:
                      peer_timeout: 0
                      peer_credits: 0
                      peer_buffer_credits: 0
                      credits: 0
                  dev cpt: 0
                  tcp bonding: 0
                  CPT: "[0,1]"
            - net type: o2ib
              local NI(s):
                - nid: 10.149.255.253@o2ib
                  status: up
                  interfaces:
                      0: ibs1f1
                  statistics:
                      send_count: 0
                      recv_count: 0
                      drop_count: 0
                  tunables:
                      peer_timeout: 180
                      peer_credits: 8
                      peer_buffer_credits: 0
                      credits: 256
                      peercredits_hiw: 4
                      map_on_demand: 1
                      concurrent_sends: 8
                      fmr_pool_size: 512
                      fmr_flush_trigger: 384
                      fmr_cache: 1
                      ntx: 512
                      conns_per_peer: 1
                  lnd tunables:
                  dev cpt: 1
                  tcp bonding: 0
                  CPT: "[0,1]"
            - net type: tcp
              local NI(s):
                - nid: 10.149.255.253@tcp
                  status: up
                  interfaces:
                      0: ibs1f1
                  statistics:
                      send_count: 0
                      recv_count: 0
                      drop_count: 0
                  tunables:
                      peer_timeout: 180
                      peer_credits: 8
                      peer_buffer_credits: 0
                      credits: 256
                  dev cpt: 1
                  tcp bonding: 0
                  CPT: "[0,1]"


I am still unable to mount Lustre when pointing the client at the MDS/MGS node:
        root@aapksmansfi01 ~ # mount -t lustre 10.149.0.33@o2ib:/lustrefs /mnt/lustrefs
        mount.lustre: mount 10.149.0.33@o2ib:/lustrefs at /mnt/lustrefs failed: Input/output error
        Is the MGS running?

All the Lustre modules loaded without errors (as far as I could tell).
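
A minimal sketch of first-pass checks for this kind of mount failure, assuming `lctl` from the client's Lustre install is on the PATH and the MGS NID is the one from the mount command above. The helper names `nid_net`, `has_local_nid_on`, and `check_mgs` are hypothetical and only for illustration; `lctl list_nids`, `lctl ping`, and `dmesg` are the stock utilities.

```shell
#!/bin/sh
# Sketch only: LNet-level checks before retrying the mount.
# MGS_NID defaults to the NID used in the failing mount command (address@network).
MGS_NID="${MGS_NID:-10.149.0.33@o2ib}"

# Print the network part of a NID, e.g. "o2ib" for "10.149.0.33@o2ib".
nid_net() {
    printf '%s\n' "${1##*@}"
}

# Read NIDs on stdin (one per line, as printed by `lctl list_nids`) and
# succeed if at least one of them is on the network named in $1.
has_local_nid_on() {
    grep -q "@$1\$"
}

# Run the checks against the real system; needs root and a loaded lnet module.
check_mgs() {
    if ! command -v lctl >/dev/null 2>&1; then
        echo "lctl not found in PATH"
        return 1
    fi
    net=$(nid_net "$MGS_NID")
    if lctl list_nids | has_local_nid_on "$net"; then
        echo "local NID present on network $net"
    else
        echo "no local NID on network $net"
        return 1
    fi
    # LNet-level ping of the MGS; this does not involve the file system itself.
    lctl ping "$MGS_NID" || echo "LNet ping to $MGS_NID failed"
    # Recent kernel messages from Lustre/LNet often carry the underlying error.
    dmesg | grep -iE 'lustre|lnet' | tail -n 20
}
```

After sourcing the file, running `check_mgs` as root reports whether a local NID exists on the MGS network, attempts an LNet ping, and prints recent Lustre/LNet kernel messages. A failing `lctl ping` would suggest looking at LNet or fabric connectivity before the MGS service itself.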

If anyone can suggest a solution or any diagnostic tools for this Lustre client, I would be very grateful!

Thank You!


Oleg Osipenko
IT SPECIALIST - DATA MANAGEMENT
APHIS Marketing and Regulatory Programs Business Services (MRPBS)
MRP Information Technology (MRP IT) Services
Laboratory and Scientific IT Support Services, NBAF Branch
p: 785-844-1946
o: 785-712-3303
oleg.osipenko at usda.gov
1880 Kimball Ave.
Manhattan, KS 66502

