[lustre-discuss] MDS/MGS has a block storage device mounted and it does not have any permissions (no read, no write, no execute)

Pinkesh Valdria pinkesh.valdria@oracle.com
Tue Feb 5 14:39:32 PST 2019


Hello All,

 

I am new to Lustre.   I started by using the docs on this page to deploy Lustre on virtual machines running CentOS 7.x (CentOS-7-2018.08.15-0).   Included below are the contents of the scripts I used and the errors I get.

I have not done any setup for “o2ib0(ib0)”; LNet is using tcp.   All the nodes are on the same network and subnet and can communicate on any protocol and port.

 

Thanks for your help.  I am completely blocked and looking for ideas. (already did google search ☹).  

 

I have 2 questions:  

The MDT mounted on the MDS has no permissions (no read, no write, no execute), even for the root user on the MDS/MGS node.   Is that expected?   See the “MGS/MDS node setup” section below for more details on what I did.

[root@lustre-mds-server-1 opc]# mount -t lustre /dev/sdb /mnt/mdt

[root@lustre-mds-server-1 opc]# ll /mnt
total 0
d---------. 1 root root 0 Jan  1  1970 mdt
[root@lustre-mds-server-1 opc]#
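
If the mount-point permissions are not the right thing to look at, I assume the proper check would be something like the following (a minimal sketch; the device names assume the mount above succeeded):

# list local Lustre devices; the MGS, MGC and MDT entries should show state UP
lctl dl

# confirm the target is mounted as filesystem type lustre
mount -t lustre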

Assuming the above is not an issue: after setting up the OSS/OST and client nodes, when my client tries to mount, I get the error below:

[root@lustre-client-1 opc]# mount -t lustre 10.0.2.4@tcp:/lustrewt /mnt
mount.lustre: mount 10.0.2.4@tcp:/lustrewt at /mnt failed: Input/output error
Is the MGS running?
[root@lustre-client-1 opc]#
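
To separate LNet reachability from the mount itself, I assume a check like the following from the client would help (a minimal sketch, using the MGS NID 10.0.2.4@tcp reported further below):

# make sure LNet is loaded and configured on the client
modprobe lnet
lctl network up

# ping the MGS NID over LNet; a reply listing its NIDs means tcp connectivity is fine
lctl ping 10.0.2.4@tcp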

 

dmesg shows the following errors on the client node:

[root@lustre-client-1 opc]#  dmesg
[35639.535862] Lustre: 11730:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549386846/real 1549386846]  req@ffff9259bb518c00 x1624614953288208/t0(0) o250->MGC10.0.2.4@tcp@10.0.2.4@tcp:26/25 lens 520/544 e 0 to 1 dl 1549386851 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[35640.535877] LustreError: 7718:0:(mgc_request.c:251:do_config_log_add()) MGC10.0.2.4@tcp: failed processing log, type 1: rc = -5
[35669.535028] Lustre: 11730:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549386871/real 1549386871]  req@ffff9259bb428f00 x1624614953288256/t0(0) o250->MGC10.0.2.4@tcp@10.0.2.4@tcp:26/25 lens 520/544 e 0 to 1 dl 1549386881 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[35670.546671] LustreError: 15c-8: MGC10.0.2.4@tcp: The configuration from log 'lustrewt-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[35670.557472] Lustre: Unmounted lustrewt-client
[35670.560432] LustreError: 7718:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-5)
[root@lustre-client-1 opc]#

 

I have the firewall turned off on all nodes (client, MDS/MGS, OSS), SELinux is disabled (setenforce 0), and I can telnet to the MDS/MGS node from the client machine.
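
Since plain telnet works, I assume the more specific check is the LNet acceptor port (988 by default for tcp); a minimal sketch of the checks:

# on the MGS/MDS node: confirm the acceptor port is listening
ss -tlnp | grep 988

# from the client: test that exact port
telnet 10.0.2.4 988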

 

 

Given below is the setup I have on different nodes: 

 

MGS/MDS node setup 

#!/bin/bash
service firewalld stop
chkconfig firewalld off

cat > /etc/yum.repos.d/lustre.repo << EOF
[hpddLustreserver]
name=CentOS- - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7.6.1810/server/
gpgcheck=0

[e2fsprogs]
name=CentOS- - Ldiskfs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
gpgcheck=0

[hpddLustreclient]
name=CentOS- - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7.6.1810/client/
gpgcheck=0
EOF

sudo yum install lustre-tests -y

cp /etc/selinux/config /etc/selinux/config.backup
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

setenforce 0

echo "complete.  rebooting now"
reboot

 

 

 

After the reboot is complete, I log in to the MGS/MDS node as root and run the following steps:
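
Before the Lustre-specific steps, I assume a quick sanity check that the Lustre packages and matching kernel actually got installed would look like this (a minimal sketch):

# list the installed Lustre/kmod packages and the running kernel
rpm -qa | egrep -i 'lustre|kmod' | sort
uname -r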

 

The node has a block storage device attached: /dev/sdb. I run the commands below:

pvcreate -y /dev/sdb
mkfs.xfs -f /dev/sdb
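
Since mkfs.lustre formats the ldiskfs backing filesystem itself, I am not sure the xfs step is needed; a minimal sketch of how I would inspect and clear any existing signatures on the device first:

# show any existing filesystem/LVM signatures on the device
blkid /dev/sdb
wipefs /dev/sdb

# clear them so mkfs.lustre starts from a clean device (destructive!)
wipefs -a /dev/sdb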

 

 

[root@lustre-mds-server-1 opc]# setenforce 0
[root@lustre-mds-server-1 opc]# mkfs.lustre --fsname=lustrewt --index=0 --mgs --mdt /dev/sdb

   Permanent disk data:
Target:     lustrewt:MDT0000
Index:      0
Lustre FS:  lustrewt
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

checking for existing Lustre data: not found
device size = 51200MB
formatting backing filesystem ldiskfs on /dev/sdb
        target name   lustrewt:MDT0000
        4k blocks     13107200
        options        -J size=2048 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustrewt:MDT0000  -J size=2048 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb 13107200

 

 

[root@lustre-mds-server-1 opc]# mkdir -p /mnt/mdt
[root@lustre-mds-server-1 opc]# mount -t lustre /dev/sdb /mnt/mdt
[root@lustre-mds-server-1 opc]# modprobe lnet
[root@lustre-mds-server-1 opc]# lctl network up
LNET configured
[root@lustre-mds-server-1 opc]# lctl list_nids
10.0.2.4@tcp

[root@lustre-mds-server-1 opc]# ll /mnt
total 0
d---------. 1 root root 0 Jan  1  1970 mdt
[root@lustre-mds-server-1 opc]#
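
To double-check that the MGS/MDT actually started on this node, I assume the kernel log and the MDT recovery status are the things to look at (a minimal sketch; the parameter name assumes the lustrewt filesystem above):

# look for MGS/MDT start-up or error messages from the mount
dmesg | grep -iE 'lustre|lnet'

# status of the MDT target
lctl get_param mdt.*.recovery_status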

 

 

OSS/OST node

1 OSS node with 1 block device for the OST (/dev/sdb). The kernel and package setup was the same as on the MGS/MDS node (described above); then I ran the commands below:

 

 

mkfs.lustre --ost --fsname=lustrewt --index=0 --mgsnode=10.0.2.4@tcp /dev/sdb
mkdir -p /ostoss_mount
mount -t lustre /dev/sdb /ostoss_mount
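
On the OSS I assume the same kind of checks apply (a minimal sketch, using the MGS NID 10.0.2.4@tcp):

# confirm the OSS can reach the MGS over LNet
lctl ping 10.0.2.4@tcp

# list local Lustre devices; the OST (lustrewt-OST0000) should show state UP
lctl dl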

 

 

Client node

1 client node. The kernel and package setup was the same as on the MGS/MDS node (described above); then I ran the commands below:

 

[root@lustre-client-1 opc]# modprobe lustre
[root@lustre-client-1 opc]# mount -t lustre 10.0.2.3@tcp:/lustrewt /mnt   (this fails with the error below):
mount.lustre: mount 10.0.2.4@tcp:/lustrewt at /mnt failed: Input/output error
Is the MGS running?
[root@lustre-client-1 opc]#
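
Before retrying the mount, I assume the client's own LNet configuration is worth confirming as well (a minimal sketch, assuming lnetctl is available in this Lustre version):

# show which interface/NID the client's tcp network is using
lnetctl net show

# the client's own NIDs
lctl list_nids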

 

 

 

 

Thanks,

Pinkesh Valdria

OCI – Big Data

Principal Solutions Architect 

m: +1-206-234-4314

pinkesh.valdria@oracle.com

 

 