[Lustre-discuss] Lustre_config fails trying to access mgs - mdt and mgs are configured together.

Alexander, Jack Jack.Alexander at hp.com
Thu Aug 28 07:27:28 PDT 2008


In my Lustre 1.6 configuration, I have two MSA2000 arrays and two DL380 G5 servers. The servers sfs1 and sfs2 are the internal-network Ethernet names of my two servers; the corresponding system-interconnect names are ic-sfs1 and ic-sfs2.

I've successfully (I think) run both "lctl ping ic-sfs1@o2ib" and "lctl ping ic-sfs2@o2ib" from servers sfs1 and sfs2. Does this look correct? How do you read the output of this command?
[root@hpcsfse2 ~]# lctl ping 172.31.97.2@o2ib0
12345-0@lo
12345-172.31.97.2@o2ib
[root@hpcsfse2 ~]# lctl ping 172.31.97.1@o2ib0
12345-0@lo
12345-172.31.97.1@o2ib
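As far as I can tell (my reading, not something the tool documents in its output), each line of a successful lctl ping has the form <pid>-<nid>: a fixed LNET PID (12345 for Lustre) followed by one NID the remote node answered on, so the two lines above are the peer's loopback NID plus its o2ib NID. A trivial sketch that strips the PID prefix to leave just the NIDs (the pasted ping output is hard-coded here for illustration):

```shell
# Sketch (assumption): each `lctl ping` output line is "<pid>-<nid>", where
# <pid> is the LNET process ID (12345 for Lustre) and <nid> is one network
# identifier the remote node answered on.
ping_output='12345-0@lo
12345-172.31.97.2@o2ib'

# Strip the PID prefix to list just the NIDs the peer responded with.
echo "$ping_output" | sed 's/^[0-9]*-//'
```

Run against the output above, this leaves 0@lo and 172.31.97.2@o2ib, i.e. the peer is reachable on its o2ib NID.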

This is the .csv file I'm using as input to the lustre_config command. Note that the MDT and MGS are configured together on a single target.
hpcsfse1:root> cat src/scripts/hpcsfse_lustre_config.csv
sfs1,options lnet networks=o2ib0,/dev/mapper/mpath0,/mnt/mdt_mgs,mdt|mgs,testfs,,,,,_netdev,ic-sfs2@o2ib0
sfs2,options lnet networks=o2ib0,/dev/mapper/mpath1,/mnt/ost0,ost,testfs,ic-sfs1@o2ib0,,,,_netdev,ic-sfs1@o2ib0

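For reference, my reading of the lustre_config CSV layout (from the 1.6 manual; the field names are my annotation, not output of the tool) is: hostname, module options, device, mount point, device type, fsname, MGS NIDs, index, format options, mkfs options, mount options, failover NIDs. A quick way to see which field is which on the sfs2 line:

```shell
# Print the sfs2 CSV line field by field. The field numbering matches my
# reading of the lustre_config CSV layout (an assumption from the Lustre
# 1.6 manual): field 7 = MGS NIDs, field 12 = failover NIDs.
line='sfs2,options lnet networks=o2ib0,/dev/mapper/mpath1,/mnt/ost0,ost,testfs,ic-sfs1@o2ib0,,,,_netdev,ic-sfs1@o2ib0'

echo "$line" | awk -F, '{
    for (i = 1; i <= NF; i++)
        printf "field %2d: %s\n", i, $i
}'
```

On the OST line, field 7 (ic-sfs1@o2ib0) is the NID lustre_config will try to contact the MGS on, which is the NID named in the error below.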
Configuration of the sfs1 server seems to be OK.
hpcsfse1:root> lustre_config -vfw sfs1 src/scripts/hpcsfse_lustre_config.csv
lustre_config: Operating on the following nodes: sfs1
lustre_config: Checking the cluster network connectivity and hostnames...
lc_net: Verifying network connectivity between "hpcsfse1.hpclab.usa.hp.com" and "sfs1"...
lc_net: OK
lustre_config: Check the cluster network connectivity and hostnames OK!

lustre_config: ******** Lustre cluster configuration START ********
lustre_config: Explicit MGS target /dev/mapper/mpath0 in host sfs1.
lustre_config: Adding lnet module options to sfs1
lustre_config: Starting lnet network in sfs1
lustre_config: Creating the mount point /mnt/mdt_mgs on sfs1
lustre_config: Formatting Lustre target /dev/mapper/mpath0 on sfs1...
lustre_config: Formatting command line is: ssh -x -q sfs1 "PATH=$PATH:/sbin:/usr/sbin; /usr/sbin/mkfs.lustre --reformat  --mgs --mdt --fsname=testfs --failnode=ic-sfs2@o2ib0 /dev/mapper/mpath0"
lustre_config: Waiting for the return of the remote command...

   Permanent disk data:
Target:     testfs-MDTffff
Index:      unassigned
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x75
              (MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=172.31.97.2@o2ib mgsnode=172.31.97.2@o2ib mdt.group_upcall=/usr/sbin/l_getgroups

device size = 1259804MB
2 6 18
formatting backing filesystem ldiskfs on /dev/mapper/mpath0
        target name  testfs-MDTffff
        4k blocks     0
        options        -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups,mmp -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L testfs-MDTffff  -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups,mmp -F /dev/mapper/mpath0
Writing CONFIGS/mountdata
lustre_config: Success on all Lustre targets!
lustre_config: Modify /etc/fstab of host sfs1 to add Lustre target /dev/mapper/mpath0
lustre_config: /dev/mapper/mpath0               /mnt/mdt_mgs            lustre  _netdev 0 0
lustre_config: ******** Lustre cluster configuration END **********

hpcsfse1:root> mount /mnt/mdt_mgs

Configuration of the sfs2 server fails. How do I debug and/or correct this?
hpcsfse1:root> lustre_config -vfw sfs2 src/scripts/hpcsfse_lustre_config.csv
lustre_config: Operating on the following nodes: sfs2
lustre_config: Checking the cluster network connectivity and hostnames...
lc_net: Verifying network connectivity between "hpcsfse1.hpclab.usa.hp.com" and "sfs2"...
lc_net: OK
lustre_config: Check the cluster network connectivity and hostnames OK!

lustre_config: ******** Lustre cluster configuration START ********
lustre_config: There is no MGS target in the node list "sfs2".
lustre_config: Creating the mount point /mnt/ost0 on sfs2
lustre_config: Adding lnet module options to sfs2
lustre_config: Starting lnet network in sfs2
lustre_config: Checking lnet connectivity between sfs2 and the MGS node
lustre_config: check_lnet_connect() error: sfs2 cannot contact the MGS node with nids - "ic-sfs1@o2ib0"! Check /usr/sbin/lctl command!

hpcsfse1:root> lctl dl
  0 UP mgs MGS MGS 5
  1 UP mgc MGC172.31.97.2@o2ib c2910fbc-b150-b759-0c41-a5851616e41e 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
  4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 3
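For a quick sanity check of what's running on the MGS/MDT node, the lctl dl output can be summarized; my understanding of its columns (an assumption, not from the command itself) is index, status, type, name, uuid, refcount. A small sketch, with the captured output pasted in as a string for illustration:

```shell
# Summarize captured `lctl dl` output. Column layout assumed to be:
# index, status, type, name, uuid, refcount.
dl_output='  0 UP mgs MGS MGS 5
  1 UP mgc MGC172.31.97.2@o2ib c2910fbc-b150-b759-0c41-a5851616e41e 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
  4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 3'

# List each device type and its status; everything should be UP.
echo "$dl_output" | awk '{ printf "%-4s %s\n", $3, $2 }'
```

This shows the mgs, mgc, mdt, lov and mds devices all UP on this node, so the MGS itself appears to be running.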



