[Lustre-discuss] Lustre_config fails trying to access mgs - mdt and mgs are configured together.
Alexander, Jack
Jack.Alexander@hp.com
Thu Aug 28 07:27:28 PDT 2008
In my Lustre 1.6 configuration, I have two MSA2000 arrays and two DL380 G5 servers. The names sfs1 and sfs2 are the internal-network Ethernet names for my two servers; the corresponding system interconnect (InfiniBand) names are ic-sfs1 and ic-sfs2.
I've successfully (I think) run both "lctl ping ic-sfs1@o2ib" and "lctl ping ic-sfs2@o2ib" from servers sfs1 and sfs2. Does this look correct? How do you read the output from this command?
[root@hpcsfse2 ~]# lctl ping 172.31.97.2@o2ib0
12345-0@lo
12345-172.31.97.2@o2ib
[root@hpcsfse2 ~]# lctl ping 172.31.97.1@o2ib0
12345-0@lo
12345-172.31.97.1@o2ib
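If I'm reading the output right, each "12345-<NID>" line in the reply is a NID the pinged node answers on, so both replies show the loopback NID (0@lo) plus the expected o2ib NID, which looks healthy to me. As an extra sanity check (my own idea; I haven't pasted the output here), I believe lctl list_nids on each server should show the NIDs that LNET actually configured locally:

# run on each server; I'd expect 172.31.97.1@o2ib on sfs1 and 172.31.97.2@o2ib on sfs2
lctl list_nids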
This is the .csv file I'm using as input to the lustre_config command. Note that the MDT and MGS components are configured together on a single target.
hpcsfse1:root> cat src/scripts/hpcsfse_lustre_config.csv
sfs1,options lnet networks=o2ib0,/dev/mapper/mpath0,/mnt/mdt_mgs,mdt|mgs,testfs,,,,,_netdev,ic-sfs2@o2ib0
sfs2,options lnet networks=o2ib0,/dev/mapper/mpath1,/mnt/ost0,ost,testfs,ic-sfs1@o2ib0,,,,_netdev,ic-sfs1@o2ib0
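For reference, here is how I'm parsing the columns of the sfs2 line; the field names are my own annotation based on my reading of the lustre_config CSV layout, so please correct me if I have them wrong:

# hostname, module opts, device, mount point, device type, fsname,
# MGS NIDs, index, format opts, mkfs opts, mount opts, failover NIDs
sfs2,options lnet networks=o2ib0,/dev/mapper/mpath1,/mnt/ost0,ost,testfs,ic-sfs1@o2ib0,,,,_netdev,ic-sfs1@o2ib0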
Configuration of the sfs1 server seems to be OK.
hpcsfse1:root> lustre_config -vfw sfs1 src/scripts/hpcsfse_lustre_config.csv
lustre_config: Operating on the following nodes: sfs1
lustre_config: Checking the cluster network connectivity and hostnames...
lc_net: Verifying network connectivity between "hpcsfse1.hpclab.usa.hp.com" and "sfs1"...
lc_net: OK
lustre_config: Check the cluster network connectivity and hostnames OK!
lustre_config: ******** Lustre cluster configuration START ********
lustre_config: Explicit MGS target /dev/mapper/mpath0 in host sfs1.
lustre_config: Adding lnet module options to sfs1
lustre_config: Starting lnet network in sfs1
lustre_config: Creating the mount point /mnt/mdt_mgs on sfs1
lustre_config: Formatting Lustre target /dev/mapper/mpath0 on sfs1...
lustre_config: Formatting command line is: ssh -x -q sfs1 "PATH=$PATH:/sbin:/usr/sbin; /usr/sbin/mkfs.lustre --reformat --mgs --mdt --fsname=testfs --failnode=ic-sfs2@o2ib0 /dev/mapper/mpath0"
lustre_config: Waiting for the return of the remote command...
Permanent disk data:
Target: testfs-MDTffff
Index: unassigned
Lustre FS: testfs
Mount type: ldiskfs
Flags: 0x75
(MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=172.31.97.2@o2ib mgsnode=172.31.97.2@o2ib mdt.group_upcall=/usr/sbin/l_getgroups
device size = 1259804MB
2 6 18
formatting backing filesystem ldiskfs on /dev/mapper/mpath0
target name testfs-MDTffff
4k blocks 0
options -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups,mmp -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L testfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups,mmp -F /dev/mapper/mpath0
Writing CONFIGS/mountdata
lustre_config: Success on all Lustre targets!
lustre_config: Modify /etc/fstab of host sfs1 to add Lustre target /dev/mapper/mpath0
lustre_config: /dev/mapper/mpath0 /mnt/mdt_mgs lustre _netdev 0 0
lustre_config: ******** Lustre cluster configuration END **********
hpcsfse1:root> mount /mnt/mdt_mgs
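The mount came back without error. As a quick sanity check (hypothetical commands; I didn't capture the output), I believe these would confirm the combined target is actually up:

# run on sfs1
mount -t lustre   # should show /dev/mapper/mpath0 on /mnt/mdt_mgs
lctl dl           # should list the MGS and MDT devices as UP (see the output further below)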
Configuration of the sfs2 server fails. How do I debug and/or correct this?
hpcsfse1:root> lustre_config -vfw sfs2 src/scripts/hpcsfse_lustre_config.csv
lustre_config: Operating on the following nodes: sfs2
lustre_config: Checking the cluster network connectivity and hostnames...
lc_net: Verifying network connectivity between "hpcsfse1.hpclab.usa.hp.com" and "sfs2"...
lc_net: OK
lustre_config: Check the cluster network connectivity and hostnames OK!
lustre_config: ******** Lustre cluster configuration START ********
lustre_config: There is no MGS target in the node list "sfs2".
lustre_config: Creating the mount point /mnt/ost0 on sfs2
lustre_config: Adding lnet module options to sfs2
lustre_config: Starting lnet network in sfs2
lustre_config: Checking lnet connectivity between sfs2 and the MGS node
lustre_config: check_lnet_connect() error: sfs2 cannot contact the MGS node with nids - "ic-sfs1@o2ib0"! Check /usr/sbin/lctl command!
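My guess, and it is only a guess since I haven't read the script, is that check_lnet_connect() more or less wraps lctl ping against the MGS NIDs taken from the CSV. So my plan is to reproduce it by hand from sfs2:

# run on sfs2 -- hypothetical debugging steps
lctl ping ic-sfs1@o2ib0        # the literal NID string lustre_config is using
getent hosts ic-sfs1           # does ic-sfs1 resolve to the IB address 172.31.97.1?
lctl ping 172.31.97.1@o2ib0    # numeric NID, which already worked above

For reference, here is lctl dl on hpcsfse1 (the sfs1 node) after the MDT/MGS mount: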
hpcsfse1:root> lctl dl
0 UP mgs MGS MGS 5
1 UP mgc MGC172.31.97.2@o2ib c2910fbc-b150-b759-0c41-a5851616e41e 5
2 UP mdt MDS MDS_uuid 3
3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 3