[Lustre-discuss] New lustre 1.8.5 over IB problem
Gary Molenkamp
gary at sharcnet.ca
Mon Dec 13 10:54:46 PST 2010
I'm attempting to deploy a new Lustre filesystem using Lustre 1.8.5, but
this is my first stab at incorporating an IB network. I've deployed
several over TCP using 1.8.4 without issue, so I'm not sure whether this
is an IB configuration issue or a 1.8.5 issue. Any assistance would be
appreciated.
This new cluster has two parallel networks:
gige: 10.27.5.0/23
ib : 10.27.8.0/23
On the lfs servers and clients, lnet is configured as:
options lnet networks=o2ib0(ib0),tcp0(ib0)
The IB network is routable to 10/8 and clients mount other lustre
filesystems using 1.8.4 over tcp.
On the MDS (with an intended failover to a secondary), the combined
MGS/MDT filesystem is created with:
mkfs.lustre --fsname lfs --mdt --mgs \
--mkfsoptions='-i 1024 -I 512' \
--failnode=10.27.9.133@o2ib0 --failnode=10.27.9.132@o2ib0 \
--mountfsoptions=iopen_nopriv,user_xattr,errors=remount-ro,acl \
/dev/sda
However, this mount then fails with:
mount.lustre: mount /dev/sda at /data/mds failed: Cannot assign
requested address
An lctl shows the proper nids:
10.27.9.133@o2ib
10.27.9.133@tcp
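As a quick sanity check on the addressing, the sketch below uses Python's standard `ipaddress` module to see which of the two networks listed earlier contains the NID address and the other failover address. It assumes the prefixes exactly as written in this post; `strict=False` is needed because 10.27.5.0/23 as written has host bits set. The name `owning_networks` is a hypothetical label, not part of any Lustre tool.

```python
import ipaddress

# Networks as given in the post; strict=False masks off any host bits.
NETWORKS = {
    "gige": ipaddress.ip_network("10.27.5.0/23", strict=False),
    "ib": ipaddress.ip_network("10.27.8.0/23", strict=False),
}

def owning_networks(addr):
    """Return the names of the listed networks that contain addr."""
    ip = ipaddress.ip_address(addr)
    return [name for name, net in NETWORKS.items() if ip in net]

# Addresses taken from the NID list and the --failnode options.
for addr in ("10.27.9.133", "10.27.9.132"):
    print(addr, owning_networks(addr))
```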
Dmesg shows a parsing error with the o2ib0 nid:
LustreError: 159-d: Can't parse NID 'failover.node=10.27.9.133@o2ib0'
Lustre: Denying initial registration attempt from nid 10.27.9.133@o2ib,
specified as failover
LustreError: 9571:0:(obd_mount.c:1097:server_start_targets()) Required
registration failed for lfs-MDT0000: -99
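One possible reading of the messages above is that the node's own NID, 10.27.9.133 on o2ib, also appears in the --failnode list, so the registering node is itself "specified as failover". This is an interpretation, not something confirmed here. A minimal Python sketch of that comparison follows, with NIDs written in addr@network form and assuming that a network name without a number means network 0 (so `o2ib` and `o2ib0` are the same network). The helper names `normalize_nid` and `self_listed_failnodes` are hypothetical and not part of any Lustre tool.

```python
import re

def normalize_nid(nid):
    """Return (address, network) with an implicit network number made explicit."""
    addr, net = nid.strip().split("@", 1)
    # Assumption: "o2ib" means "o2ib0" and "tcp" means "tcp0".
    if not re.search(r"\d+$", net):
        net += "0"
    return addr, net

def self_listed_failnodes(local_nids, failnodes):
    """Return the failover NIDs that match one of the node's own NIDs."""
    local = {normalize_nid(n) for n in local_nids}
    return [f for f in failnodes if normalize_nid(f) in local]

# Values copied from the post: the node's own NIDs and the --failnode options.
local_nids = ["10.27.9.133@o2ib", "10.27.9.133@tcp"]
failnodes = ["10.27.9.133@o2ib0", "10.27.9.132@o2ib0"]

print(self_listed_failnodes(local_nids, failnodes))
```

A non-empty result means at least one --failnode entry names the node the target is being mounted on.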
Am I specifying the failover incorrectly? What should it be when using
o2ib as the primary interconnect? If I remove the failover parameters
using tunefs.lustre, the mount succeeds, but clients cannot connect to
the MDT.
--
Gary Molenkamp SHARCNET
Systems Administrator University of Western Ontario
Compute/Calcul Canada http://www.computecanada.org
gary at sharcnet.ca http://www.sharcnet.ca
(519) 661-2111 x88429 (519) 661-4000