[Lustre-discuss] New lustre 1.8.5 over IB problem
Colin Faber
cfaber at gmail.com
Mon Dec 13 11:07:58 PST 2010
On 12/13/2010 11:54 AM, Gary Molenkamp wrote:
> I'm attempting to deploy a new lustre filesystem using lustre 1.8.5, but
> this is my first stab at incorporating an IB network. I've deployed
> several over tcp using 1.8.4 without issue, so I'm not sure if there is
> an IB configuration or a 1.8.5 issue here. Any assistance would be
> appreciated.
>
> This new cluster has two parallel networks:
> gige: 10.27.5.0/23
> ib : 10.27.8.0/23
>
> On the lfs servers and clients, lnet is configured as:
> options lnet networks=o2ib0(ib0),tcp0(ib0)
Why are you assigning two different network types (o2ib0 and tcp0) to
the same physical device, ib0?
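
As a rough sketch only: if the intent was to run tcp over the gige network
and o2ib over the IB network, the two LNet networks would each name their own
interface. The interface name eth0 below is an assumption (the post does not
say what the gige device is called), so substitute the real one.

```
# modprobe options line for lnet (sketch; eth0 is an assumed interface name)
options lnet networks=o2ib0(ib0),tcp0(eth0)
```

With a layout like this, the o2ib NIDs would be derived from the ib0 address
and the tcp NIDs from the gige address.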
> The IB network is routable to 10/8 and clients mount other lustre
> filesystems using 1.8.4 over tcp.
>
> On the MDS (with an intended failover to a secondary) the mgs,mdt
> filesystem is created with:
>
> mkfs.lustre --fsname lfs --mdt --mgs \
> --mkfsoptions='-i 1024 -I 512' \
> --failnode=10.27.9.133@o2ib0 --failnode=10.27.9.132@o2ib0 \
> --mountfsoptions=iopen_nopriv,user_xattr,errors=remount-ro,acl \
> /dev/sda
>
> However, this mount then fails with:
>
> mount.lustre: mount /dev/sda at /data/mds failed: Cannot assign
> requested address
>
> An lctl shows the proper nids:
> 10.27.9.133@o2ib
> 10.27.9.133@tcp
>
> Dmesg shows a parsing error with the o2ib0 nid:
>
> LustreError: 159-d: Can't parse NID 'failover.node=10.27.9.133@o2ib0'
> Lustre: Denying initial registration attempt from nid 10.27.9.133@o2ib,
> specified as failover
> LustreError: 9571:0:(obd_mount.c:1097:server_start_targets()) Required
> registration failed for lfs-MDT0000: -99
>
> Am I specifying the failover incorrectly? What should it be when using
> o2ib as the primary interconnect? If I remove the failover parameters
> using tunefs.lustre the mount succeeds, but clients cannot connect to
> the mdt.
>
>