[Lustre-discuss] New lustre 1.8.5 over IB problem

Kaizaad Bilimorya kaizaad@sharcnet.ca
Tue Dec 14 07:32:57 PST 2010


On Mon, 13 Dec 2010, Colin Faber wrote:

> On 12/13/2010 11:54 AM, Gary Molenkamp wrote:
>> I'm attempting to deploy a new lustre filesystem using lustre 1.8.5, but
>> this is my first stab at incorporating an IB network.  I've deployed
>> several over tcp using 1.8.4 without issue, so I'm not sure if this is
>> an IB configuration issue or a 1.8.5 issue. Any assistance would be
>> appreciated.
>>
>> This new cluster has two parallel networks:
>>     gige:  10.27.5.0/23
>>     ib  :  10.27.8.0/23
>>
>> On the lfs servers and clients, lnet is configured as:
>>     options lnet networks=o2ib0(ib0),tcp0(ib0)
>                                                                      ^^^^^
> Why are you assigning two different network types to the same physical
> device?

Hello Colin,

Thanks for the reply. In answer to your question:

The same physical device has access to two different lustre filesystems 
using different protocols.

One lustre filesystem is locally available via the native IB protocol, 
o2ib0(ib0).

The other lustre filesystem is remotely available (via an IB-to-10Gb 
switch/gateway in the local IB fabric) on the same local IB device, but 
only via the tcp/ip (IPoIB) protocol, tcp0(ib0).

(not sure how good this ASCII diagram will look)

                                  ----------------------
                 |----------------| local lustre setup |
     ib0         |                ----------------------
 --------    -----------
 |client|----|ib fabric|
 --------    -----------
                 |
           ---------------
           |ib to 10Gb gw|
           ---------------
                 |           eth0 -----------------------
                 |----------------| remote lustre setup |
                                  -----------------------
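
Concretely, here is what I'm hoping a client config ends up looking
like (just a sketch -- the lnet line is our actual config; the remote
MGS NID, the remote fsname, and the mount points are made up for
illustration, while the local NID and fsname are taken from the mkfs
line quoted below):

    # /etc/modprobe.conf -- one IB port carrying two lnet networks
    options lnet networks=o2ib0(ib0),tcp0(ib0)

    # local filesystem, mounted over native IB (o2ib)
    mount -t lustre 10.27.9.133@o2ib0:/lfs /mnt/lfs

    # remote filesystem, mounted over tcp (IPoIB) through the gateway
    mount -t lustre 10.27.1.10@tcp0:/remotefs /mnt/remotefs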

Is this possible?

-k

>> The IB network is routable to 10/8 and clients mount other lustre
>> filesystems using 1.8.4 over tcp.
>>
>> On the MDS (with an intended failover to a secondary) the mgs,mdt
>> filesystem is created with:
>>
>>   mkfs.lustre --fsname lfs --mdt --mgs \
>> 	--mkfsoptions='-i 1024 -I 512' \
>> 	--failnode=10.27.9.133@o2ib0 --failnode=10.27.9.132@o2ib0  \
>> 	--mountfsoptions=iopen_nopriv,user_xattr,errors=remount-ro,acl \
>> 	/dev/sda
>>
>> However, this mount then fails with:
>>
>> mount.lustre: mount /dev/sda at /data/mds failed: Cannot assign
>> requested address
>>
>> lctl list_nids shows the proper NIDs:
>>   10.27.9.133@o2ib
>>   10.27.9.133@tcp
>>
>> Dmesg shows a parsing error with the o2ib0 nid:
>>
>> LustreError: 159-d: Can't parse NID 'failover.node=10.27.9.133@o2ib0'
>> Lustre: Denying initial registration attempt from nid 10.27.9.133@o2ib,
>> specified as failover
>> LustreError: 9571:0:(obd_mount.c:1097:server_start_targets()) Required
>> registration failed for lfs-MDT0000: -99
>>
>> Am I specifying the failover incorrectly?  What should it be when using
>> o2ib as the primary interconnect?  If I remove the failover parameters
>> using tunefs.lustre, the mount succeeds, but clients cannot connect to
>> the mdt.
>>
>>
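
PS: re the "Can't parse NID 'failover.node=10.27.9.133@o2ib0'" error
quoted above -- I haven't verified this on 1.8.5, but "@o2ib" and
"@o2ib0" should name the same lnet network, so one thing that may be
worth trying is rewriting the failover NIDs without the explicit "0"
suffix. A sketch, not a confirmed fix (device and NIDs taken from the
quoted mkfs line):

    # rewrite the stored failover parameters on the existing MDT
    tunefs.lustre --erase-params \
    	--failnode=10.27.9.133@o2ib \
    	--failnode=10.27.9.132@o2ib \
    	/dev/sda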


