[Lustre-discuss] controlling which eth interface lustre uses

Joe Landman landman at scalableinformatics.com
Thu Oct 21 07:41:24 PDT 2010


On 10/21/2010 10:29 AM, Brock Palen wrote:
>
>
>> Why do you need both active?  If one is a backup to the other, then
>> bond them as a primary/backup pair, meaning only one will be active
>> at a time, i.e., your designated primary (unless it goes down).
>
> We could do this, but the 10Gb drivers have been such a pain for us
> that we wanted a 'back door' management network to get to the box
> should we have issues with the 10Gb driver.
>
> Oddly I ran:
>
> ifconfig eth0 down
>
> and I could no longer ping the box over the eth4 interface; I had to
> power cycle it from management.  Very odd.
>

Hmmm ... what 1GbE and 10GbE NICs?  Which kernel?  We maintain kernel 
RPMs and tarballs for our customers, and if one of ours will work for 
you, you are welcome to it.

When we set up clusters and/or storage clusters, we typically isolate
the management and storage fabric networks from each other completely.
We don't recommend putting two interfaces on the same subnet unless
there is a clear intention to channel bond them.
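
A quick way to check whether two addresses land in the same subnet for a given netmask might look like the following. This is only a sketch in plain sh: the helper names ip_to_int and same_subnet are made up for illustration, while the addresses and the 255.255.248.0 mask are taken from the route output quoted further down.

```shell
#!/bin/sh
# Sketch: do two IPv4 addresses share a subnet under a given netmask?
# ip_to_int and same_subnet are illustrative helper names, not standard tools.

# Pack a dotted quad (a.b.c.d) into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on "." into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds (exit status 0) when both addresses have the same network part.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

if same_subnet 10.164.0.10 10.164.0.166 255.255.248.0; then
    echo "eth0 and eth4 share a subnet"
else
    echo "eth0 and eth4 are on separate subnets"
fi
```

If the check succeeds for the management and storage addresses, the two interfaces are not isolated in the sense described above.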

You may be able to tell the box to ignore ARPs on the eth0 net, and then
hand-edit the ARP table (arp -s ...) to force a connection.  However,
this is somewhat convoluted and a management pain.
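
One possible reading of this, as a sketch only: restrict ARP replies on eth0 with the Linux arp_ignore and arp_announce sysctls, then pin the 10Gb address to the 10Gb NIC's MAC on a client with a static ARP entry. The MAC address below is a placeholder, and the run, oss_side and client_side helpers are made up for illustration; by default they only print the commands, and they execute them only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch only. Interface names and IPs come from this thread; the MAC
# address is a placeholder and must be replaced with eth4's real one.
OSS_10G_IP=10.164.0.166
OSS_10G_MAC=00:00:00:00:00:00   # placeholder, not a real address

# Print each command unless RUN=1 is set, so a dry run changes nothing.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# On the OSS: make eth0 answer ARP only for its own address, and use the
# best matching local source address in the ARP requests it sends.
oss_side() {
    run sysctl -w net.ipv4.conf.eth0.arp_ignore=1
    run sysctl -w net.ipv4.conf.eth0.arp_announce=2
}

# On a client: pin the 10Gb address to the 10Gb NIC's MAC.
client_side() {
    run arp -s "$OSS_10G_IP" "$OSS_10G_MAC"
}

oss_side
client_side
```

Note that this only influences which MAC the clients send to. Which interface the OSS sends from is still decided by its routing table, and the static entry has to be maintained by hand on every client.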

For out-of-band work, a KVM over IP could be helpful.  Does the box
support KVM over IP from IPMI?  If not, you could get a drop-in unit
that does this (we use these for older, less capable nodes when needed).



>>
>> bob
>>
>> On 10/21/2010 9:51 AM, Brock Palen wrote:
>>> On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:
>>>
>>>> On 10/21/2010 09:37 AM, Brock Palen wrote:
>>>>> We recently added a new OSS; it has one 1Gb interface and one
>>>>> 10Gb interface.
>>>>>
>>>>> The 10Gb interface is eth4, 10.164.0.166
>>>>> The 1Gb interface is eth0, 10.164.0.10
>>>> They look like they are on the same subnet if you are using /24
>>>> ...
>>> You are correct
>>>
>>> Both interfaces are on the same subnet:
>>>
>>> [root@oss4-gb ~]# route
>>> Kernel IP routing table
>>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>>> 10.164.0.0      *               255.255.248.0   U     0      0        0 eth0
>>> 10.164.0.0      *               255.255.248.0   U     0      0        0 eth4
>>> 169.254.0.0     *               255.255.0.0     U     0      0        0 eth4
>>> default         10.164.0.1      0.0.0.0         UG    0      0        0 eth0
>>>
>>> Is there no way to mask the Lustre service away from the 1Gb
>>> interface?
>>>
>>>>> In modprobe.conf I have:
>>>>>
>>>>> options lnet networks=tcp0(eth4)
>>>>>
>>>>> lctl list_nids
>>>>> 10.164.0.166@tcp
>>>>>
>>>>> From a host I run:
>>>>> lctl which_nid oss4
>>>>> 10.164.0.166@tcp
>>>>>
>>>>> And yet I still see traffic over eth0, the 1Gb management
>>>>> network, much higher than I would expect (up to 100MB/s).  The
>>>>> management interface is oss4-gb, so if I run this from a client:
>>>>>
>>>>> lctl which_nid oss4-gb
>>>>> 10.164.0.10@tcp
>>>>>
>>>>> Why, if I have networks=tcp0(eth4) and list_nids shows
>>>>> only the 10Gb interface, do I have so much traffic over the
>>>>> 1Gb interface?  There is some traffic on the 10Gb interface,
>>>>> but I would like to tell Lustre 'don't use the 1Gb
>>>>> interface'.
>>>> If they are on the same subnet, it's possible that the 1GbE sees
>>>> the ARP response first.  And then it's pretty much guaranteed to
>>>> have the traffic go out that port.
>>>>
>>>> If your subnets are different, this shouldn't be the issue.
>>>>
>>>>> Thanks!
>>>>>
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> Center for Advanced Computing
>>>>> brockp at umich.edu
>>>>> (734)936-1985
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Lustre-discuss mailing list Lustre-discuss at lists.lustre.org
>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>
>>>> --
>>>> Joseph Landman, Ph.D
>>>> Founder and CEO
>>>> Scalable Informatics Inc.
>>>> email: landman at scalableinformatics.com
>>>> web  : http://scalableinformatics.com
>>>>        http://scalableinformatics.com/jackrabbit
>>>> phone: +1 734 786 8423 x121
>>>> fax  : +1 866 888 3112
>>>> cell : +1 734 612 4615


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


