[Lustre-discuss] [HPDD-discuss] lustre lnet infiniband config

Mike Ware charnobyl3000 at gmail.com
Wed Oct 1 11:39:33 PDT 2014


Hi Aayush,

I used the same command, but I first had to generate a new ISO of the OFED
install before it worked (I can't remember why this was the case). If you
haven't already, you can view the details here:
http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_1_5_3-3_0_0.txt
I ran into this issue about 1.5 years ago, so I hope I'm not forgetting
anything.
Mike
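
A minimal sketch of one way to check whether the installed InfiniBand and
LNet modules were built for the running (Lustre) kernel, which is the mismatch
that recompiling the drivers is meant to fix. The helper names are
hypothetical; modinfo -F vermagic and uname -r are standard tools. It assumes
the vermagic string starts with the kernel release.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OFED or Lustre.

# Print the kernel release a module was built for: the first field of its
# vermagic string, e.g. "2.6.32-358.18.1.el6.x86_64 SMP mod_unload modversions".
vermagic_release() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# Return 0 if the vermagic string ($1) names the kernel release ($2).
vermagic_matches() {
    [ "$(vermagic_release "$1")" = "$2" ]
}

# Check one installed module (by name) against the running kernel.
check_module() {
    vm=$(modinfo -F vermagic "$1" 2>/dev/null) || {
        echo "$1: not found for kernel $(uname -r)"
        return 1
    }
    if vermagic_matches "$vm" "$(uname -r)"; then
        echo "$1: ok"
    else
        echo "$1: built for $(vermagic_release "$vm"), running $(uname -r)"
        return 1
    fi
}
```

Possible usage, once the functions are loaded into a shell: run
"check_module ib_core" and "check_module ko2iblnd". A mismatch or a missing
module would suggest the OFED or Lustre modules were built for a different
kernel than the one that is booted.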

On Wed, Oct 1, 2014 at 12:50 AM, aayush agrawal <
aayush.agrawal at calsoftinc.com> wrote:

>  Hi Mike,
>
> While installing OFED I used the command below:
> # ./mlnxofedinstall -vvv --add-kernel-support --without-32bit \
>     --without-fw-update --hpc
>
> I used the --add-kernel-support option, which adds kernel support (it runs
> mlnx_add_kernel_support.sh). Is this what you meant?
>
> Thanks,
> Aayush.
>
>
> On 9/30/2014 11:04 PM, Mike Ware wrote:
>
>  I knew I had it somewhere
>
>
> http://lists.lustre.org/pipermail/lustre-discuss/2012-November/016988.html
>
>
>  Mike
>
> On Tue, Sep 30, 2014 at 10:32 AM, Mike Ware <charnobyl3000 at gmail.com>
> wrote:
>
>> I had a similar issue using the Mellanox packages. If I remember
>> correctly, I had to recompile the drivers against the Lustre kernel for the
>> install. I believe Mellanox had an article on this, but I don't have the
>> link.
>>
>>  Mike
>>
>>  On Tue, Sep 30, 2014 at 8:07 AM, Parinay Kondekar <
>> parinay.kondekar at seagate.com> wrote:
>>
>>>   IMO you should try strace to see whether it shows anything.
>>> "Write failed: Broken pipe" is quite a common message, and it is difficult
>>> to conclude anything from it.
>>>
>>>  Regards
>>>  parinay
>>>
>>> On Tue, Sep 30, 2014 at 8:16 PM, aayush agrawal <
>>> aayush.agrawal at calsoftinc.com> wrote:
>>>
>>>>  Hi Parinay,
>>>>
>>>> Yes, I see ib0 in the output of ifconfig -a.
>>>> I also tried options lnet networks=o2ib0(ib0), but no luck.
>>>> While loading lnet I do see an error in /var/log/messages:
>>>>
>>>> kernel: LNet: HW CPU cores: 32, npartitions: 4
>>>> alg: No test for crc32 (crc32-table)
>>>> kernel: alg: No test for adler32 (adler32-zlib)
>>>> kernel: alg: No test for crc32 (crc32-pclmul)
>>>> kernel: padlock: VIA PadLock Hash Engine not detected.
>>>> modprobe: FATAL: Error inserting padlock_sha
>>>> (/lib/modules/2.6.32_358/kernel/drivers/crypto/padlock-sha.ko): No such
>>>> device
>>>>
>>>> But as per the link below, this should not be a problem?
>>>> https://jira.hpdd.intel.com/browse/LU-1599
>>>>
>>>> modprobe lnet completes successfully, but I see "Write failed: Broken
>>>> pipe" after running "lctl network up", and then the session gets logged
>>>> out of the server.
>>>>
>>>> Thanks,
>>>> Aayush.
>>>>
>>>>
>>>> On 9/30/2014 7:21 PM, Parinay Kondekar wrote:
>>>>
>>>>  - What is the output of 'ifconfig -a'? Do you see ib0 there?
>>>> Specifying 'options lnet networks=o2ib0(ib0)' should be enough.
>>>>  - Anything in syslog?
>>>>
>>>>  HTH
>>>>
>>>> On Tue, Sep 30, 2014 at 6:03 PM, aayush agrawal <
>>>> aayush.agrawal at calsoftinc.com> wrote:
>>>>
>>>>>  Hi,
>>>>>
>>>>> I am trying to build Lustre 2.5.0 against
>>>>> MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 on CentOS 6.4 with kernel version
>>>>> 2.6.32-358.
>>>>> But I am not able to get the lnet config settings right. I used the
>>>>> settings suggested in the Lustre 2.x manual, but I am then not able to
>>>>> bring the network up using lctl.
>>>>>
>>>>> Details:
>>>>>
>>>>> I have two server machines, one for mgs+mdt and a second for oss, and one
>>>>> client machine. I want to set up InfiniBand on all these machines.
>>>>> I ran the steps below successfully on all three machines:
>>>>> 1. Run the mlnxofedinstall script:
>>>>>     # ./mlnxofedinstall -vvv --add-kernel-support --without-32bit \
>>>>>         --without-fw-update --hpc
>>>>> 2. Restart the openibd service:
>>>>>     # /etc/init.d/openibd restart
>>>>> 3. Configure the ib0 interface.
>>>>> 4. Configure Lustre with o2ib:
>>>>>     # ./configure --with-linux=Path_to_linux-2.6.32-358.18.1.el6 \
>>>>>         --with-o2ib=/usr/src/ofa_kernel/default/
>>>>>
>>>>> 5. Make the Lustre rpms:
>>>>>     # make rpms
>>>>> This gave me a compilation error.
>>>>> I looked online for this error and found a bug filed for the same issue:
>>>>> https://jira.hpdd.intel.com/browse/LU-4266
>>>>> The patch below, from the above link, solved the problem, so I could
>>>>> build the Lustre rpms:
>>>>> http://review.whamcloud.com/#/c/8451/1
>>>>>
>>>>> Now I first want to do the InfiniBand setup for the mgs and mdt on a
>>>>> single machine, which also has an Ethernet IP. Then I want to format and
>>>>> mount the mgs and mdt.
>>>>> So I installed the Lustre rpms created above and then added the line
>>>>> below to /etc/modprobe.d/lustre.conf:
>>>>> options lnet networks=o2ib(ib0)
>>>>>
>>>>> Then I rebooted the machine to remove all Lustre-related modules,
>>>>> including lnet, ran modprobe lnet to apply the above
>>>>> parameters, and then ran lctl network up, which gives me the error below:
>>>>> LNET configure error 100: Network is down
>>>>>
>>>>> I looked online and found the discussion below on the same error:
>>>>> http://lists.lustre.org/pipermail/lustre-discuss/2010-June/013510.html
>>>>>
>>>>> As per the suggestion in the above mail, I tried the line below in
>>>>> /etc/modprobe.d/lustre.conf, where IB_IP stands for the InfiniBand IP:
>>>>> options lnet networks=o2ib(ib0) routes="tcp0 IB_IP@o2ib"
>>>>> This command hangs for around 2 to 3 minutes and then gives the error:
>>>>> Write failed: Broken pipe. The same happens with "options lnet
>>>>> networks=o2ib(ib0)".
>>>>> But if I set: options lnet networks=tcp0(eth0),o2ib(ib0)
>>>>> routes="tcp1 IB_IP@o2ib" then it gives LNET configure error 100:
>>>>> Network is down.
>>>>>
>>>>> It seems that for networks=o2ib(ib0) I am getting the error Write failed:
>>>>> Broken pipe.
>>>>> Am I missing anything in the above steps? Or how do I resolve the
>>>>> above error?
>>>>>
>>>>> Thanks,
>>>>> Aayush.
>>>>>
>>>>> _______________________________________________
>>>>> HPDD-discuss mailing list
>>>>> HPDD-discuss at lists.01.org
>>>>>
>>>>> https://lists.01.org/mailman/listinfo/hpdd-discuss
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>  _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>
>>>
>>
>
>
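
For the "options lnet networks=..." lines quoted above, a quick sanity check
can be sketched as follows: pull the interface names out of the networks=
value and confirm each one exists on the host. The function names are
hypothetical, and the parsing assumes the simple net(iface[,iface]) form used
in this thread. It only checks that the interface exists, not that it has an
IP address or that the link is up.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Lustre or lctl.

# Extract the networks= value from a modprobe "options lnet ..." line.
lnet_networks_value() {
    printf '%s\n' "$1" |
        sed -n 's/^options[[:space:]]\{1,\}lnet[[:space:]].*networks=\([^[:space:]]*\).*$/\1/p' |
        tr -d '"'
}

# List the interface names in a networks= value, one per line,
# e.g. "tcp0(eth0),o2ib(ib0)" gives "eth0" and "ib0".
lnet_ifaces() {
    # Splitting on parentheses leaves the interface lists on even lines.
    printf '%s\n' "$1" | tr '()' '\n\n' | awk 'NR % 2 == 0' | tr ',' '\n'
}

# Report whether each interface named in an "options lnet" line exists.
check_lnet_ifaces() {
    rc=0
    for ifc in $(lnet_ifaces "$(lnet_networks_value "$1")"); do
        if [ -e "/sys/class/net/$ifc" ]; then
            echo "$ifc: present"
        else
            echo "$ifc: missing"
            rc=1
        fi
    done
    return $rc
}
```

Possible usage: check_lnet_ifaces "$(grep '^options lnet' /etc/modprobe.d/lustre.conf)"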