[Lustre-discuss] [HPDD-discuss] lustre lnet infiniband config

aayush agrawal aayush.agrawal at calsoftinc.com
Wed Oct 1 00:50:00 PDT 2014


Hi Mike,

While installing OFED I used the command below:
# ./mlnxofedinstall -vvv --add-kernel-support --without-32bit \
    --without-fw-update --hpc

I used the --add-kernel-support option, which adds kernel support (it runs
mlnx_add_kernel_support.sh). Is this what you meant?
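
One rough way to confirm that the rebuilt drivers really match the running
kernel is to compare each module's vermagic string with `uname -r`. This is
only a sketch: the helper names vermagic_matches and check_module are made up
for illustration, and it assumes modinfo is available on the node.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a vermagic string names the given kernel release.
#   $1 = vermagic string, e.g. the output of `modinfo -F vermagic ib_core`
#   $2 = kernel release, e.g. the output of `uname -r`
vermagic_matches() {
    # The first space-separated field of vermagic is the kernel release.
    built_for=${1%% *}
    [ "$built_for" = "$2" ]
}

# Hypothetical helper: report whether module $1 was built for the running kernel.
check_module() {
    vm=$(modinfo -F vermagic "$1" 2>/dev/null) || {
        echo "$1: module not found"
        return 2
    }
    if vermagic_matches "$vm" "$(uname -r)"; then
        echo "$1: built for the running kernel"
    else
        echo "$1: built for ${vm%% *}, but running $(uname -r)"
        return 1
    fi
}
```

For example, `check_module ib_core` and `check_module ko2iblnd` could be run
after the install. Which HCA driver module to check as well depends on the
hardware, so that part is left out here.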

Thanks,
Aayush.

On 9/30/2014 11:04 PM, Mike Ware wrote:
> I knew I had it somewhere
>
> http://lists.lustre.org/pipermail/lustre-discuss/2012-November/016988.html 
>
>
> Mike
>
> On Tue, Sep 30, 2014 at 10:32 AM, Mike Ware <charnobyl3000 at gmail.com 
> <mailto:charnobyl3000 at gmail.com>> wrote:
>
>     I had a similar issue using the Mellanox packages. If I remember
>     correctly, I had to recompile the drivers against the Lustre kernel
>     for the install. I believe Mellanox had an article on this, but I
>     don't have the link.
>
>     Mike
>
>     On Tue, Sep 30, 2014 at 8:07 AM, Parinay Kondekar
>     <parinay.kondekar at seagate.com
>     <mailto:parinay.kondekar at seagate.com>> wrote:
>
>         IMO you should try strace to see whether anything shows up.
>         "Write failed: Broken pipe" is a very common message, and it is
>         difficult to conclude anything from it alone.
>
>         Regards
>         parinay
>
>         On Tue, Sep 30, 2014 at 8:16 PM, aayush agrawal
>         <aayush.agrawal at calsoftinc.com
>         <mailto:aayush.agrawal at calsoftinc.com>> wrote:
>
>             Hi Parinay,
>
>             Yes, I see ib0 in the output of ifconfig -a.
>             I also tried options lnet networks=o2ib0(ib0), but
>             no luck.
>             While loading lnet I do see errors in /var/log/messages:
>
>             kernel: LNet: HW CPU cores: 32, npartitions: 4
>             alg: No test for crc32 (crc32-table)
>             kernel: alg: No test for adler32 (adler32-zlib)
>             kernel: alg: No test for crc32 (crc32-pclmul)
>             kernel: padlock: VIA PadLock Hash Engine not detected.
>             modprobe: FATAL: Error inserting padlock_sha
>             (/lib/modules/2.6.32_358/kernel/drivers/crypto/padlock-sha.ko):
>             No such device
>
>             But as per the link below, this should not be a problem?
>             https://jira.hpdd.intel.com/browse/LU-1599
>
>             modprobe lnet completes successfully, but after running
>             "lctl network up" I see "Write failed: Broken pipe" and
>             the session is then logged out of the server.
>
>             Thanks,
>             Aayush.
>
>
>             On 9/30/2014 7:21 PM, Parinay Kondekar wrote:
>>             - What is the output of 'ifconfig -a'? Do you see ib0
>>             there? Specifying 'options lnet networks=o2ib0(ib0)'
>>             should be enough.
>>             - Anything in syslog?
>>
>>             HTH
>>
>>             On Tue, Sep 30, 2014 at 6:03 PM, aayush agrawal
>>             <aayush.agrawal at calsoftinc.com
>>             <mailto:aayush.agrawal at calsoftinc.com>> wrote:
>>
>>                 Hi,
>>
>>                 I am trying to build Lustre 2.5.0 against
>>                 MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 on CentOS 6.4
>>                 with kernel version 2.6.32-358.
>>                 However, I am not able to get the LNet configuration
>>                 right. I used the settings suggested in the Lustre 2.x
>>                 manual, but I cannot bring the network up using lctl.
>>
>>                 Details:
>>
>>                 I have two server machines, one for MGS+MDT and a
>>                 second for the OSS, plus one client machine. I want to
>>                 set up InfiniBand on all of these machines.
>>                 I ran the steps below successfully on all
>>                 three machines:
>>                 1. Run script mlnxofedinstall
>>                 # ./mlnxofedinstall -vvv --add-kernel-support
>>                 --without-32bit --without-fw-update --hpc
>>                 2. Restart openibd service
>>                 # /etc/init.d/openibd restart
>>                 3. Configure the ib0 interface.
>>                 4. Configure Lustre with o2ib
>>                 # ./configure
>>                 --with-linux=Path_to_linux-2.6.32-358.18.1.el6
>>                 --with-o2ib=/usr/src/ofa_kernel/default/
>>
>>                 5. Make the Lustre RPMs:
>>                 # make rpms
>>                 This gave me a compilation error.
>>                 I looked online for this error and found a bug
>>                 filed for it:
>>                 https://jira.hpdd.intel.com/browse/LU-4266
>>                 The patch below, from the above link, solved the problem,
>>                 so I could build the Lustre RPMs:
>>                 http://review.whamcloud.com/#/c/8451/1
>>
>>                 Now I first want to do the InfiniBand setup for the MGS
>>                 and MDT on a single machine, which also has an Ethernet IP.
>>                 Then I want to format and mount the MGS and MDT.
>>                 So I installed the Lustre RPMs built above and then
>>                 added the line below to /etc/modprobe.d/lustre.conf:
>>                 options lnet networks=o2ib(ib0)
>>
>>                 Then I rebooted the machine to unload all Lustre-related
>>                 modules, including lnet, ran modprobe lnet to pick up
>>                 the above parameters, and then ran lctl network up,
>>                 which gives the error below:
>>                 LNET configure error 100: Network is down
>>
>>                 I looked online and found the discussion below on the same error:
>>                 http://lists.lustre.org/pipermail/lustre-discuss/2010-June/013510.html
>>
>>                 As per the suggestion in the above mail, I tried the
>>                 line below in /etc/modprobe.d/lustre.conf, where IB_IP
>>                 stands for the InfiniBand IP address:
>>                 options lnet networks=o2ib(ib0) routes="tcp0 IB_IP@o2ib"
>>                 With this, the command hangs for around 2 to 3 minutes and
>>                 then gives the error: Write failed: Broken pipe. The same
>>                 happens with "options lnet networks=o2ib(ib0)".
>>                 But if I set:
>>                 options lnet networks=tcp0(eth0),o2ib(ib0) routes="tcp1 IB_IP@o2ib"
>>                 then it gives LNET configure error 100:
>>                 Network is down.
>>
>>                 It seems that with networks=o2ib(ib0) I get the
>>                 error Write failed: Broken pipe.
>>                 Am I missing anything in the above steps? How
>>                 can I resolve this error?
>>
>>                 Thanks,
>>                 Aayush.
>>
>>                 _______________________________________________
>>                 HPDD-discuss mailing list
>>                 HPDD-discuss at lists.01.org
>>                 <mailto:HPDD-discuss at lists.01.org>
>>                 https://lists.01.org/mailman/listinfo/hpdd-discuss
>>
>>
>
>
>
>         _______________________________________________
>         Lustre-discuss mailing list
>         Lustre-discuss at lists.lustre.org
>         <mailto:Lustre-discuss at lists.lustre.org>
>         http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
>
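
The interface check suggested in the quoted thread (is ib0 actually there?)
could be scripted roughly as follows. This is a sketch under assumptions: the
helper names lnet_interfaces and check_lnet_interfaces are hypothetical, it
expects one interface per network in the networks= value, and it relies on
Linux listing interfaces under /sys/class/net.

```shell
#!/bin/sh
# Hypothetical helper: print the interface names found in an LNet networks=
# value, one per line, e.g. "tcp0(eth0),o2ib(ib0)" gives eth0 and ib0.
# Assumes a single interface inside each pair of parentheses.
lnet_interfaces() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^[^(]*(\([^)]*\)).*$/\1/p'
}

# Hypothetical helper: report whether each named interface exists on this host.
check_lnet_interfaces() {
    for ifname in $(lnet_interfaces "$1"); do
        if [ -e "/sys/class/net/$ifname" ]; then
            echo "$ifname: present"
        else
            echo "$ifname: missing"
        fi
    done
}
```

For the configuration in this thread that would be
`check_lnet_interfaces "o2ib(ib0)"`. It only confirms that the interface
exists, not that the IB link is up or that an IP address is assigned.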
