[lustre-discuss] o2ib (ib_qib) with 2.7.0 rpms on centos 6.6

Chris Hunter chris.hunter at yale.edu
Thu Nov 19 08:55:36 PST 2015


Thanks go to the sysadmins at Linköping University (Sweden) and LLNL. It
took some effort for the involved parties to push out a fix.
* http://lists.openfabrics.org/pipermail/users/2015-March/000453.html
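
For anyone wanting to check a node quickly, here is a minimal sketch (not an official tool) that compares a kernel release string against the 2.6.32-504.23.4 threshold mentioned in my earlier message quoted below. The names FIXED_RELEASE, release_tuple and has_fix are made up for illustration. It assumes that comparing the leading numeric fields is a good enough proxy; a vendor-patched kernel could carry a backported fix that this check cannot see.

```python
import re

# First kernel release named in this thread as carrying the fix.
FIXED_RELEASE = (2, 6, 32, 504, 23, 4)


def release_tuple(release):
    """Return the leading numeric fields of a kernel release string as ints.

    Only the leading run of numbers separated by '.' or '-' is used, so
    suffixes such as 'el6_lustre' or 'x86_64' are ignored.
    """
    match = re.match(r"\d+(?:[.-]\d+)*", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in re.split(r"[.-]", match.group(0)))


def has_fix(release):
    """Heuristic: True if the release sorts at or after FIXED_RELEASE."""
    return release_tuple(release) >= FIXED_RELEASE
```

On a running node the string could come from platform.release() or from the output of uname -r, e.g. has_fix("2.6.32-504.8.1.el6_lustre.x86_64") for the kernel listed in the original report.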

chris hunter
chris.hunter at yale.edu

On 11/18/2015 04:14 PM, Chris Hunter wrote:
> Are you using TrueScale IB interfaces?
>
> There is a known TrueScale bug in RHEL/CentOS 6.6 kernels. You should
> try kernel 2.6.32-504.23.4 or newer. Some details of the bug are in
> LU-6698 and RHSA-2015-1081.
>
> regards,
> chris hunter
> yale hpc group
>
>> From: "Lassus, Magnus" <magnus.lassus at wartsila.com>
>> To: "lustre-discuss at lists.lustre.org"
>>     <lustre-discuss at lists.lustre.org>
>> Subject: [lustre-discuss] o2ib (ib_qib) with 2.7.0 rpms on centos 6.6:
>>     LNetError: kiblnd_init_rdma: Src buffer exhausted: 1 frags
>>
>> Hi,
>>
>> I fail to understand where I go wrong in getting o2ib working using
>> 2.7.0 rpms on top of CentOS 6.6. Running selftest I see:
>>
>> Nov 17 18:22:40 ss08 kernel: LNet: Added LNI 10.165.32.18@o2ib [8/256/0/180]
>> Nov 17 18:24:40 ss08 kernel: LNetError: 12532:0:(o2iblnd_cb.c:1123:kiblnd_init_rdma()) Src buffer exhausted: 1 frags
>> Nov 17 18:24:40 ss08 kernel: LustreError: 12553:0:(brw_test.c:212:brw_check_page()) Bad data in page ffffea0070c20800: 0xbeefbeefbeefbeef, 0xeeb0eeb1eeb2eeb3 expec
>> Nov 17 18:24:40 ss08 kernel: LustreError: 12553:0:(brw_test.c:238:brw_check_bulk()) Bulk page ffffea0070c20800 (0/256) is corrupted!
>> Nov 17 18:24:40 ss08 kernel: LustreError: 12553:0:(brw_test.c:343:brw_client_done_rpc()) Bulk data from 12345-10.165.32.18@o2ib is corrupted!
>> Nov 17 18:24:40 ss08 kernel: LNetError: 12532:0:(o2iblnd_cb.c:1690:kiblnd_reply()) Can't setup rdma for GET from 10.165.32.18@o2ib: -71
>> Nov 17 18:25:31 ss08 kernel: LNetError: 12529:0:(o2iblnd_cb.c:3036:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
>> Nov 17 18:25:31 ss08 kernel: LNetError: 12529:0:(o2iblnd_cb.c:3099:kiblnd_check_conns()) Timed out RDMA with 10.165.32.18@o2ib (0): c: 7, oc: 0, rc: 7
>> Nov 17 18:25:31 ss08 kernel: LustreError: 12558:0:(brw_test.c:388:brw_bulk_ready()) BRW bulk WRITE failed for RPC from 12345-10.165.32.18@o2ib: -103
>> Nov 17 18:25:31 ss08 kernel: LustreError: 12558:0:(brw_test.c:362:brw_server_rpc_done()) Bulk transfer from 12345-10.165.32.18@o2ib has failed: -5
>> Nov 17 18:25:48 ss08 kernel: LNet: 12581:0:(rpc.c:1077:srpc_client_rpc_expired()) Client RPC expired: service 11, peer 12345-10.165.32.18@o2ib, timeout 64.
>> Nov 17 18:25:48 ss08 kernel: LustreError: 12555:0:(brw_test.c:318:brw_client_done_rpc()) BRW RPC to 12345-10.165.32.18@o2ib failed with -110
>>
>> # rpm -qa | egrep 'lustre|kernel' | sort
>> dracut-kernel-004-356.el6.noarch
>> kernel-2.6.32-504.8.1.el6_lustre.x86_64
>> kernel-devel-2.6.32-504.8.1.el6_lustre.x86_64
>> kernel-firmware-2.6.32-504.8.1.el6_lustre.x86_64
>> kernel-headers-2.6.32-504.8.1.el6_lustre.x86_64
>> lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
>> lustre-iokit-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
>> lustre-modules-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
>> lustre-osd-ldiskfs-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
>> lustre-osd-ldiskfs-mount-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
>> lustre-tests-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
>> perf-2.6.32-504.8.1.el6_lustre.x86_64
>> python-perf-2.6.32-504.8.1.el6_lustre.x86_64
>>
>> Using the latest 2.7.63 build on CentOS 6.7 works.
>>
>> Any pointers are warmly welcome as I'd prefer to use 2.7.0.
>>
>> Regards,
>> Magnus
>>