[Lustre-discuss] Multirail IB Configuration Issue

Isaac Huang he.huang at intel.com
Tue Feb 26 19:20:47 PST 2013


On Tue, Feb 26, 2013 at 01:04:06PM -0500, mages, brian wrote:
> Hi,
> 
> It appears that I've resolved the issue and therefore wanted to provide an update to this list.  As I noted in the description of my configuration, the client only has a single IB interface.  After changing the options for lnet in "/etc/modprobe.conf" (on the client) from "options lnet networks=o2ib0(ib0)" to "options lnet networks=o2ib0(ib0),o2ib1(ib0)", things started working.

Why do you want two o2ib networks over the same interface?
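
For reference, this is the client-side change as I read it from your
message. Both option lines are copied from your description; the comments
are mine:

    # /etc/modprobe.conf on the client, before the change
    options lnet networks=o2ib0(ib0)

    # after the change: two LNet networks, both bound to the same ib0
    options lnet networks=o2ib0(ib0),o2ib1(ib0)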

> ......
> Feb 26 11:26:32 bmr2-s14 kernel: LNetError: 7580:0:(o2iblnd_cb.c:2989:kiblnd_check_txs_locked()) Timed out tx: active_txs, 3 seconds
> Feb 26 11:26:32 bmr2-s14 kernel: LNetError: 7580:0:(o2iblnd_cb.c:3052:kiblnd_check_conns()) Timed out RDMA with 192.168.1.20@o2ib (55): c: 8, oc: 0, rc: 16

This often indicates a problem with the underlying network, i.e. the HCA
couldn't complete an outgoing message in time. Either something is wrong
on the network or with 192.168.1.20@o2ib. Did you see any errors on
192.168.1.20@o2ib too?

- Isaac


