[Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping
Vipul Pandya
vipul at chelsio.com
Mon Feb 15 21:45:10 PST 2010
Hello Isaac,
My ko2iblnd module supports the map_on_demand option, as shown below:
[root@nizam ~]# modinfo ko2iblnd
filename: /lib/modules/2.6.18-128.7.1.el5_lustre.1.8.1.1smp/kernel/net/lustre/ko2iblnd.ko
license: GPL
description: Kernel OpenIB gen2 LND v2.00
author: Sun Microsystems, Inc. <http://www.lustre.org/>
srcversion: 069AA2BBD411996C8DF36DD
depends: libcfs,lnet,ib_core,rdma_cm
vermagic: 2.6.18-128.7.1.el5_lustre.1.8.1.1smp SMP mod_unload gcc-4.1
parm: service:service number (within RDMA_PS_TCP) (int)
parm: cksum:set non-zero to enable message (not RDMA) checksums (int)
parm: timeout:timeout (seconds) (int)
parm: ntx:# of message descriptors (int)
parm: credits:# concurrent sends (int)
parm: peer_credits:# concurrent sends to 1 peer (int)
parm: peer_credits_hiw:when eagerly to return credits (int)
parm: peer_buffer_credits:# per-peer router buffer credits (int)
parm: peer_timeout:Seconds without aliveness news to declare peer dead (<=0 to disable) (int)
parm: ipif_name:IPoIB interface name (charp)
parm: retry_count:Retransmissions when no ACK received (int)
parm: rnr_retry_count:RNR retransmissions (int)
parm: keepalive:Idle time in seconds before sending a keepalive (int)
parm: ib_mtu:IB MTU 256/512/1024/2048/4096 (int)
parm: concurrent_sends:send work-queue sizing (int)
parm: map_on_demand:map on demand (int)
parm: fmr_pool_size:size of the fmr pool (>= ntx / 4) (int)
parm: fmr_flush_trigger:# dirty FMRs that triggers pool flush (int)
parm: fmr_cache:non-zero to enable FMR caching (int)
parm: pmr_pool_size:size of the MR cache pmr pool (int)
-> I tried loading the ko2iblnd module as you suggested, but I am still
unable to run 'lctl ping'. I get the same error, as shown below.
#> modprobe ko2iblnd map_on_demand=64
#> modprobe lnet
#> lctl ping 102.88.88.184@o2ib
failed to ping 102.88.88.184@o2ib: Input/output error
#> dmesg
Lustre: Listener bound to eth2:102.88.88.188:987:cxgb3_0
Lustre: Register global MR array, MR size: 0xffffffff, array size: 2
fmr_pool: Device cxgb3_0 does not support FMRs
LustreError: 4122:0:(o2iblnd.c:1393:kiblnd_create_fmr_pool()) Failed to create FMR pool: -38
Lustre: Added LNI 102.88.88.188@o2ib [8/64/0/0]
LustreError: 2453:0:(o2iblnd.c:801:kiblnd_create_conn()) Can't create QP: -12, send_wr: 520, recv_wr: 18
Lustre: 2453:0:(o2iblnd_cb.c:1953:kiblnd_peer_connect_failed()) Deleting messages for 102.88.88.184@o2ib: connection failed
I would be grateful for any further thoughts on this.
Please let me know if you require any further debugging information.
Thanks,
Vipul
-----Original Message-----
From: He.Huang at Sun.COM [mailto:He.Huang at Sun.COM]
Sent: 16 February 2010 10:47
To: Vipul Pandya
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping
On Fri, Feb 12, 2010 at 05:53:19AM -0800, Vipul Pandya wrote:
> ......
> #> lctl network up
> LNET configured
> Above command gave me following error in dmesg
> #> dmesg
>
> Lustre: Listener bound to eth2:102.88.88.188:987:cxgb3_0
> Lustre: Register global MR array, MR size: 0xffffffff, array size: 2
> fmr_pool: Device cxgb3_0 does not support FMRs
> LustreError: 4134:0:(o2iblnd.c:1393:kiblnd_create_fmr_pool()) Failed to create FMR pool: -38
ib_create_fmr_pool() returned -ENOSYS, probably because the HCA doesn't
support FMR; this is not a fatal error.
> Lustre: Added LNI 102.88.88.188@o2ib [8/64/0/0]
>
> #> lctl ping 102.88.88.184@o2ib
> failed to ping 102.88.88.184@o2ib: Input/output error
> dmesg has shown following error:
> #> dmesg
> LustreError: 2453:0:(o2iblnd.c:801:kiblnd_create_conn()) Can't create QP: -12, send_wr: 2056, recv_wr: 18
rdma_create_qp() returned -ENOMEM; most likely init_qp_attr->cap.max_send_wr
was too big (2056) and needed too much memory.
> Lustre: 2453:0:(o2iblnd_cb.c:1953:kiblnd_peer_connect_failed())
> Deleting messages for 102.88.88.184@o2ib: connection failed
You'd need to use the o2iblnd map-on-demand feature. To find out
whether your ko2iblnd module supports it:
modinfo ko2iblnd | grep map_on_demand
If yes, please try:
options ko2iblnd map_on_demand=64
Thanks,
Isaac