[Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping

Vipul Pandya vipul at chelsio.com
Mon Feb 15 21:45:10 PST 2010


Hello Isaac,

My ko2iblnd module supports the map_on_demand option, as shown below:
[root@nizam ~]# modinfo ko2iblnd
filename:       /lib/modules/2.6.18-128.7.1.el5_lustre.1.8.1.1smp/kernel/net/lustre/ko2iblnd.ko
license:        GPL
description:    Kernel OpenIB gen2 LND v2.00
author:         Sun Microsystems, Inc. <http://www.lustre.org/>
srcversion:     069AA2BBD411996C8DF36DD
depends:        libcfs,lnet,ib_core,rdma_cm
vermagic:       2.6.18-128.7.1.el5_lustre.1.8.1.1smp SMP mod_unload gcc-4.1
parm:           service:service number (within RDMA_PS_TCP) (int)
parm:           cksum:set non-zero to enable message (not RDMA) checksums (int)
parm:           timeout:timeout (seconds) (int)
parm:           ntx:# of message descriptors (int)
parm:           credits:# concurrent sends (int)
parm:           peer_credits:# concurrent sends to 1 peer (int)
parm:           peer_credits_hiw:when eagerly to return credits (int)
parm:           peer_buffer_credits:# per-peer router buffer credits (int)
parm:           peer_timeout:Seconds without aliveness news to declare peer dead (<=0 to disable) (int)
parm:           ipif_name:IPoIB interface name (charp)
parm:           retry_count:Retransmissions when no ACK received (int)
parm:           rnr_retry_count:RNR retransmissions (int)
parm:           keepalive:Idle time in seconds before sending a keepalive (int)
parm:           ib_mtu:IB MTU 256/512/1024/2048/4096 (int)
parm:           concurrent_sends:send work-queue sizing (int)
parm:           map_on_demand:map on demand (int)
parm:           fmr_pool_size:size of the fmr pool (>= ntx / 4) (int)
parm:           fmr_flush_trigger:# dirty FMRs that triggers pool flush (int)
parm:           fmr_cache:non-zero to enable FMR caching (int)
parm:           pmr_pool_size:size of the MR cache pmr pool (int)

I loaded the ko2iblnd module as you suggested, but 'lctl ping' still
fails with the same error, as shown below.
#> modprobe ko2iblnd map_on_demand=64
#> modprobe lnet
#> lctl ping 102.88.88.184@o2ib
failed to ping 102.88.88.184@o2ib: Input/output error
#> dmesg
Lustre: Listener bound to eth2:102.88.88.188:987:cxgb3_0
Lustre: Register global MR array, MR size: 0xffffffff, array size: 2
fmr_pool: Device cxgb3_0 does not support FMRs
LustreError: 4122:0:(o2iblnd.c:1393:kiblnd_create_fmr_pool()) Failed to create FMR pool: -38
Lustre: Added LNI 102.88.88.188@o2ib [8/64/0/0]
LustreError: 2453:0:(o2iblnd.c:801:kiblnd_create_conn()) Can't create QP: -12, send_wr: 520, recv_wr: 18
Lustre: 2453:0:(o2iblnd_cb.c:1953:kiblnd_peer_connect_failed()) Deleting messages for 102.88.88.184@o2ib: connection failed
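
A quick arithmetic check on the two send_wr values seen in this thread (2056 in the earlier attempt, 520 with map_on_demand=64). The sketch below assumes, without confirmation from the o2iblnd source, that the send work-queue size is the number of concurrent sends times (fragments per message + 1), with 8 concurrent sends (matching the first field of [8/64/0/0]) and 256 fragments when map_on_demand is not set. The function name and both defaults are illustrative only.

```python
def estimated_send_wr(concurrent_sends: int, frags_per_msg: int) -> int:
    """Hypothetical sizing rule: one work request per fragment,
    plus one for the message itself, for each concurrent send."""
    return concurrent_sends * (frags_per_msg + 1)


# Assumed values, inferred from the logs rather than read from the driver.
ASSUMED_CONCURRENT_SENDS = 8
ASSUMED_DEFAULT_FRAGS = 256

before = estimated_send_wr(ASSUMED_CONCURRENT_SENDS, ASSUMED_DEFAULT_FRAGS)
after = estimated_send_wr(ASSUMED_CONCURRENT_SENDS, 64)  # map_on_demand=64
print(before, after)  # 2056 520
```

If that assumption holds, map_on_demand=64 did take effect (send_wr dropped from 2056 to 520), yet QP creation still fails with -12.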

I would be grateful if you could share any further thoughts on this.
Please let me know if you require any further debugging information.

Thanks,
Vipul

-----Original Message-----
From: He.Huang at Sun.COM [mailto:He.Huang at Sun.COM] 
Sent: 16 February 2010 10:47
To: Vipul Pandya
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Lustre-1.8.1.1 over o2ib gives
Input/Output error while executing lctl ping

On Fri, Feb 12, 2010 at 05:53:19AM -0800, Vipul Pandya wrote:
>    ......
>    #> lctl network up
>    LNET configured
>    Above command gave me following error in dmesg
>    #> dmesg
> 
>    Lustre: Listener bound to eth2:102.88.88.188:987:cxgb3_0
>    Lustre: Register global MR array, MR size: 0xffffffff, array size: 2
>    fmr_pool: Device cxgb3_0 does not support FMRs
>    LustreError: 4134:0:(o2iblnd.c:1393:kiblnd_create_fmr_pool()) Failed to create FMR pool: -38

ib_create_fmr_pool() returned -ENOSYS, probably because the HCA does not
support FMR; this is not a fatal error.

>    Lustre: Added LNI 102.88.88.188@o2ib [8/64/0/0]
>
>    #> lctl ping 102.88.88.184@o2ib
>    failed to ping 102.88.88.184@o2ib: Input/output error
>    dmesg has shown following error:
>    #> dmesg
>    LustreError: 2453:0:(o2iblnd.c:801:kiblnd_create_conn()) Can't create QP: -12, send_wr: 2056, recv_wr: 18

rdma_create_qp() returned -ENOMEM; most likely init_qp_attr->cap.max_send_wr
was too big (2056) and needed too much memory.
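
For reference, the two negative codes in the dmesg output above (-38 and -12) can be decoded as in the rough sketch below. The table holds the Linux errno values only for the two codes seen in this thread; the helper name is made up for illustration.

```python
# Linux errno values for the two codes that appear in the dmesg output.
# This table is deliberately partial; it is not a full errno list.
LINUX_ERRNO = {
    12: ("ENOMEM", "Cannot allocate memory"),
    38: ("ENOSYS", "Function not implemented"),
}


def describe_kernel_error(code: int) -> str:
    """Turn a negative kernel return code, as printed in dmesg,
    into its errno name and message (hypothetical helper)."""
    name, message = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in table"))
    return f"{code}: {name} ({message})"


for code in (-38, -12):
    print(describe_kernel_error(code))
```

So the FMR pool failure is ENOSYS (the device lacks the feature), while the QP failure is ENOMEM.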

>    Lustre: 2453:0:(o2iblnd_cb.c:1953:kiblnd_peer_connect_failed())
>    Deleting messages for 102.88.88.184@o2ib: connection failed

You'd need to use the o2iblnd map-on-demand feature. To find out
whether your ko2iblnd module supports it:
modinfo ko2iblnd | grep map_on_demand

If yes, please try:
options ko2iblnd map_on_demand=64
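
The modinfo check can also be scripted. The following is a minimal sketch that assumes each parameter is printed on a single line of the form "parm: name:description (type)"; the function name and the sample text are illustrative, not part of Lustre.

```python
def parse_modinfo_parms(modinfo_text: str) -> dict:
    """Collect 'parm:' lines from modinfo output into {name: description}.
    Assumes each parameter sits on a single, unwrapped line."""
    parms = {}
    for line in modinfo_text.splitlines():
        if not line.startswith("parm:"):
            continue
        body = line[len("parm:"):].strip()
        # The parameter name ends at the first colon; the rest is the description.
        name, _, description = body.partition(":")
        parms[name.strip()] = description.strip()
    return parms


# Sample input in the shape modinfo prints for ko2iblnd.
SAMPLE = """\
license:        GPL
parm:           concurrent_sends:send work-queue sizing (int)
parm:           map_on_demand:map on demand (int)
parm:           fmr_cache:non-zero to enable FMR caching (int)
"""

parms = parse_modinfo_parms(SAMPLE)
print("map_on_demand" in parms)
```

On a live system the text could come from subprocess.run(["modinfo", "ko2iblnd"], capture_output=True, text=True).stdout instead of the sample string.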

Thanks,
Isaac


