[Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping
Vipul Pandya
vipul at chelsio.com
Fri Feb 12 05:53:19 PST 2010
Hi All,
I am trying to run Lustre over iWARP. For this I compiled
Lustre-1.8.1.1 against the linux-2.6.18-128.7.1 and OFED-1.5 sources
and installed all the required Lustre rpms.
After this I booted into the Lustre-patched kernel and added the
following option to /etc/modprobe.conf so that lnet works with o2ib:
#> cat /etc/modprobe.conf
options lnet networks="o2ib0(eth2)"
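To confirm which network and interface lnet will actually be given, the networks value can be pulled back out of the configuration file. This is only a sketch: the helper name lnet_networks is made up for this example, and it assumes a single "options lnet" line in the style shown above.

```shell
# Print the value of the lnet "networks" option from a modprobe.conf-style file.
# Usage: lnet_networks FILE
lnet_networks() {
    # Keep only the "options lnet ..." lines, then strip everything
    # except the (optionally quoted) networks value.
    grep '^options[[:space:]]*lnet[[:space:]]' "$1" |
        sed -n 's/.*networks="\{0,1\}\([^" ]*\).*/\1/p'
}

# Example: lnet_networks /etc/modprobe.conf
```

For the file above this prints o2ib0(eth2), which can be compared with the NID that lctl list_nids reports once LNET is up.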
I loaded our RDMA adapter modules and the lnet and ko2iblnd modules as
follows:
#> modprobe cxgb3
#> modprobe iw_cxgb3
#> modprobe rdma_ucm
#> modprobe lnet
#> modprobe ko2iblnd
I was able to load all the modules successfully.
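A quick way to double-check that the modules are resident is to look them up in /proc/modules. The sketch below is illustrative only: the helper name missing_modules and its file argument are assumptions made for this example, not part of Lustre or OFED.

```shell
# Print every requested module that is NOT listed in a /proc/modules-style file.
# Usage: missing_modules FILE MODULE...
missing_modules() {
    list=$1
    shift
    for mod in "$@"; do
        # The module name is the first whitespace-separated field on each line.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$list"; then
            echo "$mod"
        fi
    done
}

# Example: missing_modules /proc/modules cxgb3 iw_cxgb3 rdma_ucm lnet ko2iblnd
```

No output means every module named on the command line is loaded.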
Then I assigned an IP address to the eth2 interface and brought it up:
#> ifconfig eth2 102.88.88.188/24 up
#> ifconfig
eth0 Link encap:Ethernet HWaddr 00:30:48:C7:8F:8E
inet addr:10.193.184.188 Bcast:10.193.187.255 Mask:255.255.252.0
inet6 addr: fe80::230:48ff:fec7:8f8e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13224 errors:0 dropped:0 overruns:0 frame:0
TX packets:797 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1523344 (1.4 MiB) TX bytes:203205 (198.4 KiB)
Memory:dea20000-dea40000
eth2 Link encap:Ethernet HWaddr 00:07:43:05:07:35
inet addr:102.88.88.188 Bcast:102.88.88.255 Mask:255.255.255.0
inet6 addr: fe80::207:43ff:fe05:735/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:153 errors:0 dropped:0 overruns:0 frame:0
TX packets:47 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:22537 (22.0 KiB) TX bytes:8500 (8.3 KiB)
Interrupt:185 Memory:de801000-de801fff
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1607 errors:0 dropped:0 overruns:0 frame:0
TX packets:1607 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3196948 (3.0 MiB) TX bytes:3196948 (3.0 MiB)
After this I tried to bring the lnet network up as follows:
#> lctl network up
LNET configured
The above command produced the following error in dmesg:
#> dmesg
Lustre: Listener bound to eth2:102.88.88.188:987:cxgb3_0
Lustre: Register global MR array, MR size: 0xffffffff, array size: 2
fmr_pool: Device cxgb3_0 does not support FMRs
LustreError: 4134:0:(o2iblnd.c:1393:kiblnd_create_fmr_pool()) Failed to create FMR pool: -38
Lustre: Added LNI 102.88.88.188@o2ib [8/64/0/0]
I repeated the same procedure on the other Lustre node and got the
same result.
Then I tried an lctl ping between the two Lustre nodes, which gave me
the following error:
#> lctl ping 102.88.88.184@o2ib
failed to ping 102.88.88.184@o2ib: Input/output error
dmesg showed the following error:
#> dmesg
LustreError: 2453:0:(o2iblnd.c:801:kiblnd_create_conn()) Can't create QP: -12, send_wr: 2056, recv_wr: 18
Lustre: 2453:0:(o2iblnd_cb.c:1953:kiblnd_peer_connect_failed()) Deleting messages for 102.88.88.184@o2ib: connection failed
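For reading the two dmesg excerpts above: the negative numbers are kernel errno values. On Linux, 38 is ENOSYS and 12 is ENOMEM. The small lookup below covers just these two codes; the function name decode_errno is hypothetical, and the numbers are the Linux values, which can differ on other operating systems.

```shell
# Map the errno values seen in the logs above to their Linux names.
# Usage: decode_errno -38
decode_errno() {
    code=${1#-}    # drop the leading minus sign used in kernel messages
    case "$code" in
        12) echo "ENOMEM: Cannot allocate memory" ;;
        38) echo "ENOSYS: Function not implemented" ;;
        *)  echo "unknown errno $code" ;;
    esac
}

decode_errno -38   # prints "ENOSYS: Function not implemented"
decode_errno -12   # prints "ENOMEM: Cannot allocate memory"
```

The -38 from kiblnd_create_fmr_pool() agrees with the fmr_pool message that cxgb3_0 does not support FMRs; the -12 is what kiblnd_create_conn() got back when it tried to create the QP.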
I found one thread that provides a patch to support FMR in o2ib,
but I don't think this patch is applicable to lustre-1.8.1.1:
http://lists.lustre.org/pipermail/lustre-discuss/2008-February/006502.html
Can anyone please guide me on this?
Thank you very much in advance.
Vipul