[Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping

Vipul Pandya vipul at chelsio.com
Sun Feb 14 23:01:18 PST 2010


Hi Rishi,

 

First of all, thanks for your response.

Yes, eth2 is the device associated with IP over iWARP.

 

Thanks,

Vipul

 

From: rishi pathak [mailto:mailmaverick666 at gmail.com] 
Sent: 15 February 2010 12:23
To: Vipul Pandya
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Lustre-1.8.1.1 over o2ib gives
Input/Output error while executing lctl ping

 

Hello Vipul,
                

On Fri, Feb 12, 2010 at 7:23 PM, Vipul Pandya <vipul at chelsio.com> wrote:

Hi All,

 

I am trying to run Lustre over iWARP. For this I have compiled
Lustre-1.8.1.1 with linux-2.6.18-128.7.1 source and OFED-1.5 source.

I have installed all the required rpms for lustre.

 

After this I booted into the Lustre-patched kernel and set the
following option in /etc/modprobe.conf so that lnet works with o2ib:

#> cat /etc/modprobe.conf

options lnet networks="o2ib0(eth2)"
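
For readers unfamiliar with the syntax, the networks string pairs an LNet network name with the interface it should bind to. Below is only a rough Python sketch of how that string breaks down. parse_lnet_networks is a hypothetical helper written for illustration, not part of Lustre, and it assumes the simple name(interface) form shown above with at most one interface per entry.

```python
import re

def parse_lnet_networks(value):
    """Split an lnet 'networks' string such as 'o2ib0(eth2)' into
    (network, interface) pairs. Hypothetical helper: it only handles
    comma-separated entries with zero or one interface each."""
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        match = re.fullmatch(r"(\w+)(?:\((\w+)\))?", entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        # group(2) is None when no interface was given in parentheses
        pairs.append((match.group(1), match.group(2)))
    return pairs

if __name__ == "__main__":
    print(parse_lnet_networks("o2ib0(eth2)"))
```

Here the single entry maps the LNet network o2ib0 to the interface eth2.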

I am not familiar with Lustre over an iWARP interconnect, but is eth2
the device associated with IP over iWARP?

	 

	I loaded our RDMA adapter modules and the lnet and ko2iblnd
modules as follows:

	#> modprobe cxgb3

	#> modprobe iw_cxgb3

	#> modprobe rdma_ucm

	#> modprobe lnet

	#> modprobe ko2iblnd

	 

	I was able to load all the modules successfully.

	 

	Then I assigned the IP address to the eth2 interface and brought it up:

	#> ifconfig eth2 102.88.88.188/24 up

	#> ifconfig

	eth0      Link encap:Ethernet  HWaddr 00:30:48:C7:8F:8E

	          inet addr:10.193.184.188  Bcast:10.193.187.255  Mask:255.255.252.0

	          inet6 addr: fe80::230:48ff:fec7:8f8e/64 Scope:Link

	          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

	          RX packets:13224 errors:0 dropped:0 overruns:0 frame:0

	          TX packets:797 errors:0 dropped:0 overruns:0 carrier:0

	          collisions:0 txqueuelen:1000

	          RX bytes:1523344 (1.4 MiB)  TX bytes:203205 (198.4 KiB)

	          Memory:dea20000-dea40000

	 

	eth2      Link encap:Ethernet  HWaddr 00:07:43:05:07:35

	          inet addr:102.88.88.188  Bcast:102.88.88.255  Mask:255.255.255.0

	          inet6 addr: fe80::207:43ff:fe05:735/64 Scope:Link

	          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

	          RX packets:153 errors:0 dropped:0 overruns:0 frame:0

	          TX packets:47 errors:0 dropped:0 overruns:0 carrier:0

	          collisions:0 txqueuelen:1000

	          RX bytes:22537 (22.0 KiB)  TX bytes:8500 (8.3 KiB)

	          Interrupt:185 Memory:de801000-de801fff

	 

	lo        Link encap:Local Loopback

	          inet addr:127.0.0.1  Mask:255.0.0.0

	          inet6 addr: ::1/128 Scope:Host

	          UP LOOPBACK RUNNING  MTU:16436  Metric:1

	          RX packets:1607 errors:0 dropped:0 overruns:0 frame:0

	          TX packets:1607 errors:0 dropped:0 overruns:0 carrier:0

	          collisions:0 txqueuelen:0

	          RX bytes:3196948 (3.0 MiB)  TX bytes:3196948 (3.0 MiB)
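
As a quick sanity check on the addressing, the netmask and broadcast values that ifconfig reports for eth2 can be reproduced from the /24 prefix, and the peer address used in the lctl ping later in this message can be tested against the same subnet. This is just a small sketch using Python's standard ipaddress module; same_subnet is a hypothetical helper name, and the check says nothing about RDMA connectivity itself.

```python
import ipaddress

# Address and prefix taken from the ifconfig command in this message.
local = ipaddress.ip_interface("102.88.88.188/24")
# Peer address taken from the lctl ping command later in this message.
peer = ipaddress.ip_address("102.88.88.184")

def same_subnet(interface, address):
    """Return True if 'address' falls inside the interface's network."""
    return address in interface.network

if __name__ == "__main__":
    print("netmask:  ", local.netmask)
    print("broadcast:", local.network.broadcast_address)
    print("peer on same subnet:", same_subnet(local, peer))
```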

	 

	After this I tried to bring the lnet network up as follows:

	#> lctl network up

	LNET configured

	 

	The above command produced the following error in dmesg:

	#> dmesg

	Lustre: Listener bound to eth2:102.88.88.188:987:cxgb3_0

	Lustre: Register global MR array, MR size: 0xffffffff, array size: 2

	fmr_pool: Device cxgb3_0 does not support FMRs

	LustreError: 4134:0:(o2iblnd.c:1393:kiblnd_create_fmr_pool()) Failed to create FMR pool: -38

	Lustre: Added LNI 102.88.88.188@o2ib [8/64/0/0]
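
A note on reading these messages: the negative number at the end of a LustreError line is a kernel errno value. On Linux, 38 is ENOSYS ("Function not implemented"), which matches the fmr_pool line saying the device does not support FMRs. The sketch below is only an illustration. describe_kernel_error is a hypothetical helper, and the table is hard-coded with Linux values because errno numbering differs between operating systems.

```python
# Linux errno values for the codes that appear in this thread.
# Hard-coded on purpose: errno numbering is platform specific.
LINUX_ERRNO = {
    12: ("ENOMEM", "Cannot allocate memory"),
    38: ("ENOSYS", "Function not implemented"),
}

def describe_kernel_error(code):
    """Turn a negative kernel return code into a readable string."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"

if __name__ == "__main__":
    print(describe_kernel_error(-38))
```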

	 

	I repeated the same procedure on the other Lustre node and
got the same result.

	Then I tried an lctl ping between the two Lustre nodes, which
gave me the following error:

	 

	#> lctl ping 102.88.88.184@o2ib

	failed to ping 102.88.88.184@o2ib: Input/output error

	 

	dmesg showed the following error:

	#> dmesg

	LustreError: 2453:0:(o2iblnd.c:801:kiblnd_create_conn()) Can't create QP: -12, send_wr: 2056, recv_wr: 18

	Lustre: 2453:0:(o2iblnd_cb.c:1953:kiblnd_peer_connect_failed()) Deleting messages for 102.88.88.184@o2ib: connection failed
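
One thing that may be worth checking, offered as an assumption rather than a diagnosis, is whether the requested queue depths in the "Can't create QP" line exceed what the adapter allows; the device limit is reported as max_qp_wr by ibv_devinfo -v. The Python sketch below only extracts the numbers from the log line and compares them with a limit supplied by the reader. parse_qp_failure and exceeds_limit are hypothetical helpers, and the limit in the usage example is a placeholder, not the real value for cxgb3.

```python
import re

# Matches the failure line printed by kiblnd_create_conn() in this thread.
QP_ERROR = re.compile(
    r"Can't create QP: (?P<rc>-?\d+), "
    r"send_wr: (?P<send_wr>\d+), recv_wr: (?P<recv_wr>\d+)"
)

def parse_qp_failure(line):
    """Return rc, send_wr and recv_wr as ints, or None if no match."""
    match = QP_ERROR.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}

def exceeds_limit(failure, max_qp_wr):
    """True if either requested queue depth is above the device limit."""
    return failure["send_wr"] > max_qp_wr or failure["recv_wr"] > max_qp_wr

if __name__ == "__main__":
    line = ("LustreError: 2453:0:(o2iblnd.c:801:kiblnd_create_conn()) "
            "Can't create QP: -12, send_wr: 2056, recv_wr: 18")
    failure = parse_qp_failure(line)
    placeholder_limit = 1024  # placeholder only; read the real max_qp_wr from ibv_devinfo -v
    print(failure, exceeds_limit(failure, placeholder_limit))
```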

	 

	I found one thread that provides a patch to support FMR
in o2ib, but I don't think this patch is applicable to Lustre-1.8.1.1:

	http://lists.lustre.org/pipermail/lustre-discuss/2008-February/006502.html

	 

	Can anyone please guide me on this?

	 

	Thank you very much in advance.

	Vipul

	 

	




-- 
Regards--
Rishi Pathak
National PARAM Supercomputing Facility
Center for Development of Advanced Computing(C-DAC)
Pune University Campus,Ganesh Khind Road
Pune-Maharastra
