[Lustre-discuss] Configuring Lustre routing between two tcp networks

Erik Froese erik.froese at gmail.com
Thu Jun 11 19:51:01 PDT 2009


OK, here's where I am now.
The public client can ping the router's public address but not the MGS/MDS's
private address (10.1.255.252@tcp).

[root@routed-client lnet]$ cat /etc/modprobe.conf
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter megaraid_mbox
alias scsi_hostadapter1 ata_piix
# eth0 is part of tcp1 (NYU-NET)
# To reach tcp (the cluster's private network), go through the router
# at 128.122.X.Y@tcp1
options lnet accept=all
options lnet networks=tcp1(eth0) routes="tcp 128.122.X.Y@tcp1"
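A quick sanity check on a config like this one: LNet can only reach a gateway over one of the node's own networks, so the network after the '@' in each routes= gateway should match an entry in networks=. The sketch below uses hypothetical helpers (not part of lctl or the Lustre tools); it assumes a single "<net> <gateway-nid>" route entry, one interface per network, and treats 'tcp' and 'tcp0' as the same network.

```sh
# Hypothetical helper, not part of Lustre: append the implicit "0" so that
# "tcp" and "tcp0" (or "o2ib" and "o2ib0") compare equal.
lnet_net_norm() {
    case "$1" in
        *[0-9]) printf '%s\n' "$1" ;;
        *)      printf '%s0\n' "$1" ;;
    esac
}

# Hypothetical helper: succeed if the gateway of one route entry is on a local net.
#   $1 = one routes= entry, e.g. "tcp 128.122.109.28@tcp1"
#   $2 = the networks= value, e.g. "tcp1(eth0)" or "tcp(eth0),tcp1(eth1)"
# The body runs in a subshell so the IFS change stays local to the function.
route_gateway_is_local() (
    gw_net=$(lnet_net_norm "${1##*@}")   # network part of the gateway NID
    IFS=', '
    for net in $2; do
        net=${net%%\(*}                  # drop the "(ethN)" interface suffix
        if [ "$(lnet_net_norm "$net")" = "$gw_net" ]; then
            exit 0                       # leaves only the function's subshell
        fi
    done
    exit 1
)

# Client side from this thread: gateway on tcp1, local network tcp1.
route_gateway_is_local "tcp 128.122.109.28@tcp1" "tcp1(eth0)" && echo "gateway is on a local network"
```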

[root@routed-client lnet]$ lctl network up
LNET configured

[root@routed-client lnet]$ cat /proc/sys/lnet/routes
Routing disabled
net      hops   state router
tcp         1      up 128.122.109.28@tcp1
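The first line of /proc/sys/lnet/routes states whether the node itself forwards LNet traffic for others. On a plain client such as routed-client, "Routing disabled" is what you would expect; the router joining tcp and tcp1 is the node that should report "Routing enabled". A minimal sketch that reads that header from saved output; the function name is hypothetical.

```sh
# Hypothetical helper: print "enabled" or "disabled" from the first line of
# /proc/sys/lnet/routes-style text read on stdin; fail on anything else.
lnet_routing_state() {
    read -r first_line || return 1
    case "$first_line" in
        "Routing enabled")  echo enabled ;;
        "Routing disabled") echo disabled ;;
        *) return 1 ;;
    esac
}

# Example against the client output shown in this thread.
printf 'Routing disabled\nnet      hops   state router\n' | lnet_routing_state
```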

[root@routed-client lnet]$ cat /proc/sys/lnet/routers
ref  rtr_ref alive_cnt  state    last_ping router
3          1         0     up            0 128.122.109.28@tcp1

[root@routed-client lnet]$ lctl ping 128.122.109.28@tcp1
12345-0@lo
12345-10.1.255.247@tcp
12345-128.122.109.28@tcp1
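The ping reply doubles as a check on the router itself: it lists every NID the router answers on, and here that includes 10.1.255.247@tcp and 128.122.109.28@tcp1, so the router has a NID on both networks. A sketch that pulls the network names out of saved lctl ping output; it assumes the "<pid>-<nid>" line format shown above, and the function name is hypothetical.

```sh
# Hypothetical helper: read "lctl ping" output ("<pid>-<addr>@<net>" per line)
# on stdin and print each distinct LNet network name once, in order of first appearance.
ping_networks() {
    awk -F'@' 'NF == 2 && !seen[$2]++ { print $2 }'
}

# Router reply from this thread: prints lo, tcp and tcp1 on separate lines.
printf '12345-0@lo\n12345-10.1.255.247@tcp\n12345-128.122.109.28@tcp1\n' | ping_networks
```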

[root@routed-client lnet]$ lctl ping 10.1.255.252@tcp
failed to ping 10.1.255.252@tcp: Input/output error

I can see traffic between the routed-client and the router, as well as
between the router and the MGS/MDS (10.1.255.252@tcp).

The MGS has the following config:

[root@mgs-0-0 lnet]# cat /etc/modprobe.conf
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas
alias scsi_hostadapter2 usb-storage
alias eth0 e1000
alias eth1 e1000
alias eth2 e1000
alias eth3 e1000
options lnet forwarding="enabled"
options lnet accept=all
options lnet networks=tcp(eth0) routes="tcp1 10.1.255.247@tcp"
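The two route definitions are meant to mirror each other: the client reaches tcp through the router's tcp1 NID, and the MGS reaches tcp1 through the router's tcp NID. One way to confirm that both gateways belong to the same router is to look for them in that router's lctl ping reply from earlier in this message. A minimal sketch, assuming the "<pid>-<nid>" line format shown there; the helper name is hypothetical.

```sh
# Hypothetical helper: succeed if NID ($1) appears in saved "lctl ping"
# output read on stdin (lines of the form "<pid>-<nid>").
ping_lists_nid() {
    awk -v nid="$1" '{ sub(/^[0-9]+-/, ""); if ($0 == nid) found = 1 }
                     END { exit !found }'
}

router_ping='12345-0@lo
12345-10.1.255.247@tcp
12345-128.122.109.28@tcp1'

# Client gateway (tcp1 side) and MGS gateway (tcp side) from this thread.
for gw in 128.122.109.28@tcp1 10.1.255.247@tcp; do
    if printf '%s\n' "$router_ping" | ping_lists_nid "$gw"; then
        echo "$gw: listed by this router"
    else
        echo "$gw: NOT listed by this router"
    fi
done
```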

[root@mgs-0-0 lnet]# lctl network up
LNET configured

But it doesn't see any routes or routers:

[root@mgs-0-0 lnet]# cat /proc/sys/lnet/routes
Routing disabled
net      hops   state router

[root@mgs-0-0 lnet]# cat /proc/sys/lnet/routers
ref  rtr_ref alive_cnt  state    last_ping router

And this is what /var/log/messages and dmesg contain, with or without
neterror logging enabled:
Jun 11 22:41:07 mgs-0-0 kernel: LustreError: 10869:0:(lib-move.c:1250:lnet_send()) No route to 12345-128.122.X.Y@tcp1
Jun 11 22:41:07 mgs-0-0 kernel: LustreError: 10869:0:(lib-move.c:1723:lnet_parse_get()) 10.1.255.252@tcp: Unable to send REPLY for GET from 12345-128.122.X.Y@tcp1: -113
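Read together, the two log lines point at the MGS's own routing table: the GET from the client's tcp1 NID did arrive, but the MGS found no route back to tcp1, so the REPLY failed with -113 (EHOSTUNREACH on Linux). That is consistent with the empty route table shown above. A sketch that answers "is there a route to this NID's network?" from saved /proc/sys/lnet/routes output; it assumes the "net hops state router" layout shown in this thread, ignores the route's state, and uses a hypothetical function name.

```sh
# Hypothetical helper: succeed if /proc/sys/lnet/routes-style text on stdin
# has a route entry for the network of NID $1 (e.g. "128.122.109.28@tcp1").
# Treats "tcp" and "tcp0" as the same network; the route's state is not checked.
has_route_to() {
    awk -v nid="$1" '
        function norm(n) { return (n ~ /[0-9]$/) ? n : n "0" }
        BEGIN { split(nid, p, "@"); want = norm(p[2]) }
        NR > 2 && norm($1) == want { found = 1 }
        END { exit !found }'
}

# MGS table from this thread: header lines only, no entries.
mgs_routes='Routing disabled
net      hops   state router'
if printf '%s\n' "$mgs_routes" | has_route_to 128.122.109.28@tcp1; then
    echo "route to tcp1 present"
else
    echo "no route to tcp1"
fi
```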




On Fri, Jun 5, 2009 at 12:48 PM, Isaac Huang <He.Huang at sun.com> wrote:

> On Thu, Jun 04, 2009 at 01:59:48PM -0400, Erik Froese wrote:
> >    Thanks Andreas and Natalie,
> >
> >    I've made the changes you suggested (setting tcp1 as the external
> >    network) and I'm able to lctl ping the 128.122.x.y address but I still
> >    cannot ping the private address for the MDS.
>
> Please show us the commands you've run and their outputs, together
> with error messages in dmesg. It'd help to "echo +neterror >
> /proc/sys/lnet/printk" before running the commands.
>
> >    Could the problem be that the lustre fs on the private network is
> >    actually called tcp and not tcp0? Are those synonymous?
>
> No, 'tcp' is just a shorthand for 'tcp0' - they are 100% equivalent
> to each other.
>
> Isaac
>