[Lustre-discuss] Understanding LNET routing

Vsevolod Nikonorov v.nikonorov at nikiet.ru
Wed Aug 14 06:38:40 PDT 2013


Hello everybody.

I am trying to get an OSC to mount a Lustre filesystem from an MDS located in another TCP network, but the mount fails with the following error:

	mount.lustre: mount 10.3.0.102@tcp:/SANDBOX at /mnt/lustre failed: Cannot send after transport endpoint shutdown

If I then check LNET routing with the "lctl show_route" command, it shows the following:

	net                tcp hops 1 gw                  10.4.0.105@tcp1 down

The "down" status appears only after the first mount attempt following a reboot; before that, the route is shown as "up".
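
The route check described above can be scripted so that the state is captured before and after the mount attempt. The following is only a minimal Python sketch: it assumes the route lines have exactly the layout printed above (net, hops, gw, state), and the names parse_routes and down_gateways are hypothetical helpers, not part of Lustre. Because this archive renders "@" as " at ", the pattern accepts both spellings.

```python
import re

# One route line as printed by "lctl show_route" in this post, e.g.
#   net  tcp hops 1 gw  10.4.0.105@tcp1 down
# The gateway NID may appear as "addr@net" or, in the archive, "addr at net".
ROUTE_RE = re.compile(
    r"\bnet\s+(?P<net>\S+)\s+hops\s+(?P<hops>\d+)\s+gw\s+"
    r"(?P<addr>\S+?)(?:@|\s+at\s+)(?P<gwnet>\S+)\s+(?P<state>up|down)\s*$"
)


def parse_routes(text):
    """Return one dict per route line found in text; other lines are ignored."""
    routes = []
    for line in text.splitlines():
        m = ROUTE_RE.search(line)
        if m:
            routes.append({
                "net": m["net"],
                "hops": int(m["hops"]),
                "gateway": f'{m["addr"]}@{m["gwnet"]}',
                "state": m["state"],
            })
    return routes


def down_gateways(text):
    """Return the gateway NIDs of all routes whose state is "down"."""
    return [r["gateway"] for r in parse_routes(text) if r["state"] == "down"]


if __name__ == "__main__":
    # Sample line copied from the output above.
    sample = "\tnet                tcp hops 1 gw                  10.4.0.105 at tcp1 down"
    print(down_gateways(sample))
```

Saving the output of "lctl show_route" to a file before and after the mount and passing both to down_gateways would show exactly when the gateway changes state.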

What am I doing wrong? Thanks in advance!



I have attached a drawing which explains the topology.
The machines in my Lustre environment have the following network configurations.

=====

MDS.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:04:8A  
          inet addr:10.3.0.102  Bcast:10.3.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:48a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4510 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4439 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:666698 (651.0 KiB)  TX bytes:695697 (679.3 KiB)

lctl list_nids:

	10.3.0.102@tcp

lctl show_route:

	net               tcp1 hops 1 gw                   10.3.0.105@tcp up

=====

OSS1.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:79:51  
          inet addr:10.3.0.103  Bcast:10.3.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:7951/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2482 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2398 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:388187 (379.0 KiB)  TX bytes:365254 (356.6 KiB)

lctl list_nids:

	10.3.0.103@tcp

lctl show_route:

	net               tcp1 hops 1 gw                   10.3.0.105@tcp up

=====

OSS2.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:22:76  
          inet addr:10.3.0.104  Bcast:10.3.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:2276/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2522 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2407 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:394006 (384.7 KiB)  TX bytes:364467 (355.9 KiB)

lctl list_nids:

	10.3.0.104@tcp

lctl show_route:

	net               tcp1 hops 1 gw                   10.3.0.105@tcp up

=====

router.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:07:B2  
          inet addr:10.3.0.105  Bcast:10.3.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:7b2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:291 errors:0 dropped:0 overruns:0 frame:0
          TX packets:249 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:51645 (50.4 KiB)  TX bytes:50121 (48.9 KiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:B9:7E:CA  
          inet addr:10.4.0.105  Bcast:10.4.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:7eca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:41 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2474 (2.4 KiB)  TX bytes:906 (906.0 b)

lctl list_nids:

	10.3.0.105@tcp
	10.4.0.105@tcp1

lctl show_route:

	<nothing here>

=====

OSC.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:6C:1A  
          inet addr:10.4.0.101  Bcast:10.4.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:6c1a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:204 errors:0 dropped:0 overruns:0 frame:0
          TX packets:187 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:43784 (42.7 KiB)  TX bytes:39666 (38.7 KiB)

lctl list_nids:

	10.4.0.101@tcp1

lctl show_route:

	net                tcp hops 1 gw                  10.4.0.105@tcp1 up

=====
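
To cross-check the listings above, here is a minimal Python sketch that tests the configured routes for internal consistency: each gateway must sit on a net the node is directly attached to, and the gateway node must itself have a NID on the destination net. The NODES table is transcribed by hand from the listings above (with "@" in the NIDs); the names NODES, net_of and check_routes are hypothetical. This checks only the addresses shown here. It says nothing about whether the router actually forwards traffic, and it does not explain the "down" state.

```python
# NIDs and routes transcribed from the listings above.
# Each route is (destination net, gateway NID).
NODES = {
    "MDS":    {"nids": ["10.3.0.102@tcp"], "routes": [("tcp1", "10.3.0.105@tcp")]},
    "OSS1":   {"nids": ["10.3.0.103@tcp"], "routes": [("tcp1", "10.3.0.105@tcp")]},
    "OSS2":   {"nids": ["10.3.0.104@tcp"], "routes": [("tcp1", "10.3.0.105@tcp")]},
    "router": {"nids": ["10.3.0.105@tcp", "10.4.0.105@tcp1"], "routes": []},
    "OSC":    {"nids": ["10.4.0.101@tcp1"], "routes": [("tcp", "10.4.0.105@tcp1")]},
}


def net_of(nid):
    """Return the LNET network part of a NID, e.g. "tcp1" for "10.4.0.105@tcp1"."""
    return nid.split("@", 1)[1]


def check_routes(nodes):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # Map every NID to the node that owns it.
    owner = {nid: name for name, cfg in nodes.items() for nid in cfg["nids"]}
    for name, cfg in nodes.items():
        local_nets = {net_of(n) for n in cfg["nids"]}
        for dest_net, gw in cfg["routes"]:
            # The gateway must be reachable directly, i.e. on a local net.
            if net_of(gw) not in local_nets:
                problems.append(f"{name}: gateway {gw} is not on a local net")
            gw_node = owner.get(gw)
            if gw_node is None:
                problems.append(f"{name}: gateway {gw} is not a NID of any listed node")
            elif dest_net not in {net_of(n) for n in nodes[gw_node]["nids"]}:
                problems.append(f"{name}: {gw_node} has no NID on {dest_net}")
    return problems


if __name__ == "__main__":
    issues = check_routes(NODES)
    print("\n".join(issues) if issues else "no inconsistencies found")
```

With the values listed above, the check reports no inconsistencies, so the addressing of the routes themselves appears to be coherent.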

-- 
Vsevolod Nikonorov,
OITTiS, NIKIET

<v.nikonorov at nikiet.ru>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_topology.pdf
Type: application/pdf
Size: 18679 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20130814/7ffc596e/attachment.pdf>

