[Lustre-discuss] Understanding LNET routing
Vsevolod Nikonorov
v.nikonorov at nikiet.ru
Thu Aug 15 05:09:45 PDT 2013
Thank you for the advice. My main problem was the default iptables firewall configuration in CentOS 6.4: once I reset it to accept all traffic, the route on the OSC stopped going "down".
However, routing still does not work properly: the mount request on the OSC just hangs, although I do see some traffic in the INPUT and OUTPUT chains on my router. Here is a fragment of it:
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth1 SRC=10.4.0.105 DST=10.4.0.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=23517 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth1 OUT= MAC=00:50:56:b9:7e:ca:00:50:56:b9:6c:1a:08:00 SRC=10.4.0.101 DST=10.4.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=50793 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth1 OUT= MAC=00:50:56:b9:7e:ca:00:50:56:b9:6c:1a:08:00 SRC=10.4.0.101 DST=10.4.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=50794 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth1 SRC=10.4.0.105 DST=10.4.0.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=23518 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth0 SRC=10.3.0.105 DST=10.3.0.102 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=599 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth0 OUT= MAC=00:50:56:b9:07:b2:00:50:56:b9:04:8a:08:00 SRC=10.3.0.102 DST=10.3.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=51416 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth0 OUT= MAC=00:50:56:b9:07:b2:00:50:56:b9:04:8a:08:00 SRC=10.3.0.102 DST=10.3.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=51417 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth0 SRC=10.3.0.105 DST=10.3.0.102 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=600 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
I believe this is Lustre traffic, but I do not know the protocol and cannot tell what is wrong.
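The log is easier to read once each line is sorted by direction. In netfilter LOG output, a packet the host itself receives has IN set and OUT empty, a packet it sends has the reverse, and a packet forwarded at the IP layer has both set. The sketch below (the function names are my own) applies that rule to four of the router's lines above. Every packet is either received or sent by the router itself: on eth1 the client's port 1021 is paired with port 988 on the router, and on eth0 the router's own port 1021 is paired with port 988 on the MDS. Port 988 is, as far as I know, the default LNET (socklnd) acceptor port.

```python
import re
from collections import Counter

# Only these netfilter LOG fields are needed to tell the direction of a packet.
FIELD_RE = re.compile(r"\b(IN|OUT|SRC|DST|SPT|DPT)=(\S*)")


def classify(line):
    """Return (chain, "src:port", "dst:port") for one netfilter LOG line.

    IN set, OUT empty -> received by this host (INPUT)
    IN empty, OUT set -> sent by this host (OUTPUT)
    both set          -> forwarded at the IP layer (FORWARD)
    """
    f = dict(FIELD_RE.findall(line))
    if f["IN"] and f["OUT"]:
        chain = "FORWARD"
    elif f["IN"]:
        chain = "INPUT"
    else:
        chain = "OUTPUT"
    return chain, f"{f['SRC']}:{f['SPT']}", f"{f['DST']}:{f['DPT']}"


def summarize(lines):
    """Count packets per (chain, source, destination) flow."""
    return Counter(classify(line) for line in lines if "SRC=" in line)


# Four of the router's log lines quoted above, one per flow direction.
LOG = """\
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth1 SRC=10.4.0.105 DST=10.4.0.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=23517 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth1 OUT= MAC=00:50:56:b9:7e:ca:00:50:56:b9:6c:1a:08:00 SRC=10.4.0.101 DST=10.4.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=50793 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth0 SRC=10.3.0.105 DST=10.3.0.102 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=599 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth0 OUT= MAC=00:50:56:b9:07:b2:00:50:56:b9:04:8a:08:00 SRC=10.3.0.102 DST=10.3.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=51416 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
"""

if __name__ == "__main__":
    for (chain, src, dst), count in sorted(summarize(LOG.splitlines()).items()):
        print(f"{chain:7} {src:>17} -> {dst:<17} {count}")
```

Run as is, it prints four flows, none of them FORWARD.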
Does Lustre routing have anything to do with TCP/IP routing? Should I set net.ipv4.ip_forward to 1 in sysctl.conf? Do I need some kind of IP masquerading for Lustre routing to work properly?
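On these questions: as far as I understand LNET, its routing is separate from the kernel's IP routing. Each LNET network (here tcp0 for the 10.3.0.0/24 hosts and tcp1 for the 10.4.0.0/24 hosts) is reached either directly or through the gateway NID configured for that network, and the router receives a message on one TCP connection and resends it on another. Nothing is forwarded at the IP layer, which fits the log above having no line where both IN and OUT are set, so ip_forward and masquerading should not be needed for this. The sketch below models only that next-hop choice. `LnetNode`, `next_hop` and the helpers are hypothetical names, and the model ignores hop counts, router health and everything else real LNET does.

```python
from dataclasses import dataclass, field


def norm_net(net):
    """LNET treats 'tcp' and 'tcp0' as the same network name."""
    return net if net[-1:].isdigit() else net + "0"


def nid_net(nid):
    """Return the LNET network of a NID, e.g. '10.3.0.102@tcp' -> 'tcp0'."""
    _, sep, net = nid.partition("@")
    if not sep or not net:
        raise ValueError(f"not a NID: {nid!r}")
    return norm_net(net)


@dataclass
class LnetNode:
    """Hypothetical, simplified model of one node's LNET configuration."""
    local_nids: list
    routes: dict = field(default_factory=dict)  # remote net -> gateway NID

    def __post_init__(self):
        self.routes = {norm_net(net): gw for net, gw in self.routes.items()}

    def next_hop(self, dest_nid):
        """Return the NID that a message for dest_nid is handed to first."""
        local_nets = {nid_net(n) for n in self.local_nids}
        dest_net = nid_net(dest_nid)
        if dest_net in local_nets:
            return dest_nid  # same LNET network: send to the peer directly
        gateway = self.routes.get(dest_net)
        if gateway is None:
            raise LookupError(f"no LNET route to network {dest_net}")
        if nid_net(gateway) not in local_nets:
            raise LookupError(f"gateway {gateway} is not on a local network")
        return gateway  # the router then resends it on the destination network


if __name__ == "__main__":
    # The client ("OSC") from the configuration quoted below.
    client = LnetNode(["10.4.0.101@tcp1"], {"tcp": "10.4.0.105@tcp1"})
    print(client.next_hop("10.3.0.102@tcp"))  # -> 10.4.0.105@tcp1
```

The point of the model is that the decision depends only on the LNET network in the destination NID, never on the kernel routing table.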
On Wed, 14 Aug 2013 16:16:19 +0200 (CEST)
Hervé Toureille <toureille at cines.fr> wrote:
> Hello Vsevolod,
> To bring the route up, you can use:
>
> lctl set_route 10.4.0.105@tcp1 up
>
> Even though it is marked '==== obsolete (DANGEROUS) ====', it works fine on Lustre 2.1.4 ;)
>
> ----- Original Message -----
>
> From: "Vsevolod Nikonorov" <v.nikonorov@nikiet.ru>
> To: lustre-discuss@lists.lustre.org
> Sent: Wednesday, 14 August 2013 15:38:40
> Subject: [Lustre-discuss] Understanding LNET routing
>
> Hello everybody.
>
> I am trying to get an OSC to mount a Lustre filesystem from an MDS located on a different TCP network, but the mount fails with the following error:
>
> mount.lustre: mount 10.3.0.102@tcp:/SANDBOX at /mnt/lustre failed: Cannot send after transport endpoint shutdown
>
> If I then check LNET routing with the "lctl show_route" command, it shows the following:
>
> net tcp hops 1 gw 10.4.0.105@tcp1 down
>
> The "down" status appears only after the first mount attempt following a reboot; before that, the route shows "up".
>
> What am I doing wrong? Thanks in advance!
>
>
>
> I have attached a drawing that explains the topology.
> Machines in my Lustre environment have the following network configurations:
>
> =====
>
> MDS.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:04:8A
>           inet addr:10.3.0.102  Bcast:10.3.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:48a/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:4510 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:4439 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:666698 (651.0 KiB)  TX bytes:695697 (679.3 KiB)
>
> lctl list_nids:
>
> 10.3.0.102@tcp
>
> lctl show_route:
>
> net tcp1 hops 1 gw 10.3.0.105@tcp up
>
> =====
>
> OSS1.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:79:51
>           inet addr:10.3.0.103  Bcast:10.3.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:7951/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:2482 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:2398 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:388187 (379.0 KiB)  TX bytes:365254 (356.6 KiB)
>
> lctl list_nids:
>
> 10.3.0.103@tcp
>
> lctl show_route:
>
> net tcp1 hops 1 gw 10.3.0.105@tcp up
>
> =====
>
> OSS2.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:22:76
>           inet addr:10.3.0.104  Bcast:10.3.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:2276/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:2522 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:2407 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:394006 (384.7 KiB)  TX bytes:364467 (355.9 KiB)
>
> lctl list_nids:
>
> 10.3.0.104@tcp
>
> lctl show_route:
>
> net tcp1 hops 1 gw 10.3.0.105@tcp up
>
> =====
>
> router.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:07:B2
>           inet addr:10.3.0.105  Bcast:10.3.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:7b2/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:291 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:249 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:51645 (50.4 KiB)  TX bytes:50121 (48.9 KiB)
>
> eth1      Link encap:Ethernet  HWaddr 00:50:56:B9:7E:CA
>           inet addr:10.4.0.105  Bcast:10.4.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:7eca/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:41 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:2474 (2.4 KiB)  TX bytes:906 (906.0 b)
>
> lctl list_nids:
>
> 10.3.0.105@tcp
> 10.4.0.105@tcp1
>
> lctl show_route:
>
> <nothing here>
>
> =====
>
> OSC.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:6C:1A
>           inet addr:10.4.0.101  Bcast:10.4.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:6c1a/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:204 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:187 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:43784 (42.7 KiB)  TX bytes:39666 (38.7 KiB)
>
> lctl list_nids:
>
> 10.4.0.101@tcp1
>
> lctl show_route:
>
> net tcp hops 1 gw 10.4.0.105@tcp1 up
>
> =====
>
> --
> Всеволод Никоноров,
> ОИТТиС, НИКИЭТ
>
> <v.nikonorov@nikiet.ru>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
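Taken at face value, the configuration listed in the quoted message can be cross-checked mechanically. The sketch below (all names are my own, and the route-line format is assumed from the `lctl show_route` output quoted above) checks, for every node, that each gateway NID sits on one of the node's own LNET networks, that it belongs to a listed node which also has a NID on the target network, that every network in the setup is reachable, and that each route is "up". It only checks static consistency: it cannot see firewalls, or whether the router's LNET is actually set up to forward (the lnet `forwarding` module parameter, as far as I know), so a clean result does not mean the mount will work.

```python
import re

# Route line format as it appears in the "lctl show_route" output quoted above;
# other Lustre versions may print it differently.
ROUTE_RE = re.compile(r"net\s+(\S+)\s+hops\s+(\d+)\s+gw\s+(\S+)\s+(up|down)")


def norm_net(net):
    """LNET treats 'tcp' and 'tcp0' as the same network name."""
    return net if net[-1:].isdigit() else net + "0"


def norm_nid(nid):
    addr, _, net = nid.partition("@")
    return f"{addr}@{norm_net(net)}"


def nid_net(nid):
    return norm_nid(nid).partition("@")[2]


def check(nodes):
    """nodes maps a host name to (list_nids lines, show_route lines)."""
    nets_of = {name: {nid_net(n) for n in nids} for name, (nids, _) in nodes.items()}
    owner = {norm_nid(n): name for name, (nids, _) in nodes.items() for n in nids}
    all_nets = set().union(*nets_of.values())
    problems = []
    for name, (_, route_lines) in nodes.items():
        routed = set()
        for line in route_lines:
            m = ROUTE_RE.search(line)
            if m is None:
                problems.append(f"{name}: cannot parse {line!r}")
                continue
            net, _hops, gw, state = m.groups()
            net = norm_net(net)
            routed.add(net)
            if nid_net(gw) not in nets_of[name]:
                problems.append(f"{name}: gateway {gw} is not on a local network")
            gw_host = owner.get(norm_nid(gw))
            if gw_host is None:
                problems.append(f"{name}: no listed node has NID {gw}")
            elif net not in nets_of[gw_host]:
                problems.append(f"{name}: gateway {gw_host} has no NID on {net}")
            if state != "up":
                problems.append(f"{name}: route to {net} via {gw} is {state}")
        # Every network used anywhere must be local or routed on every node.
        for net in sorted(all_nets - nets_of[name] - routed):
            problems.append(f"{name}: no route to network {net}")
    return problems


# "lctl list_nids" and "lctl show_route" output from the quoted message.
NODES = {
    "MDS": (["10.3.0.102@tcp"], ["net tcp1 hops 1 gw 10.3.0.105@tcp up"]),
    "OSS1": (["10.3.0.103@tcp"], ["net tcp1 hops 1 gw 10.3.0.105@tcp up"]),
    "OSS2": (["10.3.0.104@tcp"], ["net tcp1 hops 1 gw 10.3.0.105@tcp up"]),
    "router": (["10.3.0.105@tcp", "10.4.0.105@tcp1"], []),
    "OSC": (["10.4.0.101@tcp1"], ["net tcp hops 1 gw 10.4.0.105@tcp1 up"]),
}

if __name__ == "__main__":
    for problem in check(NODES) or ["configuration is consistent"]:
        print(problem)
```

With the data above it prints "configuration is consistent"; feeding it the OSC route from the first message in its "down" state reports exactly that one route.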
--
Всеволод Никоноров,
ОИТТиС, НИКИЭТ
<v.nikonorov at nikiet.ru>