[Lustre-discuss] Understanding LNET routing

Vsevolod Nikonorov v.nikonorov at nikiet.ru
Thu Aug 15 05:09:45 PDT 2013


Thank you for the advice. My main problem was the default iptables firewall configuration in CentOS 6.4: once I reset the rules to an all-permit state, the route on the OSC stopped falling to "down".

But routing still does not work properly: a mount request on the OSC just hangs. I do see some traffic in the INPUT and OUTPUT chains of my router, though; here is a fragment of it:

Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth1 SRC=10.4.0.105 DST=10.4.0.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=23517 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0                                              
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth1 OUT= MAC=00:50:56:b9:7e:ca:00:50:56:b9:6c:1a:08:00 SRC=10.4.0.101 DST=10.4.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=50793 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth1 OUT= MAC=00:50:56:b9:7e:ca:00:50:56:b9:6c:1a:08:00 SRC=10.4.0.101 DST=10.4.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=50794 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth1 SRC=10.4.0.105 DST=10.4.0.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=23518 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0                                              
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth0 SRC=10.3.0.105 DST=10.3.0.102 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=599 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0                                                
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth0 OUT= MAC=00:50:56:b9:07:b2:00:50:56:b9:04:8a:08:00 SRC=10.3.0.102 DST=10.3.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=51416 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth0 OUT= MAC=00:50:56:b9:07:b2:00:50:56:b9:04:8a:08:00 SRC=10.3.0.102 DST=10.3.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=51417 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth0 SRC=10.3.0.105 DST=10.3.0.102 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=600 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0                                                

I believe this is Lustre traffic, but I do not know the protocol and cannot tell what is wrong.
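
To make the fragment above easier to read, the log lines can be grouped by flow. The following is only a minimal Python sketch, assuming the usual KEY=VALUE layout of iptables LOG lines as seen in the fragment; the names `parse_line` and `summarize` are hypothetical helpers, not part of any Lustre tool. The sample holds four lines copied from the fragment. Port 988 is treated here as the LNET acceptor port on the assumption that the default was not changed.

```python
import re
from collections import Counter

# Four lines copied from the router log fragment above.
SAMPLE = """\
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth1 SRC=10.4.0.105 DST=10.4.0.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=23517 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth1 OUT= MAC=00:50:56:b9:7e:ca:00:50:56:b9:6c:1a:08:00 SRC=10.4.0.101 DST=10.4.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=50793 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth0 SRC=10.3.0.105 DST=10.3.0.102 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=599 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth0 OUT= MAC=00:50:56:b9:07:b2:00:50:56:b9:04:8a:08:00 SRC=10.3.0.102 DST=10.3.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=51416 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
"""

# Matches KEY=VALUE pairs; VALUE may be empty (e.g. "IN=" on outgoing packets).
FIELD = re.compile(r"(\w+)=(\S*)")


def parse_line(line):
    """Return the KEY=VALUE fields of one iptables LOG line as a dict."""
    return dict(FIELD.findall(line))


def summarize(lines):
    """Count TCP packets per (direction, iface, src, sport, dst, dport)."""
    flows = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("PROTO") != "TCP":
            continue
        # A non-empty IN= means the packet arrived on that interface;
        # otherwise it was sent out through the interface named in OUT=.
        if fields.get("IN"):
            direction, iface = "in", fields["IN"]
        else:
            direction, iface = "out", fields.get("OUT", "")
        key = (direction, iface,
               fields["SRC"], int(fields["SPT"]),
               fields["DST"], int(fields["DPT"]))
        flows[key] += 1
    return flows


if __name__ == "__main__":
    for key, count in sorted(summarize(SAMPLE.splitlines()).items()):
        print(count, key)
```

Grouped this way, the sample shows one TCP conversation on eth1 between the router (10.4.0.105) and the OSC (10.4.0.101), and a separate one on eth0 between the router (10.3.0.105) and the MDS (10.3.0.102), with port 988 on one end of each.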

Does Lustre routing have anything to do with TCP/IP routing? Should I set net.ipv4.ip_forward to 1 in sysctl.conf? Do I need some kind of IP masquerading for Lustre routing to work properly?

On Wed, 14 Aug 2013 16:16:19 +0200 (CEST)
Hervé Toureille <toureille at cines.fr> wrote:

> Hello Vsevolod,
> To set the route up, you may use:
>
> lctl set_route 10.4.0.105@tcp1 up
>
> Even if it's '==== obsolete (DANGEROUS) ====', it works fine on Lustre 2.1.4 ;)
> 
> 
> 
> 
> ----- Original message -----
>
> From: "Vsevolod Nikonorov" <v.nikonorov@nikiet.ru>
> To: lustre-discuss@lists.lustre.org
> Sent: Wednesday 14 August 2013 15:38:40
> Subject: [Lustre-discuss] Understanding LNET routing
> 
> Hello everybody.
>
> I am trying to make an OSC mount a Lustre filesystem from an MDS located in another TCP network, but the mount fails with the following error:
>
> mount.lustre: mount 10.3.0.102@tcp:/SANDBOX at /mnt/lustre failed: Cannot send after transport endpoint shutdown
>
> If I then check LNET routing with the "lctl show_route" command, it shows the following:
>
> net tcp hops 1 gw 10.4.0.105@tcp1 down
>
> The "down" status appears only after the first mount attempt following a reboot; before that the route shows "up".
>
> What am I doing wrong? Thanks in advance!
> 
> 
> 
> I have attached a drawing which explains the topology.
> The machines in my Lustre environment have the following network configurations.
> 
> ===== 
> 
> MDS.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:04:8A
>           inet addr:10.3.0.102  Bcast:10.3.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:48a/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:4510 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:4439 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:666698 (651.0 KiB)  TX bytes:695697 (679.3 KiB)
>
> lctl list_nids:
>
> 10.3.0.102@tcp
>
> lctl route_show:
>
> net tcp1 hops 1 gw 10.3.0.105@tcp up
> 
> ===== 
> 
> OSS1.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:79:51
>           inet addr:10.3.0.103  Bcast:10.3.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:7951/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:2482 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:2398 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:388187 (379.0 KiB)  TX bytes:365254 (356.6 KiB)
>
> lctl list_nids:
>
> 10.3.0.103@tcp
>
> lctl route_show:
>
> net tcp1 hops 1 gw 10.3.0.105@tcp up
> 
> ===== 
> 
> OSS2.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:22:76
>           inet addr:10.3.0.104  Bcast:10.3.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:2276/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:2522 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:2407 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:394006 (384.7 KiB)  TX bytes:364467 (355.9 KiB)
>
> lctl list_nids:
>
> 10.3.0.104@tcp
>
> lctl route_show:
>
> net tcp1 hops 1 gw 10.3.0.105@tcp up
> 
> ===== 
> 
> Router.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:07:B2
>           inet addr:10.3.0.105  Bcast:10.3.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:7b2/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:291 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:249 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:51645 (50.4 KiB)  TX bytes:50121 (48.9 KiB)
>
> eth1      Link encap:Ethernet  HWaddr 00:50:56:B9:7E:CA
>           inet addr:10.4.0.105  Bcast:10.4.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:7eca/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:41 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:2474 (2.4 KiB)  TX bytes:906 (906.0 b)
>
> lctl list_nids:
>
> 10.3.0.105@tcp
> 10.4.0.105@tcp1
>
> lctl show_route:
>
> <nothing here>
> 
> ===== 
> 
> OSC.
>
> ifconfig:
>
> eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:6C:1A
>           inet addr:10.4.0.101  Bcast:10.4.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::250:56ff:feb9:6c1a/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:204 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:187 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:43784 (42.7 KiB)  TX bytes:39666 (38.7 KiB)
>
> lctl list_nids:
>
> 10.4.0.101@tcp1
>
> lctl show_route:
>
> net tcp hops 1 gw 10.4.0.105@tcp1 up
> 
> ===== 
> 
> -- 
> Всеволод Никоноров,
> ОИТТиС, НИКИЭТ
>
> <v.nikonorov@nikiet.ru>
> 
> 


-- 
Всеволод Никоноров,
ОИТТиС, НИКИЭТ

<v.nikonorov at nikiet.ru>




