[Lustre-discuss] Lustre failover to second subnet

Ed Lucero ed.lucero at imatrix.com
Thu Nov 8 09:20:27 PST 2012


I have a Lustre test environment and am currently testing network failover.
Failover works fine on subnet 1, but when I turn off subnet 1 on the Lustre
servers, the clients can't recover onto subnet 2.

 

Here is the configuration. All the servers and clients are on the same two
subnets.

 

I tried mounting the Lustre file system with this command, but the
failover to network 2 still failed.

 

mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:10.244.2.120@tcp1:10.244.2.121@tcp1:/webfs /imatrix
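
Should the two NIDs of each server be comma-separated instead, with colons only
between the failover partners? Something like this untested sketch (same
addresses as above, just regrouped):

mount -t lustre -o flock 10.244.1.120@tcp0,10.244.2.120@tcp1:10.244.1.121@tcp0,10.244.2.121@tcp1:/webfs /imatrix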

 

Any ideas?

 

Ed

 

Network

-----------

Subnet1 - 10.244.1.0/24

Subnet2 - 10.244.2.0/24

 

Server1 - 10.244.1.120, 10.244.2.120

Server2 - 10.244.1.121, 10.244.2.121

Server3 - 10.244.1.100, 10.244.2.100

Client1 - 10.244.1.101, 10.244.2.101

Client2 - 10.244.1.102, 10.244.2.102

Client3 - 10.244.1.122, 10.244.2.122

Client4 - 10.244.1.123, 10.244.2.123

Client5 - 10.244.1.250, 10.244.2.250

 

Lustre Configuration

-------------------------

Server1 - mgs  webmdt  webost1 mailost2

Server2 - mailmdt mailost1 webost2

Server3 - devmdt devost1

 

# MGS Node on server1

tunefs.lustre --erase-param --failnode=10.244.1.121@tcp0 --writeconf /dev/mapper/lustremgs

 

#MDT nodes on server1

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --writeconf /dev/mapper/webmdt

 

#MDT nodes on server2

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 --writeconf /dev/mapper/mailmdt

 

#MDT nodes on server3

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --writeconf /dev/mapper/devmdt

 

#OST nodes on server1

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/webost1

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/mailost2

 

#OST nodes on server2

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/webost2

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/mailost1

 

#OST nodes on server3

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/devost1
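
Do the targets also need the tcp1 NIDs registered (comma-separated NIDs within
a single --mgsnode/--failnode option for the same node)? A rough, untested
example for webmdt along those lines:

tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0,10.244.2.120@tcp1 --mgsnode=10.244.1.121@tcp0,10.244.2.121@tcp1 --failnode=10.244.1.121@tcp0,10.244.2.121@tcp1 --writeconf /dev/mapper/webmdt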

 

 

LNET entry in modprobe.d/lustre.conf

Server1 - options lnet networks=tcp0(bond0),tcp1(bond1)

Server2 - options lnet networks=tcp0(bond0),tcp1(bond1)

Server3 - options lnet network= tcp0(eth0),tcp1(eth1)

 

Five Clients

Client1 - options lnet networks=tcp0(eth0),tcp1(eth1)

Client2 - options lnet networks=tcp0(eth0),tcp1(eth1)

Client3 - options lnet networks=tcp0(eth0),tcp1(eth1)

Client4 - options lnet networks=tcp0(eth0),tcp1(eth1)

Client5 - options lnet networks=tcp0(eth0),tcp1(eth1)
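
In case it helps, LNet reachability on both networks can be checked with lctl,
e.g. from a client:

lctl list_nids                  # should list both the tcp0 and tcp1 NIDs
lctl ping 10.244.2.120@tcp1     # ping server1's tcp1 NID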

 

Mount Command

----------------------

#Mounts on server1

mount -t lustre -o abort_recov /dev/mapper/lustremgs /lustremgs

mount -t lustre -o abort_recov /dev/mapper/webmdt /webmst

mount -t lustre -o abort_recov /dev/mapper/webost1 /webost1

mount -t lustre -o abort_recov /dev/mapper/mailost2 /mailost2

 

#Mounts on server2

mount -t lustre -o abort_recov /dev/mapper/webost2 /webost2

mount -t lustre -o abort_recov /dev/mapper/mailmdt /mailmst

mount -t lustre -o abort_recov /dev/mapper/mailost1 /mailost1

 

#Mounts on server3

mount -t lustre -o abort_recov /dev/mapper/devmdt /homemst

mount -t lustre -o abort_recov /dev/mapper/devost1 /homeost1

 

#Client Mounts

mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/webfs /imatrix

mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/mailfs /var/qmail

mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/devfs /home
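
After subnet 1 is shut off, I assume the client imports show which connection
they are retrying; something like the following (parameter paths from memory,
may differ by Lustre version):

lctl get_param mdc.*.import osc.*.import | grep -E 'state|current_connection'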

 
