[Lustre-discuss] Lustre failover to second subnet
    Ed Lucero 
    ed.lucero at imatrix.com
       
    Thu Nov  8 09:20:27 PST 2012
    
    
  
I have a Lustre test environment and am currently testing network failover.
Failover works fine on subnet 1, but when I turn off subnet 1 on the Lustre
servers, the clients can't recover onto subnet 2.
 
Here is the configuration. All the servers and clients are on the same two
subnets.
 
I tried mounting the Lustre file systems with this command, but the
failover to network 2 still failed.
 
mount -t lustre -o flock \
  10.244.1.120@tcp0:10.244.1.121@tcp0:10.244.2.120@tcp1:10.244.2.121@tcp1:/webfs \
  /imatrix
 
Any ideas?
 
Ed
 
Network
-----------
Subnet1 - 10.244.1.0/24
Subnet2 - 10.244.2.0/24
 
Server1 - 10.244.1.120, 10.244.2.120
Server2 - 10.244.1.121, 10.244.2.121
Server3 - 10.244.1.100, 10.244.2.100
Client1 - 10.244.1.101, 10.244.2.101
Client2 - 10.244.1.102, 10.244.2.102
Client3 - 10.244.1.122, 10.244.2.122
Client4 - 10.244.1.123, 10.244.2.123
Client5 - 10.244.1.250, 10.244.2.250
 
Lustre Configuration
-------------------------
Server1 - mgs  webmdt  webost1 mailost2
Server2 - mailmdt mailost1 webost2
Server3 - devmdt devost1
 
# MGS Node on server1
tunefs.lustre --erase-param --failnode=10.244.1.121@tcp0 --writeconf \
  /dev/mapper/lustremgs
 
#MDT nodes on server1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
  --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --writeconf \
  /dev/mapper/webmdt
 
#MDT nodes on server2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
  --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 --writeconf \
  /dev/mapper/mailmdt
 
#MDT nodes on server3
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
  --mgsnode=10.244.1.121@tcp0 --writeconf /dev/mapper/devmdt
 
#OST nodes on server1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
  --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 \
  --param="failover.mode=failout" --writeconf \
  /dev/mapper/webost1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
  --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 \
  --param="failover.mode=failout" --writeconf \
  /dev/mapper/mailost2
 
#OST nodes on server2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
  --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 \
  --param="failover.mode=failout" --writeconf \
  /dev/mapper/webost2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
  --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 \
  --param="failover.mode=failout" --writeconf \
  /dev/mapper/mailost1
 
#OST nodes on server3
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 \
  --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 \
  --param="failover.mode=failout" --writeconf \
  /dev/mapper/devost1
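 
The tunefs.lustre commands above register only tcp0 NIDs for the MGS and the
failover partners. A minimal sketch, assuming the goal is to advertise each
server's tcp1 NID as well: tunefs.lustre accepts a comma-separated NID list
for --mgsnode and --failnode. The helper name tunefs_cmd is hypothetical, it
only prints the command for review, and whether this alone fixes the failover
is untested here.

```shell
# Print (do not run) a tunefs.lustre command that lists both NIDs per node.
# Arguments: device, failover partner NID list, then one NID list per MGS node.
# tunefs_cmd is a hypothetical helper, not part of Lustre.
tunefs_cmd() {
    dev=$1
    failnode=$2
    shift 2
    args="--erase-param"
    for mgs in "$@"; do
        args="$args --mgsnode=$mgs"
    done
    echo "tunefs.lustre $args --failnode=$failnode --writeconf $dev"
}

# webmdt on server1: MGS on server1 with server2 as partner, both subnets each.
cmd=$(tunefs_cmd /dev/mapper/webmdt \
    "10.244.1.121@tcp0,10.244.2.121@tcp1" \
    "10.244.1.120@tcp0,10.244.2.120@tcp1" \
    "10.244.1.121@tcp0,10.244.2.121@tcp1")
echo "$cmd"
```

The OST commands would additionally carry --param="failover.mode=failout" as
in the original configuration.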
 
 
LNET entry in modprobe.d/lustre.conf
Server1 - options lnet networks=tcp0(bond0),tcp1(bond1)
Server2 - options lnet networks=tcp0(bond0),tcp1(bond1)
Server3 - options lnet network= tcp0(eth0),tcp1(eth1)
 
Five Clients
Client1 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client2 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client3 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client4 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client5 - options lnet networks=tcp0(eth0),tcp1(eth1)
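 
With two LNET networks configured per node, a quick sanity check is whether
each node actually reports a NID on both of them. The sketch below assumes the
NID list comes from `lctl list_nids`; the function name has_both_networks and
the sample data are hypothetical, and the sample stands in for real output so
the snippet runs without Lustre installed.

```shell
# Succeeds only if the NID list on stdin has an entry on both tcp0 and tcp1.
# LNet may print tcp0 as plain "tcp", so both spellings are accepted.
has_both_networks() {
    nids=$(cat)
    printf '%s\n' "$nids" | grep -Eq '@tcp0?$' &&
    printf '%s\n' "$nids" | grep -Eq '@tcp1$'
}

# Hypothetical sample; on a real node use: lctl list_nids | has_both_networks
sample='10.244.1.101@tcp
10.244.2.101@tcp1'

if printf '%s\n' "$sample" | has_both_networks; then
    result="both networks configured"
else
    result="missing a network"
fi
echo "$result"
```

If a node reports only one NID, the networks= option on that node is the
first thing to recheck.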
 
Mount Command
----------------------
#Mounts on server1
mount -t lustre -o abort_recov /dev/mapper/lustremgs /lustremgs
mount -t lustre -o abort_recov /dev/mapper/webmdt /webmst
mount -t lustre -o abort_recov /dev/mapper/webost1 /webost1
mount -t lustre -o abort_recov /dev/mapper/mailost2 /mailost2
 
#Mounts on server2
mount -t lustre -o abort_recov /dev/mapper/webost2 /webost2
mount -t lustre -o abort_recov /dev/mapper/mailmdt /mailmst
mount -t lustre -o abort_recov /dev/mapper/mailost1 /mailost1
 
#Mounts on server3
mount -t lustre -o abort_recov /dev/mapper/devmdt /homemst
mount -t lustre -o abort_recov /dev/mapper/devost1 /homeost1
 
#Client Mounts
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/webfs /imatrix
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/mailfs /var/qmail
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/devfs /home
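 
As far as the mount.lustre syntax goes, the NIDs of a single node are
separated by commas and different nodes by colons, so a colon-only list of
four NIDs is read as four separate MGS nodes. A minimal sketch of the
comma/colon form, assuming each server should be reachable on both subnets;
mgs_spec is a hypothetical helper, and the mount command is printed rather
than executed.

```shell
# Build a Lustre client mount device string.
# First argument: file system name. Each further argument is one MGS node,
# given as a comma-separated list of that node's NIDs.
mgs_spec() {
    fsname=$1
    shift
    spec=""
    for node_nids in "$@"; do
        spec="${spec}${node_nids}:"
    done
    printf '%s/%s\n' "$spec" "$fsname"
}

device=$(mgs_spec webfs \
    "10.244.1.120@tcp0,10.244.2.120@tcp1" \
    "10.244.1.121@tcp0,10.244.2.121@tcp1")

# Print the command instead of running it, so it can be reviewed first.
echo "mount -t lustre -o flock $device /imatrix"
```

The same helper would cover mailfs and devfs by changing the first argument
and the mount point.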
 