[Lustre-discuss] Confusion about failover

Dhruv DhruvDesaai at gmail.com
Wed Jun 25 23:02:24 PDT 2008


Hello Everybody,
I am new to Lustre and would like some help.

Case 1:
I tried OST failover. The following script generates the XML
configuration file.

rm -f failover_2node.xml
./lmc -m failover_2node.xml --add net --node node-mds --nid sm01 --nettype tcp
./lmc -m failover_2node.xml --add net --node node-ost1 --nid sm02 --nettype tcp
./lmc -m failover_2node.xml --add net --node node-ost2 --nid sm06 --nettype tcp
./lmc -m failover_2node.xml --add net --node client --nid '*' --nettype tcp

# Configure MDS
./lmc -m failover_2node.xml --add mds --node node-mds --mds mds_test --fstype ldiskfs --dev /dev/sdb5

# Configure LOV
./lmc -m failover_2node.xml --add lov --lov lov_test --mds mds_test --stripe_sz 1048576 --stripe_cnt 2 --stripe_pattern 0

# Configure OSTs
./lmc -m failover_2node.xml --add ost --node node-ost1 --lov lov_test --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb1
./lmc -m failover_2node.xml --add ost --node node-ost2 --lov lov_test --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb7

# Configure client (this is a 'generic' client used for all client mounts)
./lmc -m failover_2node.xml --add mtpt --node client --path /mnt/lustre --mds mds_test --lov lov_test

These were the lconf commands I ran:

1. lconf --reformat --node node-ost1 failover_2node.xml .... on sm02
2. lconf --reformat --node node-ost2 --service=ost1 failover_2node.xml .... on sm06
3. lconf --reformat --node node-mds failover_2node.xml .... on sm01
4. lconf --node client failover_2node.xml .... on sm02 and sm06

My intention is to keep a failover OST node in case one fails. The MDS
is on a separate node. I tried different scenarios in which one OST
goes down, and data could still be retrieved from the other. New files
could be created and old ones deleted on the failover OST. Data was
available most of the time.

So my question is: is Linux HA required to configure such a failover
scenario?

Case 2:
I tried the same sort of configuration as shown above for a failover
MDS. But when the main MDS fails, it does not switch to the new MDS.
Also, when the main MDS comes up again, the file system does not
recover. I brought the client down and back up again, and then it was
working.

So is Linux HA or a similar program necessary for configuring failover?

Dhruv


