[Lustre-discuss] failover on multihomed clusters
Patrice Hamelin
patrice.hamelin at ec.gc.ca
Wed Dec 21 07:21:53 PST 2011
Hi,
As described in my previous message, I have two multihomed clusters,
each with its own Lustre servers and clients. I have clients mounting
Lustre file systems over o2ib and over tcp. I am now implementing
failover. A first attempt this morning failed, so I went back to the
manual (RTFM) and read:
Note -- If you have an MGS or MDT configured for failover, perform these
steps:
1. On the OST, list the NIDs of all MGS nodes at mkfs time.
OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1 --mgsnode=10.0.0.2 /dev/{device}
2. On the client, mount the file system.
client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/
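The mount source in step 2 follows the Lustre NID-list convention: NIDs that belong to the same node are separated by commas (one per network), different nodes such as the primary and failover MGS are separated by colons, and the list ends with ":/<fsname>". A minimal sketch of that rule, assuming only this convention; lustre_mount_source is a hypothetical helper, not a Lustre tool, and it just prints the string:

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print a client mount source.
#   $1   = file system name
#   $2.. = one argument per MGS node, each holding that node's NIDs
#          joined by commas (e.g. one o2ib NID and one tcp NID)
# Nodes are joined with ':' and the result ends with ':/<fsname>'.
lustre_mount_source() {
    fsname=$1
    shift
    src=""
    for node in "$@"; do
        if [ -z "$src" ]; then
            src=$node
        else
            src="$src:$node"
        fi
    done
    printf '%s:/%s\n' "$src" "$fsname"
}

# Reproduces the manual's example (two single-NID MGS nodes):
#   mount -t lustre "$(lustre_mount_source sunfs 10.0.0.1 10.0.0.2)" /cfs/client/
```

With multihomed MGS nodes, each argument would carry both NIDs of one node, e.g. `lustre_mount_source sata ib3-st01s@o2ib3,ib3-st01e@tcp ib3-st02s@o2ib3,ib3-st02e@tcp`; the NIDs here are the ones from this post.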
So I extended the logic from:
mkfs.lustre --mgs --mdt --fsname=sata --failnode=ib3-st02s@o2ib3 \
    --reformat /dev/mpath/emcssd-1
mkfs.lustre --fsname sata --reformat --ost --mgsnode=ib3-st01s@o2ib3 \
    --mgsnode=ib3-st01e@tcp --failnode=ib3-st02s@o2ib3 \
    /dev/mpath/colosse4-lun54-sata
to:
mkfs.lustre --mgs --mdt --fsname=sata \
    --failnode=ib3-st02s@o2ib3,ib3-st02e@tcp --reformat /dev/mpath/emcssd-1
mkfs.lustre --fsname sata --reformat --ost \
    --mgsnode=ib3-st01s@o2ib3,ib3-st01e@tcp \
    --mgsnode=ib3-st02s@o2ib3,ib3-st02e@tcp \
    --failnode=ib3-st02s@o2ib3,ib3-st02e@tcp /dev/mpath/colosse4-lun53-sata
And so on for the other disks.
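The second OST command relies on the convention that each --mgsnode value lists the NIDs of one MGS node (comma-separated, one per network), that a failover MGS gets its own additional --mgsnode option, and that --failnode likewise takes the comma-separated NIDs of the failover partner. A minimal sketch, assuming that convention, that assembles the same command line for any number of MGS nodes; ost_mkfs_cmd is a hypothetical helper, not part of Lustre, and it only prints the command for review:

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print, without running it,
# an mkfs.lustre command line for one OST.
#   $1   = file system name
#   $2   = block device
#   $3   = NIDs of the failover OSS, comma-separated
#   $4.. = one argument per MGS node, each that node's NIDs comma-separated;
#          every MGS node becomes its own --mgsnode option
ost_mkfs_cmd() {
    fsname=$1
    device=$2
    failnode=$3
    shift 3
    cmd="mkfs.lustre --fsname=$fsname --reformat --ost"
    for mgs in "$@"; do
        cmd="$cmd --mgsnode=$mgs"
    done
    cmd="$cmd --failnode=$failnode $device"
    printf '%s\n' "$cmd"
}

# Same NIDs and device as the second OST command in the post:
ost_mkfs_cmd sata /dev/mpath/colosse4-lun53-sata \
    ib3-st02s@o2ib3,ib3-st02e@tcp \
    ib3-st01s@o2ib3,ib3-st01e@tcp \
    ib3-st02s@o2ib3,ib3-st02e@tcp
```

Printing rather than executing keeps the sketch safe to run anywhere, since --reformat destroys existing data on the device.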
The targets mount fine on the combined MDS/MGS/OSS server, but on the
OSS-only server I get:
[root@ib3-st03 ~]# mount -t lustre /dev/mpath/colosse4-lun55-sata /mnt/data/clun55
mount.lustre: mount /dev/mpath/colosse4-lun55-sata at /mnt/data/clun55 failed: Interrupted system call
The messages file contains:
Dec 21 15:18:52 ib3-st03 kernel: Lustre: 9464:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388814699331655 sent from MGC10.10.135.115@o2ib3 to NID 10.10.135.116@o2ib3 5s ago has timed out (5s prior to deadline).
Dec 21 15:18:52 ib3-st03 kernel: req@ffff810116fff800 x1388814699331655/t0 o250->MGS@MGC10.10.135.115@o2ib3_1:26/25 lens 368/584 e 0 to 1 dl 1324480732 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1112:server_start_targets()) Required registration failed for sata-OSTffff: -4
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -4
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1453:server_put_super()) no obd sata-OSTffff
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:147:server_deregister_mount()) sata-OSTffff not registered
Dec 21 15:18:52 ib3-st03 kernel: Lustre: server umount sata-OSTffff complete
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-4)
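In these messages the OST's registration request to the MGS (opcode o250) times out on NID 10.10.135.116@o2ib3, and registration then fails with -4, which is EINTR and matches the "Interrupted system call" reported by mount.lustre. A minimal sketch for pulling the NIDs of timed-out requests out of the log so each one can be checked by hand, for example with `lctl ping <nid>` from the OSS; timed_out_nids is a hypothetical helper, and it assumes each kernel message sits on one line, as syslog writes it, and that the messages file is /var/log/messages:

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print each distinct
# NID that appears in a "... to NID <nid> ... has timed out" message.
timed_out_nids() {
    sed -n 's/.* to NID \([^ ]*\) .*has timed out.*/\1/p' | sort -u
}

# Example (on the OSS; lctl ping needs the LNET modules loaded):
#   for nid in $(timed_out_nids < /var/log/messages); do lctl ping "$nid"; done
```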
So my question is: what would be the correct syntax to make sure
failover works for the o2ib clients as well as for the tcp clients?
Thanks
--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada