[Lustre-discuss] failover on multihomed clusters

Patrice Hamelin patrice.hamelin at ec.gc.ca
Wed Dec 21 07:21:53 PST 2011


Hi,

   If you refer to my previous message, you will see that I have two 
multihomed clusters, each with Lustre servers and clients.  The clients 
mount Lustre partitions over both o2ib and tcp.  I am now implementing 
failover; my first attempt this morning did not succeed, so I went back 
to the manual, where I read:

Note -- If you have an MGS or MDT configured for failover, perform these 
steps:
1. On the OST, list the NIDs of all MGS nodes at mkfs time.
OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1
--mgsnode=10.0.0.2 /dev/{device}
2. On the client, mount the file system.
client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/

So I extended the logic from:

mkfs.lustre --mgs --mdt --fsname=sata --failnode=ib3-st02s@o2ib3 
--reformat /dev/mpath/emcssd-1
mkfs.lustre --fsname sata --reformat --ost --mgsnode=ib3-st01s@o2ib3 
--mgsnode=ib3-st01e@tcp --failnode=ib3-st02s@o2ib3 
/dev/mpath/colosse4-lun54-sata

to:

mkfs.lustre --mgs --mdt --fsname=sata 
--failnode=ib3-st02s@o2ib3,ib3-st02e@tcp --reformat /dev/mpath/emcssd-1
mkfs.lustre --fsname sata --reformat --ost 
--mgsnode=ib3-st01s@o2ib3,ib3-st01e@tcp 
--mgsnode=ib3-st02s@o2ib3,ib3-st02e@tcp 
--failnode=ib3-st02s@o2ib3,ib3-st02e@tcp /dev/mpath/colosse4-lun53-sata

And so on for the other disks.
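
For illustration, the pattern used in the extended OST command can be sketched as a small shell helper: the NIDs of one node on several networks are joined with commas, and each MGS node gets its own --mgsnode option. NIDs are written with "@" here, which the list archive renders as " at ". The helper name join_nids and the variables are hypothetical, the script only prints the command line and never runs mkfs.lustre, and it is not a verified fix for the mount failure described in this message.

```shell
#!/bin/sh
# Sketch only: prints an mkfs.lustre command line, never executes it.
# join_nids and the variable names are hypothetical, not Lustre tools.

# Join all arguments with commas: the NIDs of ONE node on several networks.
join_nids() {
    out=""
    for nid in "$@"; do
        out="${out:+$out,}$nid"
    done
    printf '%s\n' "$out"
}

# Host names taken from the commands in this message.
primary=$(join_nids ib3-st01s@o2ib3 ib3-st01e@tcp)
failover=$(join_nids ib3-st02s@o2ib3 ib3-st02e@tcp)

# One --mgsnode per MGS node, each carrying that node's NIDs.
ost_cmd="mkfs.lustre --fsname sata --reformat --ost --mgsnode=$primary --mgsnode=$failover --failnode=$failover /dev/mpath/colosse4-lun53-sata"
printf '%s\n' "$ost_cmd"
```

Printing the command first makes it easy to compare the generated options for each disk before anything is reformatted.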

The partitions mount fine on the combined MDS/MGS/OSS server, but on 
the OSS-only server I get:

[root@ib3-st03 ~]# mount -t lustre /dev/mpath/colosse4-lun55-sata 
/mnt/data/clun55
mount.lustre: mount /dev/mpath/colosse4-lun55-sata at /mnt/data/clun55 
failed: Interrupted system call

The messages file contains:

Dec 21 15:18:52 ib3-st03 kernel: Lustre: 
9464:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request 
x1388814699331655 sent from MGC10.10.135.115@o2ib3 to NID 
10.10.135.116@o2ib3 5s ago has timed out (5s prior to deadline).
Dec 21 15:18:52 ib3-st03 kernel:   req@ffff810116fff800 
x1388814699331655/t0 o250->MGS@MGC10.10.135.115@o2ib3_1:26/25 lens 
368/584 e 0 to 1 dl 1324480732 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:1112:server_start_targets()) Required registration 
failed for sata-OSTffff: -4
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -4
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:1453:server_put_super()) no obd sata-OSTffff
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:147:server_deregister_mount()) sata-OSTffff not 
registered
Dec 21 15:18:52 ib3-st03 kernel: Lustre: server umount sata-OSTffff complete
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount  (-4)
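
When reading these messages, the useful detail is which NID the timed-out request was sent to. The return code -4 corresponds to EINTR on Linux, which matches the "Interrupted system call" reported by mount.lustre. A minimal sketch of pulling the source and destination out of such a line follows; the helper name timed_out_nids is hypothetical, and sample_log is the first message above joined onto one line with "@" in place of the archive's " at ".

```shell
#!/bin/sh
# Sketch only: sample_log is the first kernel message from this post,
# joined onto one line; timed_out_nids is a hypothetical helper.

sample_log='Lustre: 9464:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388814699331655 sent from MGC10.10.135.115@o2ib3 to NID 10.10.135.116@o2ib3 5s ago has timed out (5s prior to deadline).'

# Print "source -> destination" for every request that timed out.
timed_out_nids() {
    sed -n 's/.*sent from \([^ ]*\) to NID \([^ ]*\) .*has timed out.*/\1 -> \2/p'
}

result=$(printf '%s\n' "$sample_log" | timed_out_nids)
printf '%s\n' "$result"
```

On a live system the same filter could be fed from the messages file instead of the sample string, assuming each log entry is on a single line there.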


So my question is:

What would be the correct syntax to make sure failover works for the 
o2ib clients as well as the tcp clients?

Thanks




-- 
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada
