[Lustre-discuss] network failover with IB+eth?

Cliff White Cliff.White at Sun.COM
Tue Apr 8 13:08:26 PDT 2008


Erich Focht wrote:
> Hello,
> 
> on a setup with o2ib and Ethernet configured on both Lustre servers and
> clients, I'd expect that unplugging the InfiniBand cable on one of the
> OSSes would lead the client to switch over to Ethernet and continue I/O.

No, unfortunately that's not how multiple interfaces work with LNET.
When multiple interfaces are present at connection setup, we pick the
'best' route. Once we establish a connection, we expect that connection
to persist; connections do not fail over to another interface.
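You can at least confirm before mounting that both paths exist and are
reachable; something along these lines (using the failover OSS NIDs from
your mail) shows the state of each network:

  # on the client: list the locally configured NIDs
  lctl list_nids

  # ping the OSS over each network separately
  lctl ping 10.3.0.229@o2ib
  lctl ping 192.168.130.229@tcp

Even when both pings succeed, the client sticks with whichever network
it chose at connect time.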

> Unfortunately this doesn't happen; the client I/O stalls and continues
> only after the IB cable is plugged back in.

Yup, that's expected behaviour.
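If what you need is for I/O to survive an IB failure, the closest
workaround today is to keep IB out of the picture on the client and run
tcp only - a sketch, assuming eth0 is the client's Ethernet interface:

  # /etc/modprobe.conf on the client: use only the Ethernet fabric
  options lnet networks="tcp0(eth0)"

The client then connects over tcp from the start. That gives you a path
that keeps working, not failover between the two fabrics.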
> 
> Is there anything wrong with the setup? It uses pairwise failover
> servers, so maybe that's part of the problem? Is the order of the
> failnode arguments correct?

The setup appears to be correct. All the failnode does is complicate the
situation slightly, since the failnode is tried first instead of the
connection just failing right away. You have a list of failover
connections for each network type, and LNET will try the failover only
on the same network. So a tcp connection would first retry a tcp
address, and, as your logs show, the IB side retries the IB failnode.
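One way to double-check what actually ended up on disk is to print the
parameters back from the target (device path taken from your mail):

  tunefs.lustre --print /dev/mpath/ost100

In that output, failover.node should list both an o2ib and a tcp NID
for the partner - but again, LNET will only retry the NID on the network
the original connection was made on.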
cliffw
> 
> Here's what we have: (sorry for the many details...)
> 
> MGS/MGT are mounted on the same node:
>  Target:     MGS
>  Index:      unassigned
>  Lustre FS:  lustre
>  Mount type: ldiskfs
>  Flags:      0x174      (MGS needs_index first_time update writeconf )
>  Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters:
>  failover.node=10.3.0.227@o2ib,192.168.130.227@tcp,10.3.0.226@o2ib,192.168.130.226@tcp
>  mgsnode=10.3.0.227@o2ib,192.168.130.227@tcp,10.3.0.226@o2ib,192.168.130.226@tcp
> 
>  Target:     lustre-MDT0000
>  Index:      0
>  Lustre FS:  lustre
>  Mount type: ldiskfs
>  Flags:      0x1        (MDT )
>  Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters:
>  mgsnode=10.3.0.226@o2ib,192.168.130.226@tcp,10.3.0.227@o2ib,192.168.130.227@tcp
>  failover.node=10.3.0.227@o2ib,192.168.130.227@tcp
>  mdt.group_upcall=/usr/sbin/l_getgroups
> 
> OST: parameters were rewritten with tunefs.lustre:
> tunefs.lustre --ost --erase-param
>  --mgsnode=10.3.0.226@o2ib0,192.168.130.226@tcp0:10.3.0.227@o2ib0,192.168.130.227@tcp0
>  --failnode=10.3.0.229@o2ib0,192.168.130.229@tcp0 --writeconf
>  /dev/mpath/ost100
> 
> 
> Client notices the failed OST path:
> # lfs check servers
> lustre-MDT0000-mdc-ffff810007107000 active.
> error: check 'lustre-OST0000-osc-ffff810007107000': Connection timed out (110)
> 
> but it tries to connect to the failover OSS partner instead of trying
> the other network:
> netptune121: LustreError: 11-0: an error occurred while communicating
>   with 10.3.0.229@o2ib. The ost_connect operation failed with -19
> doss2: LustreError: 137-5: UUID 'lustre-OST0000_UUID' is not available
>   for connect (no target)
> 
> Thanks in advance for any hint...
> 
> Best regards,
> Erich



