[lustre-discuss] Adding a servicenode (failnode) to existing OSTs

Artem Blagodarenko artem.blagodarenko at gmail.com
Wed Apr 4 01:56:56 PDT 2018


Hello,

If a writeconf is not possible, however, there is a patch currently under review that adds failover-node support to replace_nids (https://jira.hpdd.intel.com/browse/LU-10384).
You are welcome to inspect and test it.
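For context, replace_nids rewrites the NIDs recorded for a target in the MGS configuration logs without a full writeconf. A rough sketch of the existing flow is below; the target name and NID are placeholders, and the exact syntax for specifying failover NIDs is whatever the patch defines, so please check LU-10384 and the manual:

    # Stop all clients and servers, then mount only the MGS target.
    # For each target, rewrite the NIDs stored in the MGS config log:
    lctl replace_nids testfs-OST0000 10.148.0.30@o2ib0
    # Then mount the servers and clients again as usual.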

Thanks,
Artem Blagodarenko.

> On 3 Apr 2018, at 20:46, Vicker, Darby (JSC-EG311) <darby.vicker-1 at nasa.gov> wrote:
> 
> We have a similar setup and recently had to do the same thing - in our case, to add a 2nd IB NID.  The admin manual says that servicenode is preferred over failnode, so that's what we use.  It works great - we love the capability to fail over for maintenance or troubleshooting.  Our tunefs.lustre commands looked like this:
> 
> 
> MDT:
> 
>       tunefs.lustre \
>           --verbose \
>           --writeconf \
>           --erase-params \
>           --servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0,${LUSTRE_LOCAL_IB_L1_IP}@o2ib0,${LUSTRE_LOCAL_IB_EUROPA_IP}@o2ib1 \
>           --servicenode=${LUSTRE_PEER_TCP_IP}@tcp0,${LUSTRE_PEER_IB_L1_IP}@o2ib0,${LUSTRE_PEER_IB_EUROPA_IP}@o2ib1 \
>           $pool/meta-fsl
> 
> OSTs:
> 
>      tunefs.lustre \
>           --verbose \
>           --writeconf \
>           --erase-params \
>           --mgsnode=192.52.98.30@tcp0,10.148.0.30@o2ib0,10.150.100.30@o2ib1 \
>           --mgsnode=192.52.98.31@tcp0,10.148.0.31@o2ib0,10.150.100.31@o2ib1 \
>           --servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0,${LUSTRE_LOCAL_IB_L1_IP}@o2ib0,${LUSTRE_LOCAL_IB_EUROPA_IP}@o2ib1 \
>           --servicenode=${LUSTRE_PEER_TCP_IP}@tcp0,${LUSTRE_PEER_IB_L1_IP}@o2ib0,${LUSTRE_PEER_IB_EUROPA_IP}@o2ib1 \
>           $pool/ost-fsl
> 
> 
> 
> 
> You have to shut down the entire Lustre file system to do this - clients and servers.  See the "Adding a NID" section of the admin manual for the full procedure.
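> For reference, the overall sequence was roughly the following (mount points and file system name are placeholders for your own setup):
> 
>       # 1. Unmount clients first, then OSTs, then the MDT(s)
>       umount /mnt/lustre          # on every client
>       umount /lustre/ost          # on every OSS
>       umount /lustre/mdt          # on the MDS
>       # 2. Run the tunefs.lustre --writeconf commands above on every target
>       # 3. Remount in order: MGS/MDT first, then OSTs, then clients
>       mount -t lustre $pool/meta-fsl /lustre/mdt
>       mount -t lustre $pool/ost-fsl /lustre/ost
>       mount -t lustre <mgsnid>@tcp0:/<fsname> /mnt/lustre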
> 
> 
> -----Original Message-----
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Steve Barnet <barnet at icecube.wisc.edu>
> Reply-To: "barnet at icecube.wisc.edu" <barnet at icecube.wisc.edu>
> Date: Tuesday, April 3, 2018 at 12:14 PM
> To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Subject: [lustre-discuss] Adding a servicenode (failnode) to existing OSTs
> 
> Hi all,
> 
>   We have a multipath OSS->OST hardware configuration which,
> in principle, allows us to move OSTs between OSSes if/when
> we need to do maintenance. In general, this has worked very
> well for us.
> 
>   However, today when I needed to do some maintenance, I
> discovered that we have a set of OSTs that were not formatted
> for failover (no service nodes specified at creation time).
> 
> So after a bit of noodling around, it looks like we should be
> able to add these after the fact. Groovy! There appear to be a
> couple ways that this could be done:
> 
> a) Add the service nodes:
>    tunefs.lustre --servicenode=nid,nid /dev/<OST>
> 
> b) Add a failover node:
>    tunefs.lustre --param="failover.node=<nid>" /dev/<OST>
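> 
> A quick, read-only way to see what a target already has set seems to be
> a dry run (device path is a placeholder):
> 
>    tunefs.lustre --dryrun /dev/<OST>
> 
> which prints the existing parameters without changing anything.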
> 
> 
> So a few questions:
> 
> 1) Is one form preferred over the other? It seems like just
>    specifying service nodes might be better than being
>    specific about primary vs. failover.
> 
> 2) Will this be visible to the clients right away, or will
>    additional steps be needed? I could see needing to remount
>    the filesystem on the clients, or writeconf or something similar.
> 
> 3) Any other gotchas I should be wary of?
> 
> Thanks much!
> 
> Best,
> 
> ---Steve
> 
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
