[Lustre-discuss] How to bypass failed OST without blocking?

Andreas Dilger adilger at sun.com
Wed Jan 16 23:49:12 PST 2008


On Jan 15, 2008  16:03 -0600, Robert Olson wrote:
> I'm setting up a system that has no OST failover, so I would like to
> set it to failout mode. Have the failout issues in the 1.6 betas been
> worked out in 1.6.4.1?

Very little testing is done on failout mode, because even with a single
OSS node the common behaviour is to just reboot the node and continue
using the OSTs thereon.  You can set "sysctl -w lnet.panic_on_lbug=1" and
"sysctl -w kernel.panic_on_oops=1" and the node will panic (and reboot,
provided kernel.panic is non-zero) if a bug is hit in Lustre or the
kernel.  This does not cover every failure (it won't reboot on a
deadlock, for example), but it is fairly useful.
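
As a rough sketch, the same settings can be kept in sysctl.conf form so
they survive a reboot.  The kernel.panic line is an assumption added
here: it is the standard Linux setting for "reboot N seconds after a
panic", and the 10-second value is arbitrary.  The function name is
made up for illustration.

```shell
#!/bin/sh
# Sketch only: print sysctl settings that make an OSS panic on a Lustre
# LBUG or a kernel oops, and reboot after the panic.
# Assumption: kernel.panic = 10 is not from the thread; it tells the
# kernel to reboot 10 seconds after a panic.
panic_settings() {
    cat <<'EOF'
lnet.panic_on_lbug = 1
kernel.panic_on_oops = 1
kernel.panic = 10
EOF
}

# Print the fragment. As root it could be appended to /etc/sysctl.conf
# and applied with "sysctl -p"; the lnet.* entry only exists once the
# Lustre modules are loaded.
panic_settings
```

Printing the fragment rather than applying it keeps the sketch safe to
run on any machine.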

> On Mar 22, 2007, at 4:55 PM, Nathaniel Rutman wrote:
> 
> > Well, your question prompted me to try this out.
> >
> > There are two issues:
> > 1. failout mode cannot be set on a live filesystem, and can't be  
> > set with lctl conf_param.
> > The wiki page has instructions for setting failout mode at mkfs time
> > https://mail.clusterfs.com/wikis/lustre/MountConf
> > You can also set failout mode with tunefs and writeconf:
> >
> > tunefs.lustre --writeconf --param="failover.mode=failout" /dev/sda
> >
> > There can be no Lustre servers or clients running when changing the  
> > failover mode.
> >
> > 2. failout mode is broken in the 1.6 betas.  I have an untested  
> > patch in bug 12005
> > https://bugzilla.lustre.org/show_bug.cgi?id=12005
> > Using failout mode in the betas without this patch will probably  
> > lead to an LBUG on the OST.
> >
> >
> > swin wang wrote:
> >> In our test, we didn't set failout mode at mkfs time, but set it on
> >> the mdt/mgs with lctl:
> >>   lctl conf_param testfs-OST0001.failover.mode=failout
> >> but it didn't seem to work. When OST0001 fails, client operations
> >> are still blocked (with 1.5.97).
> >>
> >> 2007/3/22, Nathaniel Rutman <nathan at clusterfs.com>:
> >>
> >>     swin wang wrote:
> >>     > We currently use 1.5.97. We tried to set it to failout mode, but
> >>     > it didn't work in this version. What we want is: when we
> >>     > read/write the failed OST, it returns IO errors, but we can still
> >>     > create and read/write new files; when the failed OST is ok again,
> >>     > we can read/write the files on it.
> >>     That's what failout mode is.  How did you try to set it?
> >>
> >>     > I'm not sure if the 1.4.x version with "failout" mode can provide
> >>     > what we want?
> >>     >
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> Lustre-discuss mailing list
> >> Lustre-discuss at clusterfs.com
> >> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
> >>
> >
> 
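
The offline procedure Nathaniel describes above can be sketched as a dry
run that only prints the steps.  The tunefs.lustre line is taken from
his message; DEV and the function name are placeholders, and the
ordering in the comments is an assumption drawn from "no Lustre servers
or clients running", not a tested recipe.

```shell
#!/bin/sh
# Dry-run sketch: prints the steps, runs nothing.
# DEV is a hypothetical placeholder for the OST block device.
DEV=${DEV:-/dev/sda}

failout_plan() {
    echo "# step 1: stop every Lustre client and server"
    echo "# step 2: rewrite the target configuration with failout mode"
    echo "tunefs.lustre --writeconf --param=\"failover.mode=failout\" $DEV"
    echo "# step 3: restart the servers, then remount the clients"
}

failout_plan
```

Set DEV before calling failout_plan to print the plan for a different
device.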

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



