[Lustre-discuss] How to bypass failed OST without blocking?
Andreas Dilger
adilger at sun.com
Wed Jan 16 23:49:12 PST 2008
On Jan 15, 2008 16:03 -0600, Robert Olson wrote:
> Setting up my system that has no OST failover, so would like to set
> for failout. Have the issues in the 1.6 betas been worked out in
> 1.6.4.1?
Very little testing is done on failout mode, because even with a single
OSS node the common behaviour is to just reboot the node and continue
using the OSTs thereon. You can set "sysctl -w lnet.panic_on_lbug=1" and
"sysctl -w kernel.panic_on_oops=1" and the node will reboot if a bug is hit
in Lustre or the kernel. While this does not cover every failure (it won't
reboot on a deadlock, for example) it is fairly useful.
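
A minimal sketch of how those settings might be kept across reboots, assuming they go into a sysctl configuration file. The function name is hypothetical, and the kernel.panic line is an addition not mentioned above: on Linux a panic only leads to an automatic reboot when kernel.panic is set to a non-zero number of seconds. The function only prints the fragment; it does not change any running setting.

```shell
#!/bin/sh
# Print a sysctl fragment that makes an OSS panic on a Lustre LBUG or a
# kernel oops. Nothing is applied here; the output is meant to be reviewed
# and then placed in the node's sysctl configuration by hand.
# panic_sysctl_fragment is a hypothetical helper name.
panic_sysctl_fragment() {
    # Seconds to wait after a panic before rebooting (assumed default: 10).
    reboot_delay="${1:-10}"
    cat <<EOF
lnet.panic_on_lbug = 1
kernel.panic_on_oops = 1
kernel.panic = ${reboot_delay}
EOF
}

panic_sysctl_fragment 10
```

The lnet.panic_on_lbug entry only exists once the Lustre modules are loaded, so it may need to be set again after module load rather than only at boot.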
> On Mar 22, 2007, at 4:55 PM, Nathaniel Rutman wrote:
>
> > Well, your question prompted me to try this out.
> >
> > There are two issues:
> > 1. failout mode cannot be set on a live filesystem, and can't be
> > set with lctl conf_param.
> > The wiki page has instructions for setting failout mode at mkfs time
> > https://mail.clusterfs.com/wikis/lustre/MountConf
> > You can also set failout mode with tunefs and writeconf:
> >
> > tunefs.lustre --writeconf --param="failover.mode=failout" /dev/sda
> >
> > There can be no Lustre servers or clients running when changing the
> > failover mode.
> >
> > 2. failout mode is broken in the 1.6 betas. I have an untested
> > patch in bug 12005
> > https://bugzilla.lustre.org/show_bug.cgi?id=12005
> > Using failout mode in the betas without this patch will probably
> > lead to an LBUG on the OST.
> >
> >
> > swin wang wrote:
> >> In our test, we didn't set the failout mode at mkfs time, but set it
> >> on the mdt/mgs with lctl:
> >> lctl conf_param testfs-OST0001.failover.mode=failout
> >> but it didn't seem to work. When OST0001 has failed, the
> >> client operation is still blocked (with 1.5.97).
> >>
> >> 2007/3/22, Nathaniel Rutman <nathan at clusterfs.com>:
> >>
> >> swin wang wrote:
> >> > We currently use 1.5.97. We tried to set it to failout mode, but it
> >> > didn't work in this version. What we want is: when reading/writing
> >> > the failed OST, it returns IO errors, but we can still create and
> >> > read/write new files; when the failed OST is ok again, we can
> >> > read/write files on it.
> >> That's what failout mode is. How did you try to set it?
> >>
> >> > I'm not sure if the 1.4.x version with "failout" mode can provide
> >> > what we want?
> >> >
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> Lustre-discuss mailing list
> >> Lustre-discuss at clusterfs.com
> >> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
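
The tunefs step quoted above could be scripted roughly as follows. This is only a dry-run sketch: the function name and the device list are placeholders, and it prints the commands rather than running them, since the quoted message says no Lustre servers or clients may be running when the failover mode is changed.

```shell
#!/bin/sh
# Dry run: print the tunefs.lustre command from the quoted message once per
# OST device. Nothing is executed. The device names passed in are
# placeholders, and failout_commands is a hypothetical helper name.
failout_commands() {
    for dev in "$@"; do
        printf 'tunefs.lustre --writeconf --param="failover.mode=failout" %s\n' "$dev"
    done
}

# Example with the single device named in the quoted message.
failout_commands /dev/sda
```

Review the printed commands and run them only after every client and server target has been unmounted.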
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.