[Lustre-discuss] files/directories are temporarily unavailable on patchless clients

Harald van Pee pee at hiskp.uni-bonn.de
Tue Mar 4 16:19:40 PST 2008


On Wednesday 05 March 2008 01:06 am, Andreas Dilger wrote:
> On Mar 04, 2008  19:52 +0100, Harald van Pee wrote:
> > I have updated all clients to the patched version 1.6.1; the servers are
> > still at 1.6.0.1. No Lustre-related error message has occurred since
> > (2 weeks).
> >
> > I think it's reasonable (necessary?) to e2fsck all OSTs and the MDT?
> > The MDT resides on a drbd device configured for failover.
> >
> > I now have the following questions.
> > 1. Is there a recommended order for the file system checks? MDT first
> > and then the OSTs, or vice versa?
> >
> > 2. If I umount the MDT, should I use -f? I assume no file system access
> > will be possible until the MDT is back again. Would it be better to
> > umount all clients and the other servers first, and then the MDT?
> >
> > 3. I think each OST can be checked while the others are working, but I
> > am unsure whether I should use -f to umount or not?
> >
> > 4. Should I unmount all clients? If this is recommended anyway, it's
> > maybe better to stop file system access for a couple of hours (2TB, 70%
> > used) and do the filesystem checks in parallel.
>
> If you are expecting to fix the filesystem, it is best to just unmount
> everything and run e2fsck in parallel.  Alternately, you can just force
> unmount the MDT+OST filesystems and let the clients hang until the MDT+OSTs
> are restarted, but this can be more troublesome in some cases.

O.k., thanks.
Then I will unmount all clients first, then unmount all OSTs,
and the MDT last.
If possible, should I try to avoid the -f flag?
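
To make the planned order concrete, here is a rough Python sketch of it:
clients first, then the OSTs, then the MDT, then the e2fsck runs in
parallel. It is only an illustration under assumptions. The host names,
mount points and device paths are hypothetical (none of them come from
this thread), the helper names are made up for the example, and the
script only prints the commands as a dry run instead of executing them.
It uses plain umount without -f.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: host -> mount point (clients),
# host -> (mount point, block device) for the OSTs.
CLIENT_MOUNTS = {"node0010": "/mnt/lustre"}
OST_TARGETS = {
    "oss1": ("/mnt/ost0", "/dev/sdb1"),
    "oss2": ("/mnt/ost1", "/dev/sdb1"),
}
# Hypothetical MDS host, MDT mount point and backing device.
MDT_TARGET = ("mds1", "/mnt/mdt", "/dev/drbd0")


def build_plan(clients, osts, mdt):
    """Return the phases in order; commands inside one phase are independent."""
    mds_host, mdt_mount, mdt_dev = mdt
    unmount_clients = [(h, f"umount {m}") for h, m in sorted(clients.items())]
    unmount_osts = [(h, f"umount {m}") for h, (m, _) in sorted(osts.items())]
    unmount_mdt = [(mds_host, f"umount {mdt_mount}")]
    # e2fsck -f forces a check even if the file system is marked clean.
    checks = [(h, f"e2fsck -f {d}") for h, (_, d) in sorted(osts.items())]
    checks.append((mds_host, f"e2fsck -f {mdt_dev}"))
    return [unmount_clients, unmount_osts, unmount_mdt, checks]


def run_phase(phase, runner):
    """Run every (host, command) pair of one phase in parallel, keeping order."""
    with ThreadPoolExecutor(max_workers=max(1, len(phase))) as pool:
        return list(pool.map(lambda item: runner(*item), phase))


def dry_run(host, command):
    """Stand-in runner: describe the command instead of executing it."""
    return f"[{host}] {command}"


if __name__ == "__main__":
    for phase in build_plan(CLIENT_MOUNTS, OST_TARGETS, MDT_TARGET):
        # A phase is only started after the previous one has finished.
        for line in run_phase(phase, dry_run):
            print(line)
```

Replacing dry_run with a function that really runs the command on the
given host (for example over ssh) would be the next step, but that part
is deliberately left out here.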

>
> > On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
> > > On Jan 21, 2008  18:55 +0100, Harald van Pee wrote:
> > > > The directory is just not there! Directory or file not found.
> > > >
> > > > In my opinion there is no error message on the clients which is
> > > > directly related to the problem. On our node0010 I have seen this
> > > > problem several times today. Mostly the directory is not seen!
> > > > Probably all of the other directories can be accessed at the same
> > > > time.
> > > >
> > > > And here are all Lustre-related messages from the last days (others
> > > > are mostly timestamps!)
> > > >
> > > >
> > > >
> > > > Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0:
> > > > (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800
> > > > alias
> > >
> > > A quick search in bugzilla for this error message shows bug 12123,
> > > which is fixed in the 1.6.1 release, and also has a patch.
> > >
> > > Cheers, Andreas
> > > --
> > > Andreas Dilger
> > > Sr. Staff Engineer, Lustre Group
> > > Sun Microsystems of Canada, Inc.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn


