[Lustre-discuss] files/directories are temporarily unavailable on patchless clients

Andreas Dilger adilger at sun.com
Tue Mar 4 22:42:17 PST 2008


On Mar 05, 2008  01:19 +0100, Harald van Pee wrote:
> On Wednesday 05 March 2008 01:06 am, Andreas Dilger wrote:
> > On Mar 04, 2008  19:52 +0100, Harald van Pee wrote:
> > > I have updated all clients to patched version 1.6.1; the servers are
> > > still 1.6.0.1. No Lustre-related error message has occurred since (2 weeks).
> > >
> > > I think it's reasonable (necessary?) to e2fsck all osts and the mdt?
> > > The mdt resides on a drbd device configured as failover.
> > >
> > > I now have the following questions.
> > > 1. Is there a recommended order to do the file system checks? mdt first
> > > and then the osts or vice versa?
> > >
> > > 2. If I umount the mdt should I use -f ? I assume there will be no file
> > > system access possible until the mdt is back again. Would it be better
> > > to umount all servers and clients and then the mdt?
> > >
> > > 3. I think each ost can be checked while the others are working, but I
> > > am unsure if I should use -f to umount or not?
> > >
> > > 4. Should I unmount all clients? If this is recommended anyway, it's
> > > maybe better to stop file system access for a couple of hours (2TB 70%
> > > used), but do the filesystem checks in parallel.
> >
> > If you are expecting to fix the filesystem, it is best to just unmount
> > everything and run e2fsck in parallel.  Alternately, you can just force
> > unmount the MDT+OST filesystems and let the clients hang until the MDT+OSTs
> > are restarted, but this can be more troublesome in some cases.
> 
> o.k. thanks,
>  then I will unmount all clients first and then
> unmount all osts
> and the mdt last.

Actually, it is better to unmount clients, then MDT, then OSTs last,
because the MDT is a "client" on the OSTs.
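
As a rough illustration of that ordering, here is a minimal Python sketch that only builds the list of umount commands (it does not run anything). The host names other than node0010 and all mount points are hypothetical placeholders, not taken from your setup, and how the commands get to each host (ssh, pdsh, by hand) is left open.

```python
from typing import List, Tuple

Mount = Tuple[str, str]  # (host, mount point); values below are hypothetical


def unmount_plan(clients: List[Mount], mdt: Mount,
                 osts: List[Mount]) -> List[Tuple[str, List[str]]]:
    """Return (host, command) pairs in the order clients -> MDT -> OSTs."""
    plan = []
    # 1. Clients first, so nothing is using the filesystem any more.
    for host, mount_point in clients:
        plan.append((host, ["umount", mount_point]))
    # 2. The MDT next, because it is itself a "client" of the OSTs.
    plan.append((mdt[0], ["umount", mdt[1]]))
    # 3. OSTs last, once nothing depends on them.
    for host, mount_point in osts:
        plan.append((host, ["umount", mount_point]))
    return plan


if __name__ == "__main__":
    steps = unmount_plan(
        clients=[("node0010", "/mnt/lustre")],      # hypothetical mount point
        mdt=("mds1", "/mnt/mdt"),                   # hypothetical host/path
        osts=[("oss1", "/mnt/ost0"), ("oss2", "/mnt/ost1")],
    )
    for host, cmd in steps:
        print(host, " ".join(cmd))
```

The commands are plain umount with no -f, on the assumption that each step completes cleanly before the next one starts.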

> If it is possible should I try to avoid the -f flag?

You shouldn't need to use -f if you unmount in the above order.
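
Once everything is unmounted, the e2fsck runs can go in parallel as mentioned earlier in this thread. A minimal sketch of that for the devices local to one server follows; the device paths are hypothetical, and the default flags (-f to force a check, -n for a read-only pass that answers "no" to every prompt) are my assumption for a first look, not a recommendation for the actual repair run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence


def fsck_command(device: str, flags: Sequence[str] = ("-f", "-n")) -> List[str]:
    """Build the e2fsck command line for one unmounted device."""
    return ["e2fsck", *flags, device]


def run_checks(devices: List[str],
               runner: Optional[Callable[[List[str]], int]] = None,
               flags: Sequence[str] = ("-f", "-n")) -> Dict[str, int]:
    """Run one e2fsck per device concurrently and return the exit codes."""
    if runner is None:
        # Default runner really executes e2fsck; pass your own to dry-run.
        runner = lambda cmd: subprocess.run(cmd).returncode
    commands = [fsck_command(dev, flags) for dev in devices]
    # One worker per device so all checks proceed at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(devices))) as pool:
        codes = list(pool.map(runner, commands))
    return dict(zip(devices, codes))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def show(cmd: List[str]) -> int:
        print(" ".join(cmd))
        return 0

    run_checks(["/dev/sdb1", "/dev/sdc1"], runner=show)  # hypothetical devices
```

A non-zero exit code from e2fsck means that device needs a closer look before the servers are brought back up.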

> > > On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
> > > > On Jan 21, 2008  18:55 +0100, Harald van Pee wrote:
> > > > > The directory is just not there! Directory or file not found.
> > > > >
> > > > > > In my opinion there is no error message on the clients which is
> > > > > > directly related to the problem. On our node0010 I have seen this
> > > > > > problem several times today. Mostly the directory is not seen!
> > > > > Probably all of the other directories can be accessed at the same
> > > > > time.
> > > > >
> > > > > and here all lustre related messages from the last days (others are
> > > > > mostly timestamps!)
> > > > >
> > > > >
> > > > >
> > > > > Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0:
> > > > > (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800
> > > > > alias
> > > >
> > > > A quick search in bugzilla for this error message shows bug 12123,
> > > > which is fixed in the 1.6.1 release, and also has a patch.
> > > >
> > > > Cheers, Andreas
> > > > --
> > > > Andreas Dilger
> > > > Sr. Staff Engineer, Lustre Group
> > > > Sun Microsystems of Canada, Inc.
> >
> 
> -- 
> Harald van Pee
> 
> Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
