[Lustre-discuss] files/directories are temporarily unavailable on patchless clients

Andreas Dilger adilger at sun.com
Tue Mar 4 16:06:55 PST 2008


On Mar 04, 2008  19:52 +0100, Harald van Pee wrote:
> I have updated all clients to the patched version 1.6.1; the servers are
> still 1.6.0.1. No Lustre-related error messages have occurred since (2 weeks).
> 
> I think it's reasonable (necessary?) to e2fsck all osts and the mdt?
> The mdt resides on a drbd device configured as failover.
> 
> I now have the following questions.
> 1. Is there a recommended order to do the file system checks? mdt first and 
> then the osts or vice versa?
> 
> 2. If I umount the mdt should I use -f ? I assume there will be no file system 
> access possible until the mdt is back again. Would it be better to umount 
> all servers and clients and then the mdt?
> 
> 3. I think each ost can be checked while the others are working, but I am 
> unsure if I should use -f to umount or not?
> 
> 4. Should I unmount all clients? If this is recommended anyway, it's maybe 
> better to stop file system access for a couple of hours (2TB 70% used), but 
> do the filesystem checks in parallel.

If you are expecting to fix the filesystem, it is best to just unmount
everything and run e2fsck in parallel.  Alternately, you can just force
unmount the MDT+OST filesystems and let the clients hang until the MDT+OSTs
are restarted, but this can be more troublesome in some cases.
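
As a rough sketch of the "unmount everything and run e2fsck in parallel"
approach, something like the following could drive the checks once every
target is unmounted. The device paths and the helper names
(build_e2fsck_command, check_all) are made up for illustration and are not
from this thread; only the e2fsck options -f, -n and -p are standard. It
defaults to a report-only pass (-n), so nothing is modified unless fix=True
is passed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical device paths; substitute the real MDT and OST block devices.
TARGETS = ["/dev/drbd0", "/dev/sdb1", "/dev/sdc1"]


def build_e2fsck_command(device, fix=False):
    """Return the argv for one e2fsck run.

    -f forces a check even if the filesystem is marked clean.
    -n opens the filesystem read-only and answers "no" to every prompt.
    -p automatically repairs problems that are safe to fix without asking.
    """
    return ["e2fsck", "-f", "-p" if fix else "-n", device]


def check_all(devices, fix=False, runner=subprocess.run):
    """Run e2fsck on every device concurrently; return {device: exit status}."""
    def check(device):
        return device, runner(build_e2fsck_command(device, fix)).returncode

    # One worker per device, so all checks run at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(devices))) as pool:
        return dict(pool.map(check, devices))


# Example (only after all targets are unmounted):
#     statuses = check_all(TARGETS)            # report-only pass
#     statuses = check_all(TARGETS, fix=True)  # safe automatic repairs
```

An exit status of 0 from e2fsck means no errors were found, 1 means errors
were corrected, and 4 means errors were left uncorrected, so any device that
reports 4 or higher would need a closer look before remounting.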

> On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
> > On Jan 21, 2008  18:55 +0100, Harald van Pee wrote:
> > > The directory is just not there! Directory or file not found.
> > >
> > > In my opinion there is no error message on the clients which is directly
> > > related to the problem. On our node0010 I have seen this problem several
> > > times today. Mostly the directory is not seen! Probably all of the other
> > > directories can be accessed at the same time.
> > >
> > > And here are all Lustre-related messages from the last days (the others
> > > are mostly timestamps!)
> > >
> > >
> > >
> > > Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0:
> > > (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800 alias
> >
> > A quick search in bugzilla for this error message shows bug 12123,
> > which is fixed in the 1.6.1 release, and also has a patch.
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



