[lustre-discuss] Lustre filesystem suddenly not allowing *new* mounts, but existing mounts continue working.

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Tue May 17 13:32:42 PDT 2016


Have you tried doing a writeconf to regenerate the config logs?
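For reference, the writeconf procedure in the Lustre Operations Manual stops the whole filesystem (clients, then the MDT, then all OSTs), runs `tunefs.lustre --writeconf` on the MDT and then on each OST, and restarts in order (a separate MGT first if one is used, then the MDT, the OSTs, and finally the clients). The Python sketch below only prints that sequence as a dry run and executes nothing. The device paths and server mount points are hypothetical placeholders, and it assumes the MGS is co-located with the MDT.

```python
from typing import List, Tuple

# (where to run it, command); nothing here is executed.
Step = Tuple[str, str]


def writeconf_plan(mdt: Tuple[str, str], osts: List[Tuple[str, str]],
                   client_spec: str, client_mnt: str) -> List[Step]:
    """Return the writeconf sequence as (host role, command) pairs.

    mdt and each entry of osts are (block device, mount point) pairs.
    Assumes a combined MGS/MDT; all server paths passed in are hypothetical.
    """
    mdt_dev, mdt_mnt = mdt
    # 1. Stop the filesystem: clients, then the MDT, then every OST.
    plan: List[Step] = [("clients", f"umount {client_mnt}")]
    plan.append(("MDS", f"umount {mdt_mnt}"))
    plan += [("OSS", f"umount {mnt}") for _, mnt in osts]
    # 2. Regenerate the config logs: MDT first, then each OST.
    plan.append(("MDS", f"tunefs.lustre --writeconf {mdt_dev}"))
    plan += [("OSS", f"tunefs.lustre --writeconf {dev}") for dev, _ in osts]
    # 3. Restart: MDT (with its co-located MGS), then the OSTs in the order
    #    given (pass them in index order), then the clients.
    plan.append(("MDS", f"mount -t lustre {mdt_dev} {mdt_mnt}"))
    plan += [("OSS", f"mount -t lustre {dev} {mnt}") for dev, mnt in osts]
    plan.append(("clients", f"mount -t lustre {client_spec} {client_mnt}"))
    return plan


if __name__ == "__main__":
    # Hypothetical devices; the client spec and mount point come from the thread.
    example = writeconf_plan(
        ("/dev/mdt0", "/mnt/mdt"),
        [("/dev/ost0", "/mnt/ost0"), ("/dev/ost1", "/mnt/ost1")],
        "192.168.1.2@tcp:/ana04", "/reg/data/ana04")
    for where, cmd in example:
        print(f"[{where}] {cmd}")
```

Keep in mind that writeconf discards settings made with `lctl conf_param` (and OST pool definitions), so anything set that way, such as a permanent OST deactivation, would have to be re-applied afterward.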

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


> On May 17, 2016, at 12:08 PM, Randall Radmer <radmer at slac.stanford.edu> wrote:
> 
> We've been working with Lustre systems for a few years, but have an odd problem that started a couple of weeks ago.  After a recent hardware problem with an OSS-attached storage array, we lost one OST.  We can manage that, but the filesystem has other problems as well.
> 
> The odd thing is that we are now unable to mount the filesystem from any client, even though existing mounts continue to work fine.  So rebooting a client with a working mount leaves us with a client that can no longer mount.  We see nothing useful in the logs.  Rebooting the MDS and the OSS does not clear the problem.  Servers are running Lustre version 2.4.1, and the clients run a newer release.
> 
> Note that this system has been working for well over a year, and nothing has been intentionally changed.  I thought the MDT might have gotten corrupted, but running lfsck and e2fsck didn't help (they found and fixed a few problems, but not the mount issue).  I'm still not sure whether it is an MDT issue or somehow connected to the failed OST (which I've deactivated and which is not mounted on the OSS).
> 
> Can someone give me suggestions on how to better understand this problem?
> 
> The following is output from a mount attempt from a RHEL6 client:
> 
> # mount -t lustre 192.168.1.2@tcp:/ana04 /reg/data/ana04
> mount.lustre: mount 192.168.1.2@tcp:/ana04 at /reg/data/ana04 failed: Function not implemented
> 
> # cat /proc/fs/lustre/version 
> lustre: 2.8.0
> kernel: patchless_client
> build:  jenkins-arch=x86_64,build_type=client,distro=el7,ib_stack=inkernel-12--PRISTINE-2.6.32-573.3.1.el6.x86_64
> 
> # grep ana04 /var/log/messages
> May 17 08:57:29 test123 kernel: Lustre:    cmd=cf00f 0:ana04-OST0009-osc  1:mdc.active=0  
> May 17 08:57:29 test123 kernel: LustreError: 15c-8: MGC192.168.1.2@tcp: The configuration from log 'ana04-client' failed (-38). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> May 17 08:57:29 test123 kernel: LustreError: 32344:0:(lov_obd.c:922:lov_cleanup()) ana04-clilov-ffff880133c2b000: lov tgt 1 not cleaned! deathrow=0, lovrc=1
> May 17 08:57:29 test123 kernel: Lustre: Unmounted ana04-client
> 
> 
> Thanks much,
> Randall Radmer
> radmer at slac.stanford.edu
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
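The "Function not implemented" reported by mount.lustre and the "failed (-38)" in the MGC log line are the same error: the kernel returns negative errno values, and on Linux errno 38 is ENOSYS. The small Python sketch below pulls the config-log name and errno out of such a line and decodes it. It assumes it is run on Linux, since errno numbering is platform-specific; the regex is written only for the message format quoted in this thread.

```python
import errno
import os
import re
from typing import Optional, Tuple

# Matches the MGC config-log failure message quoted above, e.g.
# "The configuration from log 'ana04-client' failed (-38)."
CONFIG_FAIL = re.compile(r"configuration from log '([^']+)' failed \((-\d+)\)")


def decode_config_failure(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (config log name, errno symbol, errno message), or None if no match."""
    m = CONFIG_FAIL.search(line)
    if m is None:
        return None
    code = -int(m.group(2))  # Lustre logs negative errno values; flip the sign
    symbol = errno.errorcode.get(code, f"errno {code}")
    return m.group(1), symbol, os.strerror(code)


if __name__ == "__main__":
    line = ("LustreError: 15c-8: MGC192.168.1.2@tcp: The configuration from log "
            "'ana04-client' failed (-38).")
    print(decode_config_failure(line))
```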
