[lustre-discuss] Lustre filesystem suddenly not allowing *new* mounts, but existing mounts continue working.

Randall Radmer radmer at slac.stanford.edu
Tue May 17 09:08:13 PDT 2016


We've been working with Lustre systems for a few years, but have an odd
problem that started a couple of weeks ago.  After a recent hardware problem
with a storage array attached to an OSS, we lost one OST.  We can manage
that, but the filesystem is unhealthy in other ways.

The odd thing is that we are now unable to mount the filesystem from any
client, even though existing mounts continue to work fine.  So rebooting a
client with a working mount leaves us with a client that cannot mount.  We
see nothing useful in the logs.  Rebooting the MDS and the OSS does not clear
the problem.  The servers run Lustre 2.4.1, and the clients run a newer
release.

Note that this system has been working for well over a year, and nothing
has been intentionally changed.  I thought the MDT might have become
corrupted, but running lfsck and e2fsck didn't help (they found and fixed a
few problems, but not the mount issue).  I'm still not sure whether it is an
MDT issue or somehow connected to the failed OST (which I've deactivated and
which is not mounted on the OSS).

Can someone give me suggestions on how to better diagnose this problem?

The following is the output of a mount attempt on a RHEL6 client:

# mount -t lustre 192.168.1.2@tcp:/ana04 /reg/data/ana04
mount.lustre: mount 192.168.1.2@tcp:/ana04 at /reg/data/ana04 failed: Function not implemented

# cat /proc/fs/lustre/version
lustre: 2.8.0
kernel: patchless_client
build:  jenkins-arch=x86_64,build_type=client,distro=el7,ib_stack=inkernel-12--PRISTINE-2.6.32-573.3.1.el6.x86_64

# grep ana04 /var/log/messages
May 17 08:57:29 test123 kernel: Lustre:    cmd=cf00f 0:ana04-OST0009-osc  1:mdc.active=0
May 17 08:57:29 test123 kernel: LustreError: 15c-8: MGC192.168.1.2@tcp: The configuration from log 'ana04-client' failed (-38). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
May 17 08:57:29 test123 kernel: LustreError: 32344:0:(lov_obd.c:922:lov_cleanup()) ana04-clilov-ffff880133c2b000: lov tgt 1 not cleaned! deathrow=0, lovrc=1
May 17 08:57:29 test123 kernel: Lustre: Unmounted ana04-client
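
One detail ties the two outputs together: the `-38` in the MGC error and the
"Function not implemented" reported by mount.lustre appear to be the same
Linux error code, ENOSYS (Lustre logs kernel-style negated errno values). A
minimal Python sketch to confirm the mapping, assuming a Linux host (errno
numbering differs on other platforms); `describe_rc` is just a hypothetical
helper name:

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a negated, kernel-style return code (as Lustre logs it)
    into its symbolic errno name and strerror() text.

    Assumes Linux errno numbering."""
    code = -rc  # Lustre reports failures as negative errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc} -> {name}: {os.strerror(code)}"


# The value from "The configuration from log 'ana04-client' failed (-38)"
print(describe_rc(-38))
```

On a Linux box this prints "-38 -> ENOSYS: Function not implemented", which
suggests the mount failure is the client-side symptom of processing the
'ana04-client' configuration log rather than a separate error.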


Thanks much,
Randall Radmer
radmer at slac.stanford.edu