[Lustre-discuss] Client Cannot Mount File System

Charles Taylor chasman at ufl.edu
Thu Jun 12 07:18:48 PDT 2014


MDS/OSSs: 1.8.8-wc1_2.6.18_308.4.1.el5_gbc88c4c
Client:           1.8.9-wc1_2.6.32_358.23.2.el6

One of our clients (out of hundreds) has been unable to mount our Lustre file system. We could find no host or network issues. Attempts to mount yielded the following on the client

mount -t lustre -o localflock 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs /lfs/scratch
mount.lustre: mount 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs at /lfs/scratch failed:
Interrupted system call
Error: Failed to mount 10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs

with the following syslog messages.

Jun 10 15:21:05 r15a-s40 kernel: Lustre: 1269:0:(o2iblnd_cb.c:1813:kiblnd_close_conn_locked()) Closing conn to 10.13.79.252@o2ib2: error 0(waiting)
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 166-1: MGC10.13.68.1@o2ib: Connection to service MGS via nid 10.13.68.1@o2ib was lost; in progress operations using this service will fail.
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 15c-8: MGC10.13.68.1@o2ib: The configuration from log 'lfs-client' failed (-4). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(llite_lib.c:1099:ll_fill_super()) Unable to process log: -4
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(lov_obd.c:1012:lov_cleanup()) lov tgt 1 not cleaned! deathrow=0, lovrc=1
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(lov_obd.c:1012:lov_cleanup()) Skipped 5 previous similar messages
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(lov_obd.c:1012:lov_cleanup()) lov tgt 13 not cleaned! deathrow=1, lovrc=1
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(mdc_request.c:1500:mdc_precleanup()) client import never connected
Jun 10 15:21:05 r15a-s40 kernel: Lustre: MGC10.13.68.1@o2ib: Reactivating import
Jun 10 15:21:05 r15a-s40 kernel: Lustre: MGC10.13.68.1@o2ib: Connection restored to service MGS using nid 10.13.68.1@o2ib.
Jun 10 15:21:05 r15a-s40 kernel: Lustre: client lfs-client(ffff88061e105c00) umount complete
Jun 10 15:21:05 r15a-s40 kernel: LustreError: 4012:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount  (-4)
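
The "-4" in these messages follows the usual Linux kernel convention of returning a negated errno value; errno 4 is EINTR, whose standard text is the same "Interrupted system call" that mount.lustre printed. A minimal sketch to decode such codes, assuming Linux errno numbering and (as a simplification) treating any bare negative integer in a LustreError line as a return code; `decode_return_codes` is a hypothetical helper, not a Lustre tool:

```python
import errno
import os
import re

# Three LustreError lines from the syslog excerpt (NIDs written with "@").
SYSLOG_LINES = [
    "LustreError: 15c-8: MGC10.13.68.1@o2ib: The configuration from log "
    "'lfs-client' failed (-4).",
    "LustreError: 4012:0:(llite_lib.c:1099:ll_fill_super()) Unable to process log: -4",
    "LustreError: 4012:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount  (-4)",
]

# A negative integer not glued to a preceding word character or dot, so that
# message IDs such as "15c-8" and host names such as "r15a-s40" are skipped.
RC_PATTERN = re.compile(r"(?<![\w.])-(\d+)\b")


def decode_return_codes(line):
    """Return (rc, errno_name, message) for each negative return code in line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((-code, name, os.strerror(code)))
    return results


for line in SYSLOG_LINES:
    for rc, name, message in decode_return_codes(line):
        print(f"{rc}: {name} ({message})")
```

Matching on the text rather than on a fixed message format keeps the sketch usable on other LustreError lines, at the cost of occasionally picking up a negative number that is not a return code.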

Nothing noteworthy on the MDS.   

After reconfiguring the client with a new IPoIB IP address (and hence a new NID), it was able to mount with no problems and is working fine. Additionally, the MDS was rebooted at least once while this client was unable to mount, so whatever state was blocking the old NID seems to have survived the reboot, presumably because it is stored persistently on the MDT.
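
A Lustre NID is a node address joined to an LNet network name (for example 10.13.68.1@o2ib), so giving the client a new IPoIB address changes its NID while the MGS NIDs in the mount source stay the same. The sketch below illustrates that structure by splitting a mount source of the form `NID:NID:/fsname` (colon-separated failover MGS NIDs, then the file system name). It assumes IPv4-based networks such as o2ib and, as in this post, only one NID per MGS node; `Nid`, `parse_nid`, and `parse_mount_source` are hypothetical names, not part of Lustre:

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Nid:
    """An LNet NID: IPv4 address plus LNet network name, e.g. 10.13.68.1@o2ib."""
    address: ipaddress.IPv4Address
    network: str

    def __str__(self):
        return f"{self.address}@{self.network}"


def parse_nid(text):
    """Parse 'ADDRESS@NETWORK' into a Nid (IPv4-based networks only)."""
    address, sep, network = text.partition("@")
    if not sep or not network:
        raise ValueError(f"not a NID: {text!r}")
    return Nid(ipaddress.IPv4Address(address), network)


def parse_mount_source(source):
    """Split 'NID[:NID...]:/fsname' into a list of MGS NIDs and the fsname.

    Assumes one NID per MGS node (no comma-separated NID lists).
    """
    nids_part, sep, fsname = source.partition(":/")
    if not sep or not fsname:
        raise ValueError(f"missing ':/fsname' in {source!r}")
    return [parse_nid(nid) for nid in nids_part.split(":")], fsname


mgs_nids, fsname = parse_mount_source("10.13.68.1@o2ib:10.13.68.2@o2ib:/lfs")
print([str(nid) for nid in mgs_nids], fsname)
```

Parsing the address with `ipaddress` rejects malformed addresses early instead of silently accepting them.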

I'm particularly curious about the "ll_fill_super" message.  To what "log" is it referring?   

Has anyone seen this before, and does anyone have an idea what we need to clear on the MDS/MDT to allow this client to mount the file system again?

Thanks,

Charlie Taylor
UF Research Computing




