[Lustre-discuss] Can't Create New Files

Adam adam at sharcnet.ca
Thu Sep 10 18:33:16 PDT 2009


Okay, solved!

I can't take credit for this, but this should work for anyone in the
same 1.6.6 situation. Documented here to make it easier for others to
find this solution.

Problem: a corrupt CATALOGS file.

Solution:

(I also unmounted all the OSTs and the MGT, but left the clients mounted.)

From a location where you can mount the MDT:

mkdir /tmp/lfs_mdt
mount -t ldiskfs /dev/mdtpath /tmp/lfs_mdt
mv /tmp/lfs_mdt/CATALOGS /tmp/lfs_mdt/CATALOGS.old
touch /tmp/lfs_mdt/CATALOGS
umount /tmp/lfs_mdt
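
For anyone who wants to script the middle two steps, here is a minimal
sketch as a shell function. The name reset_catalogs and its single
argument (the directory where the MDT is already mounted as ldiskfs)
are placeholders made up for this example, not part of Lustre. It only
adds two guards around the same mv and touch: it stops if there is no
CATALOGS file to move, and it stops rather than overwrite an existing
CATALOGS.old.

```shell
#!/bin/sh
# Sketch only: reset_catalogs is a hypothetical helper, not a Lustre tool.
# $1 is the directory where the MDT is already mounted as ldiskfs
# (for example /tmp/lfs_mdt from the steps above).
reset_catalogs() {
    mnt=$1
    # Nothing to do if there is no CATALOGS file to move aside.
    if [ ! -e "$mnt/CATALOGS" ]; then
        echo "no CATALOGS file under $mnt" >&2
        return 1
    fi
    # Do not overwrite a backup left by an earlier attempt.
    if [ -e "$mnt/CATALOGS.old" ]; then
        echo "$mnt/CATALOGS.old already exists" >&2
        return 1
    fi
    # Keep the corrupt file and leave an empty one in its place.
    mv "$mnt/CATALOGS" "$mnt/CATALOGS.old" && touch "$mnt/CATALOGS"
}
```

It would be called as "reset_catalogs /tmp/lfs_mdt" between the mount
and the umount.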

and then, Boo-Yeah!:

[root@MDS /]# pdsh -a 'df | grep path | wc -l;
cat /proc/fs/lustre/heal*; lctl dl | wc -l' | sort
OSS10: 4
OSS10: 6
OSS10: healthy
MGS: 1
MGS: 2
MGS: healthy
MDS: 1
MDS: 36
MDS: healthy
OSS3: 4
OSS3: 6
OSS3: healthy
OSS4: 4
OSS4: 6
OSS4: healthy
OSS5: 4
OSS5: 6
OSS5: healthy
OSS6: 4
OSS6: 6
OSS6: healthy
OSS7: 4
OSS7: 6
OSS7: healthy
OSS8: 4
OSS8: 6
OSS8: healthy
OSS9: 4
OSS9: 6
OSS9: healthy
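
With more nodes, a listing like this gets long to read by eye. Below is
a rough sketch of a filter that prints only the nodes that never
reported "healthy". The function name unhealthy_nodes is made up for
this example, and it assumes pdsh-style "node: value" lines on stdin,
as in the output above.

```shell
#!/bin/sh
# Sketch only: unhealthy_nodes is a hypothetical helper.
# Reads "node: value" lines (pdsh-style) on stdin and prints, sorted,
# each node that never reported the exact value "healthy".
unhealthy_nodes() {
    awk -F': ' '
        { seen[$1] = 1 }                      # remember every node name
        $2 == "healthy" { ok[$1] = 1 }        # node reported healthy
        END { for (n in seen) if (!(n in ok)) print n }
    ' | sort
}
```

For example: pdsh -a 'cat /proc/fs/lustre/heal*' | unhealthy_nodes
should print nothing when every node is healthy.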

Thank you for the advice Andreas, in this case the lustre logging was
indeed key to determining the true nature of the problem.

Thank you,
Adam

On Thu, 2009-09-10 at 20:51 -0400, Adam wrote:
> Unmounting worked, but remounting resulted in an LBUG (edited):
> 
> X=25043:0
> X:(tracefile.c:450:libcfs_assertion_failed()) LBUG
> X:(handler.c:2049:mds_setup()) ASSERTION(!
> lvfs_check_rdonly(lvfs_sbdev(mnt->mnt_sb))) failed
> X:(tracefile.c:450:libcfs_assertion_failed()) LBUG
> 
> After rebooting, an attempted mount of the mds results in (heavily
> edited):
> 
> lfs-MDT0000: denying duplicate export for
> 91134603-5957-7699-8c87-6305e1e508d5
> (class_hash.c:190:lustre_hash_additem_unique()) Already found the key in
> hash [UUID_HASH]
> (llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile
> 0x7360029:0x992d6208: rc -2
> (llog_cat.c:176:llog_cat_id2handle()) error opening log id
> 0x7360029:992d6208: rc -2
> (llog_obd.c:262:cat_cancel_cb()) Cannot find handle for log 0x7360029
> (llog_obd.c:329:llog_obd_origin_setup()) llog_process with cat_cancel_cb
> failed: -2
> Failing over lfs-MDT0000
> setting obd lfs-MDT0000 device 'unknown-block(253,1)' read-only ***
> Turning device dm-1 (0xfd00001) read-only
> 
> and the new mount eventually fails:
> 
> [root@MDS ~]# mount -t lustre /dev/mapper/mpath1 /mnt/mds
> mount.lustre: mount /dev/mapper/mpath1 at /mnt/mds failed: No such file
> or directory
> Is the MGS specification correct?
> Is the filesystem name correct?
> If upgrading, is the copied client log valid? (see upgrade docs)
> 
> e2fsck shows the mdt as clean -- any idea as to what's tripping it?
> 
> Thanks,
> Adam
> 
> On Thu, 2009-09-10 at 21:41 +0200, Andreas Dilger wrote:
> > On Sep 10, 2009  11:00 -0400, Adam wrote:
> > > -5, sorry I took a far better look than simply checking the health
> > > status. (Part of) the problem is that the MDS doesn't see the OSTs,
> > > eg:
> > > 
> > > [root@MDS ~]# lctl dl
> > >   0 UP mgc MGC10.29.48.1@o2ib eeb1eccc-727e-f3f8-824a-9862a42e3b08 5
> > >   1 UP mdt MDS MDS_uuid 3
> > >   2 UP lov lfs-mdtlov lfs-mdtlov_UUID 4
> > >   3 UP mds lfs-MDT0000 lfs-MDT0000_UUID 671
> > > [root@MDS ~]#
> > 
> > You need to look at the MDS startup logs and/or just shut down the
> > MDS and start it up again, to see why it isn't connecting to the
> > OSS nodes.  If it can't connect to any of them, I would suspect
> > a network problem.  Try "ping", "lctl ping", "telnet OSS5 988" to
> > see if you can check whether the connection is working.
> > 
> > > the OSS's seem okay:
> > >
> > > [root@OSS5 ~]# lctl dl
> > >   0 UP mgc MGC10.29.48.1@o2ib af5bdde6-7fe2-a944-05cd-28459ef91385 5
> > >   1 UP ost OSS OSS_uuid 3
> > >   2 UP obdfilter lfs-OST0002 lfs-OST0002_UUID 671
> > >   3 UP obdfilter lfs-OST000a lfs-OST000a_UUID 671
> > >   4 UP obdfilter lfs-OST0012 lfs-OST0012_UUID 671
> > >   5 UP obdfilter lfs-OST001a lfs-OST001a_UUID 671
> > > [root@OSS5 ~]#
> > > 
> > > writeconf?
> > 
> > Well, that is a last resort; it probably isn't needed.
> > 
> > > On Thu, 2009-09-10 at 10:45 -0400, Oleg Drokin wrote:
> > > > Hello!
> > > > 
> > > > On Sep 10, 2009, at 10:33 AM, Adam wrote:
> > > > > I'm running into a rather strange problem where new files cannot be
> > > > > created (lustre 1.6.6), but existing files can be modified and even
> > > > > deleted.
> > > > > The MDS reports:
> > > > > LustreError: X:0:(mds_open.c:431:mds_create_objects()) error creating
> > > > > objects for inode Y:
> > > > > For a large subset of inodes.
> > > > 
> > > > What is the rc value reported in that message?
> > > > Are there any messages on OSTs at the same time?
> > > > 
> > > > Bye,
> > > >      Oleg
> > > 
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at lists.lustre.org
> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> > 
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.



