[lustre-discuss] corrupt FID on zfs?

Dilger, Andreas andreas.dilger at intel.com
Mon Apr 9 18:49:33 PDT 2018


On Apr 9, 2018, at 02:10, Stu Midgley <sdm900 at gmail.com> wrote:
> 
> Afternoon
> 
> We have copied all the files off an OST (lfs find identifies no files on the OST), but the OST still has some leftover files
> 
> eg.
> 
>     9.6G	O/0/d22/1277942
> 
> when I get the FID of this file using zfsobj2fid it appears to get a corrupt FID
> 
>     [0x200000a48:0x1e86e:0x1]
> 
> which then returns
> 
> bad FID format '[0x200000a48:0x1e86e:0x1]', should be [seq:oid:ver] (e.g. [0x200000400:0x2:0x0])
> 
> fid2path: error on FID [0x200000a48:0x1e86e:0x1]: Invalid argument
> 
> when I check it with lfs fid2path

Try it with the last field as 0x0, like "[0x200000a48:0x1e86e:0x0]".
On the OST, we use the last field to store the stripe index for the file,
so that LFSCK can reconstruct the file layout even if the MDT inode is
corrupted.
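
To illustrate the point above, here is a minimal Python sketch (not part of
zfsobj2fid) that splits the FID string read from an OST object into the MDT
FID that "lfs fid2path" accepts and the stripe index carried in the last
field. The function name split_ost_parent_fid and the regular expression are
hypothetical and used only for illustration; the sketch assumes the FID is
printed in the bracketed hex form shown in this thread.

```python
import re

# Matches the "[seq:oid:ver]" hex form shown in this thread (brackets optional).
FID_RE = re.compile(
    r"^\[?(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]?$"
)


def split_ost_parent_fid(fid_text):
    """Return (mdt_fid, stripe_index) for a FID string read from an OST object.

    Per the explanation above, the last field on the OST holds the stripe
    index, so it is reset to 0x0 to form the FID to look up on the MDT.
    """
    match = FID_RE.match(fid_text.strip())
    if match is None:
        raise ValueError("unrecognised FID format: %r" % fid_text)
    seq, oid, stripe_index = (int(field, 16) for field in match.groups())
    mdt_fid = "[{:#x}:{:#x}:0x0]".format(seq, oid)
    return mdt_fid, stripe_index


if __name__ == "__main__":
    fid, stripe = split_ost_parent_fid("[0x200000a48:0x1e86e:0x1]")
    print(fid, "stripe index", stripe)
```

The first returned value is the string to hand to "lfs fid2path" on a client;
the second is the position of this object within the file's layout.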

> WTF?
> 
> Checking a few OSTs, this isn't isolated.  I've seen a few different corruptions, e.g.
> 
>     [0x200000a48:0x1e86e:0x7]
>     [0x200000a48:0x1e684:0x3]
>     
> 
> Also, quite a few files under the O/0/ directory didn't have trusted.fid set... which seemed strange.

That is not unusual, since the parent (MDT inode) FID is only stored into the
object if it is modified by a client, or if an LFSCK layout check is run.

> So a few questions.  
>     How did the FID type get corrupt?
>     How did this file get orphaned?
> 
> I had to modify zfsobj2fid  to work with a mounted snapshot of the ZFS volume
> 
>     # diff ../zfsobj2fid /sbin/zfsobj2fid
> 38c38
> <     p = subprocess.Popen(["zdb", "-O", "-vvv", sys.argv[1], sys.argv[2]],
> ---
> >     p = subprocess.Popen(["zdb", "-e", "-vvv", sys.argv[1], sys.argv[2]],

It would be great if you could submit this as a patch to Gerrit.


Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation








