[lustre-discuss] space is not released when removing files using zfs- and Lustre 2.8.0

Crowe, Tom thcrowe at iu.edu
Thu Sep 15 18:18:28 PDT 2016

Hi Andreas,

The one "broken object" we removed as a test did live in O/0/d*. It was the only file we removed while we had a clone mounted as "native" ZFS (after setting the ZFS property canmount=on). To set up this test, we took a snapshot and used ZFS send/receive to copy the dataset to another zpool.
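For reference, the clone-and-mount test described above could be sketched roughly like this (the pool, dataset, and snapshot names here are hypothetical, not the actual ones used):

```shell
# Sketch of the test: copy the OST dataset elsewhere and browse it as plain ZFS.
zfs snapshot ostpool/ost0@debug                       # snapshot the live OST dataset
zfs send ostpool/ost0@debug | zfs recv scratch/ost0   # replicate to another zpool
zfs set canmount=on scratch/ost0                      # Lustre OSD datasets normally have canmount=off
zfs mount scratch/ost0                                # mount the copy as a native ZFS filesystem
```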

We are planning to upgrade a test system to ZFS 0.6.5.x to see whether the behavior can be replicated, and more importantly whether the "broken objects" get cleaned up on export/import of the zpool.

We have used snapshots on this filesystem in the past, but at the time we issued the rm commands there were none.

Sent from my iPhone

On Sep 15, 2016, at 18:49, Dilger, Andreas <andreas.dilger at intel.com> wrote:

Note that the "broken objects" may be Lustre internal metadata, such as the Object Index files, so deleting them without knowing what they are may be bad for your filesystem.

Not that I can say for sure your bug is fixed, but ZFS 0.6.4 is getting a bit long in the tooth (tagged April 8, 2015), and 0.6.5.x is the current maintenance branch, which is receiving a fair number of fixes.

Any OST objects that were used by regular files are stored in O/0/d* on the OSTs.

The other important question to ask is whether you have any snapshots on the filesystem. Snapshots would pin data blocks in the pool, and deleting files would not release any space until the snapshot(s) are removed.
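A quick way to check for snapshots pinning space (the pool name here is hypothetical):

```shell
zfs list -t snapshot -r ostpool      # list any snapshots on the pool's datasets
zfs get -r usedbysnapshots ostpool   # space held exclusively by snapshots, per dataset
```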

Cheers, Andreas
Andreas Dilger
Lustre Principal Architect
Intel High Performance Data Division

On 2016/09/15, 15:23, "Crowe, Tom" <thcrowe at iu.edu> wrote:

Hi Jinshan,

The examples in the first part of the thread are from one of our OSTs. We had all previous files/dirs "pinned" to the OST via setstripe, so there was a specific top-level directory associated with this specific OST. After the recursive rm, we noticed the ZFS OST still showed space allocated while Lustre showed the directory as empty, so we dug deeper with zdb.

To summarize: after removing all files, Lustre showed nothing in the "setstripe-pinned" directory where all the files lived prior to the rm, but the ZFS-based OST still showed the "broken objects", and the space in the ZFS vdevs remained allocated.

It is worth noting that we use an ldiskfs-based MDT, and all OSTs are ZFS-based.

Thank you for your response. Please let me know if I can provide any additional data.


On Sep 15, 2016, at 16:55, Xiong, Jinshan <jinshan.xiong at intel.com> wrote:
Hi Tom,

Just to narrow down the problem: when you saw that space was not freed from the zpool, was this on the MDT zpool or an OST zpool?

It seems that the objects you dumped were from the MDT pool. Object 138 should belong to a Lustre file, and it has a spill block attached.


On Sep 9, 2016, at 1:34 PM, Crowe, Tom <thcrowe at iu.edu> wrote:

Greetings All,

I have come across a strange scenario using ZFS and Lustre 2.8.0.

In a nutshell, when we delete items from Lustre using rm, the files/dirs are removed as far as Lustre is concerned, but the space is not freed on the underlying ZFS dataset/zpool. We have unmounted the OST dataset, exported the zpool, and even rebooted the server altogether. The space is never freed.

I have read through many of the issues logged about this at https://github.com/zfsonlinux; many folks have reclaimed the space after an unmount/remount and/or an export/import. As noted above, this has no effect on our dataset/zpool.
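The unmount/export cycle tried above would look roughly like this (mount point and pool name are hypothetical):

```shell
umount /lustre/ost0      # unmount the OST
zpool export ostpool     # export the pool, flushing all in-flight state
zpool import ostpool     # re-import; if deletes were merely pending, freed space
zpool list ostpool       # should now be visible in the ALLOC/FREE columns here
```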

We currently have a zpool scrub running, and expect this to complete in the next few hours.

We DO have zfs compression enabled, and we are using ZFS quota and reservations for the associated OST/dataset.

We have copied the dataset to an entirely different zpool (zfs send/receive) and then mounted it as native ZFS to poke around. In doing so, we located some of the "broken path" files that were of decent size (4GB) and went ahead and removed them with rm. These are the files that show "???" in the path output from zdb. After removing the files, the space was almost immediately freed from the dataset/zpool.
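Individual objects like the ones below can be dumped with zdb; the object numbers here are the ones from the example output, while the pool/dataset name is hypothetical:

```shell
zdb -dddd ostpool/ost0 136   # dump directory object 136 in detail
zdb -dddd ostpool/ost0 138   # dump file object 138; a path of "???<object#N>"
                             # indicates zdb cannot resolve the object to a name
```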

Here is an example of a directory and a file from the zdb output.

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
       136    2    16K    16K  9.00K    32K  100.00  ZFS directory
                                        144   bonus  System attributes
        dnode maxblkid: 1
        path    ???<object#136>
        uid     0
        gid     0
        atime   Wed Dec 31 19:00:00 1969
        mtime   Wed Dec 31 19:00:00 1969
        ctime   Wed Dec 31 19:00:00 1969
        crtime  Thu Mar 10 12:04:52 2016
        gen     1020298
        mode    40755
        size    2
        parent  1
        links   1
        pflags  0
        rdev    0x0000000000000000
        Fat ZAP stats:
                Pointer table:
                        1024 elements
                        zt_blk: 0
                        zt_numblks: 0
                        zt_shift: 10
                        zt_blks_copied: 0
                        zt_nextblk: 0
                ZAP entries: 2
                Leaf blocks: 1
                Total blocks: 2
                zap_block_type: 0x8000000000000001
                zap_magic: 0x2f52ab2ab
                zap_salt: 0x3fdbcd9ab9
                Leafs with 2^n pointers:
                          9:      1 *
                Blocks with n*5 entries:
                          0:      1 *
                Blocks n/10 full:
                          1:      1 *
                Entries with n chunks:
                          3:      2 **
                Buckets with n entries:
                          0:    510 ****************************************
                          1:      2 *

                0 = 38711 (type: not specified)
                feb93 = 281474976687710 (type: 15 (invalid))

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
       138    1    16K   128K   128K   128K  100.00  ZFS plain file
                                        220   bonus  System attributes
        dnode maxblkid: 0
        path    ???<object#138>
        uid     0
        gid     0
        atime   Wed Dec 31 19:00:00 1969
        mtime   Wed Dec 31 19:00:00 1969
        ctime   Wed Dec 31 19:00:00 1969
        crtime  Thu Mar 10 12:04:52 2016
        gen     1020298
        mode    100644
        size    8
        parent  0
        links   1
        pflags  0
        rdev    0x0000000000000000
        SA xattrs: 76 bytes, 1 entries

                trusted.lma = \000\000\000\000\000\000\000\000\003\000\000\000\002\000\000\000\000\000\000\000\000\000\000\000

Can anyone advise on next steps for troubleshooting, or share any similar experiences?


lustre-discuss mailing list
lustre-discuss at lists.lustre.org
