[lustre-discuss] lustre 2.5.3 ost not draining

Kurt Strosahl strosahl at jlab.org
Tue Jul 14 05:23:04 PDT 2015


Looks like the xattr is already set to sa; on the fullest OSTs this would have been done after the move to 0.6.3.1, so we should be OK in that regard.

# zfs get all | grep sa
lustre-ost12/ost12  xattr                 sa                                   local
lustre-ost13/ost13  xattr                 sa                                   local
lustre-ost14/ost14  xattr                 sa                                   local

I didn't change the ashift value; I experimented with it during the development stage and didn't see any performance increase.

Is there an easy way to show fragmentation?
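If you mean the fragmentation pool property in newer zfs, would something like this be the right thing to look at (assuming our version exposes it)?

# zpool list -o name,capacity,fragmentation
# zpool get fragmentation lustre-ost12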

w/r,
Kurt

----- Original Message -----
From: "aik" <aik at fnal.gov>
To: "Kurt Strosahl" <strosahl at jlab.org>
Cc: "aik" <aik at fnal.gov>, "Sean Brisbane" <sean.brisbane at physics.ox.ac.uk>, lustre-discuss at lists.lustre.org
Sent: Monday, July 13, 2015 8:20:31 PM
Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining

Hi Kurt,

The situation with "mount/unmount is necessary to trigger the cleanup" is similar to the one described in zfs bug 1548:
    https://github.com/zfsonlinux/zfs/issues/1548
Reportedly it was fixed in zfs 0.6.3; the update to 0.6.4.1 is recommended, and 0.6.4.2 was recently released.
The bug is related to xattr reference counting and cleanup; the xattr=sa setting is recommended. But: it is only effective for new files. Once a file has been created, its xattr type stays.
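Setting it is just, for example (dataset name here is only a placeholder):

    # zfs set xattr=sa <pool>/<ost-dataset>
    # zfs get xattr <pool>/<ost-dataset>

but, again, it only applies to files created after the change.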

In the ticket, one of the failure scenarios refers to the case where the space is not released even after an unmount/mount. Search for "None of objects X, X1 nor X2 are freed" on the bug #1548 page. If that is the case, I'm afraid you will need to transfer the data off the OST pool and reformat the OST.

The Jan 6 entry on issue 1548 suggests dropping the VM caches.
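(I.e., on the server holding the pool, something along the lines of:

    # sync; echo 3 > /proc/sys/vm/drop_caches

if I read that comment right.)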

You may want to check the zfs version in use and the xattr setting on the zpool/zfs (xattr=sa). What version of zfs was in use when you wrote the files you cannot delete now?
What are the ashift and the reported fragmentation on the zpool/zfs?
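(For the zfs version and ashift, something like the following should do, I think -- exact paths depend on your setup:

    # cat /sys/module/zfs/version
    # zdb | grep ashift

or zpool get ashift <pool>, if your zfs exposes the ashift property.)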

Best regards,
Alex.

On Jul 12, 2015, at 4:13 AM, Sean Brisbane <sean.brisbane at physics.ox.ac.uk> wrote:

> Hi Kurt,
> 
> I was following the recommendation that the OST be active to allow the deletion to happen; hence the re-activation, followed by the mount/unmount needed to trigger the cleanup. The OST was therefore active during the mount/unmount.
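> 
> (By re-activation I just mean the usual lctl step on the MDS -- the device number being whatever lctl dl reports for that OST:
> 
>    # lctl dl | grep <ost>
>    # lctl --device <devno> activate
> )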
> 
> Best,
> Sean
> ________________________________________
> From: Kurt Strosahl [strosahl at jlab.org]
> Sent: 12 July 2015 02:03
> To: Sean Brisbane
> Cc: Shawn Hall; lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Thanks,
> 
>   I'll have to see if I can run this test myself.  Did you notice if the "inactive" status persisted through the unmount/remount?
> 
> w/r,
> Kurt
> 
> ----- Original Message -----
> From: "Sean Brisbane" <sean.brisbane at physics.ox.ac.uk>
> To: "Kurt Strosahl" <strosahl at jlab.org>, "Shawn Hall" <shawn.hall at nag.com>
> Cc: lustre-discuss at lists.lustre.org
> Sent: Saturday, July 11, 2015 4:29:42 AM
> Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Dear Kurt,
> 
> I have the same issue as you, in that deleted files on a deactivated OST could not be cleaned up even after re-activation. It was on my todo list to work out at some point how to get around this. I was told that an unmount/mount cycle on the servers would trigger a clean-up.
> 
> I have just performed the experiment and it was in fact the MDT not the OST which needed to be unmounted and re-mounted in my case.
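> 
> Concretely, the MDT cycle is just the usual unmount and remount on the MDS, along the lines of (device and mount point here are only placeholders):
> 
>    # umount /mnt/mdt
>    # mount -t lustre <mdt-device> /mnt/mdt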
> 
> Unmounting and remounting the OST during this process appeared to make no difference either way.
> 
> All the best,
> Sean
> 
> 
> ________________________________________
> From: Kurt Strosahl [strosahl at jlab.org]
> Sent: 10 July 2015 19:53
> To: Shawn Hall
> Cc: Sean Brisbane; lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Yes, there are quite a few issues with lustre 2.5.3 (it would be sad if it wasn't so frustrating... 1.8.x was solid).
> 
> The full OSTs have a higher index than the one that broke the weighted round-robin... plus all the ones above the most recent are exceptionally full (>=80%).  I'm not sure how I'm going to go forward; I've heard that maybe an unmount/mount of the OSTs would push a purge. I'm also compiling a list of all the files on the OST... the idea being that I could then enable it and launch multiple lfs_migrates, trying to race everyone else using the file system.  I think I'd have the advantage, as my moves would be targeted directly at the OST, while the other writes would just land wherever they could.
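> 
> The migrate step would be roughly (fsname, mount point, and OST index below are placeholders):
> 
>    # lfs find /lustre --obd <fsname>-OST<index>_UUID -type f | lfs_migrate -y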
> 
> w/r,
> Kurt
> 
> ----- Original Message -----
> From: "Shawn Hall" <shawn.hall at nag.com>
> To: "Kurt Strosahl" <strosahl at jlab.org>, "Sean Brisbane" <sean.brisbane at physics.ox.ac.uk>
> Cc: lustre-discuss at lists.lustre.org
> Sent: Friday, July 10, 2015 11:49:06 AM
> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> It sounds like you have a couple of issues that are working against each other, then.  You'll probably need to fight them one at a time.
> 
> 
> 
> My recommendation of clearing up file system space still stands.  I don’t have scientific proof, but giving Lustre more space to work with definitely helps.
> 
> Does your full OST have a lower index than your slow OST?  If so, you could disable the slow one (and, because of the bug, everything above it) and let space clear up on the full one.
> 
> Beyond that you might have to get creative and try something similar to Tommy: migrate the data, but manually specify the stripe offsets.
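> 
> Something like this, perhaps (index and paths invented for illustration): pre-create the copy with an explicit starting OST, fill it, then swap it in:
> 
>    # lfs setstripe -i <target_ost_index> /lustre/tmp/file.new
>    # cp -p /lustre/data/file /lustre/tmp/file.new && mv /lustre/tmp/file.new /lustre/data/file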
> 
> Shawn
> 
> On 7/10/15, 11:13 AM, "lustre-discuss on behalf of Kurt Strosahl" <lustre-discuss-bounces at lists.lustre.org on behalf of strosahl at jlab.org> wrote:
> 
>> No, I'm aware of why the ost is getting new writes... it is because I had to set the qos_threshold_rr to 100 due to https://jira.hpdd.intel.com/browse/LU-5778  (I have an ost that has to be ignored due to terrible write performance...)
>> 
>> w/r,
>> Kurt
>> 
>> ----- Original Message -----
>> From: "Sean Brisbane" <sean.brisbane at physics.ox.ac.uk>
>> To: "Kurt Strosahl" <strosahl at jlab.org>
>> Cc: "Patrick Farrell" <paf at cray.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
>> Sent: Friday, July 10, 2015 11:04:27 AM
>> Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining
>> 
>> Dear Kurt,
>> 
>> Apologies.  After leaving it some number of days it did *not* clean itself up, but I feel that some number of days is long enough to verify that it is a problem.
>> 
>> Sounds like you have another issue if the OST is not being marked as full and writes are not being re-allocated to other OSTs.  I also have that second issue on my system, and I have only workarounds to offer you for the problem.
>> 
>> Thanks,
>> Sean
>> 
>> -----Original Message-----
>> From: Kurt Strosahl [mailto:strosahl at jlab.org]
>> Sent: 10 July 2015 16:01
>> To: Sean Brisbane
>> Cc: Patrick Farrell; lustre-discuss at lists.lustre.org
>> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
>> 
>> The problem there is that I cannot afford to leave it "some number of days"... it is at 97% full, so new writes are going to it faster than it can clean itself off.
>> 
>> w/r,
>> Kurt
>> 
>> ----- Original Message -----
>> From: "Sean Brisbane" <sean.brisbane at physics.ox.ac.uk>
>> To: "Patrick Farrell" <paf at cray.com>, "Kurt Strosahl" <strosahl at jlab.org>
>> Cc: lustre-discuss at lists.lustre.org
>> Sent: Friday, July 10, 2015 10:44:39 AM
>> Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining
>> 
>> Hi,
>> 
>> The 'space not freed' issue also happened to me, and I left it 'some number of days'; I don't recall how many, it was a while back.
>> 
>> Cheers,
>> Sean
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

