[Lustre-discuss] Fwd: lctl --device XX deactivate doesn't make OST read only
aferguson at cmcrc.com
Tue Jun 18 18:41:30 PDT 2013
Update - lfs df -h is not working correctly.
It said I had 44M free (110%), so I did:
lfs find /data -O AC3-OST0010_UUID -size +20G
Then it found /data/smarts/ksc_mq/am/03456.am
so I did:
cp -vp /data/smarts/ksc_mq/am/03456.am /data/smarts/ksc_mq/am/03456.am.bkp
Then when it had finished:
rm -f /data/smarts/ksc_mq/am/03456.am
mv /data/smarts/ksc_mq/am/03456.am.bkp /data/smarts/ksc_mq/am/03456.am
This file was 359GB; therefore, lfs df -h HAS TO BE wrong.
How can I make it right?
Capital Markets CRC Limited (CMCRC)
Telephone: +61 2 8088 4222
Mobile: +61 424 235 159
Fax: +61 2 8088 4201
Capital Markets CRC Ltd - Confidential Communication
The information contained in this e-mail is confidential. It is intended for the addressee only. If you receive this e-mail by mistake please promptly inform us by reply e-mail and then delete the e-mail and destroy any printed copy. You must not disclose or use in any way the information in the e-mail. There is no warranty that this e-mail is error or virus free. It may be a private communication, and if so, does not represent the views of the CMCRC and its associates. If it is a private communication, care should be taken in opening it to ensure that undue offence is not given.
Begin forwarded message:
> From: Alastair Ferguson <aferguson at cmcrc.com>
> Subject: Re: [Lustre-discuss] lctl --device XX deactivate doesn't make OST read only
> Date: 19 June 2013 9:14:45 AM AEST
> To: Andreas Dilger <andreas.dilger at intel.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Sorry - final update.
> It appears that two OSTs are both still at 100% (don't know how I got that wrong), with about 40MB of free space each.
> I tried:
> lfs find /data -O AC3-OST000a_UUID -size +20G | lfs_migrate -y
> Now getting this:
> /data/smarts/ksc_mq/am/03723.am: llapi_semantic_traverse: Failed to open '/data/home/zzhao/workspace/topical_collocation_model/results/ir_evaluation/sjmn2k_tng/DocLDA/k050-alpha0.10-gamma0.01/GibbsRun-2': No such file or directory (2)
> error: find failed for +20G.
> rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
> rsync: write failed on "/data/smarts/ksc_mq/am/03723.am.tmp.N13504": No space left on device (28)
> Also doing:
> lfs_migrate /data/workflow
> (8TB in size)
> lfs_migrate /data/raw
> and still:
> AC3-OST000a_UUID 14.3T 13.6T 46.1M 100% /data[OST:10]
> AC3-OST0010_UUID 7.2T 6.8T 46.1M 100% /data[OST:16]
> We can't run our processes because of the no space on device errors. Help!
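To keep an eye on which OSTs are still full while a migration runs, the "lfs df" output can be filtered on its Use% column. The following is only a minimal sketch: the function name full_osts and the threshold argument are hypothetical, and it assumes the output has the column layout shown in the two lines above (UUID first, a field ending in "%" somewhere after it).

```shell
#!/bin/bash
# full_osts THRESHOLD
# Hypothetical helper: reads "lfs df"-style lines on stdin and prints the
# UUID and Use% of every OST whose usage is at or above THRESHOLD percent.
# Assumes the first field is the target UUID and one field looks like "NN%".
full_osts() {
    awk -v limit="$1" '
        $1 ~ /OST/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[0-9]+%$/) {
                    pct = $i
                    sub(/%/, "", pct)          # strip the percent sign
                    if (pct + 0 >= limit)      # compare numerically
                        print $1, $i
                }
            }
        }'
}

# Usage (needs a mounted Lustre client, so it is left as a comment):
#   lfs df -h /data | full_osts 95
```

Lines for MDTs and the filesystem summary are skipped because their first field does not contain "OST".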
> Alastair Ferguson
> IT Manager
> On 18/06/2013, at 9:26 AM, "Dilger, Andreas" <andreas.dilger at intel.com> wrote:
>> On 2013/17/06 1:12 AM, "Alastair Ferguson" <aferguson at cmcrc.com> wrote:
>>> OK, bit of a weird one, so 3 OSTs are 100%, but there is 30TB of free
>>> space around the other OSTs, so I do:
>>> lfs df -h
>>> Get this part as one of the OSTs I need to deactivate:
>>> AC3-OST000c_UUID 14.3T 13.6T 87.4M 100%
>>> lctl dl
>>> 19 UP osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>>> lctl --device 19 deactivate
>>> lctl dl:
>>> 19 IN osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>>> Should be read only, right?
>> Right, this is the MDS OSC device, so no new files should be allocated on
>> that OST.
>>> lfs getstripe -O AC3-OST000c_UUID -rv -d /data | grep /data >>
>>> To find the files in the filesystem (/data) and strip out all the stuff
>>> you don't need. Then:
>>> while read line; do cp -p "$line" "$line.___bak"; rm -f "$line"; mv "$line.___bak" "$line"; done < ost000c_raw.txt
>>> This should move the data off the OST but it doesn't. I have used this
>>> procedure before to remove data from a whole server (which worked) and I
>>> can see when I lfs df -h
>>> the ost emptying but in this case it goes up and down suggesting it is
>>> copying BACK to the same OST despite the fact it is IN not UP when lctl
>>> dl is run.
>> You should look at "lfs_migrate" and its man page, for a more robust mechanism for
>> doing the above migration. Your script is unsafe if interrupted after "rm -f" but
>> before "mv" moves the old file into place. You can also use "lfs_migrate" in a
>> pipeline, so that it only moves new files, while your script would re-move the same
>> files repeatedly if interrupted and restarted.
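The unsafe window described in the reply above comes from deleting the original before the copy has been renamed into place. A minimal sketch of the same copy-and-replace idea without the "rm -f" step is shown here. The function name safe_rewrite is hypothetical, the ".___bak" suffix is taken from the original script, and this is not how lfs_migrate itself is implemented; lfs_migrate remains the tool recommended in the reply.

```shell
#!/bin/bash
# safe_rewrite FILE
# Hypothetical helper: copy FILE to a temporary name in the same directory,
# then rename the copy over the original. The original is never removed
# first, so an interruption leaves either the old file or the new one in
# place (plus possibly a leftover .___bak copy to clean up).
safe_rewrite() {
    local src=$1
    local tmp="$src.___bak"

    # Copy first. If the copy fails (for example with "No space left on
    # device"), discard the partial copy and keep the original untouched.
    if ! cp -p -- "$src" "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi

    # Replace the original with the copy in a single rename.
    mv -f -- "$tmp" "$src"
}

# Usage with the file list from the original script (left as a comment):
#   while IFS= read -r line; do
#       safe_rewrite "$line" || echo "failed: $line" >&2
#   done < ost000c_raw.txt
```

This sketch does not check whether a file is open for writing or has hard links, so it should only be run on files known to be idle.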
>>> How can I get files off this as I get errors saying no space on device??
>> Your process _should_ be working, but if you are moving small files the effects may
>> be slow. As mentioned in the "lfs_migrate" man page, you should select large files
>> to migrate, since you will get better IO performance, and will free space more quickly.
>> Cheers, Andreas
>> Andreas Dilger
>> Lustre Software Architect
>> Intel High Performance Data Division