[Lustre-discuss] lctl --device XX deactivate doesn't make OST read only
aferguson at cmcrc.com
Tue Jun 18 16:14:45 PDT 2013
Sorry - final update.
It appears that two OSTs are still at 100% (I don't know how I got that wrong), with only about 40 MB of space left on each.
lfs find /data -O AC3-OST000a_UUID -size +20G | lfs_migrate -y
Now getting this:
/data/smarts/ksc_mq/am/03723.am: llapi_semantic_traverse: Failed to open '/data/home/zzhao/workspace/topical_collocation_model/results/ir_evaluation/sjmn2k_tng/DocLDA/k050-alpha0.10-gamma0.01/GibbsRun-2': No such file or directory (2)
error: find failed for +20G.
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/data/smarts/ksc_mq/am/03723.am.tmp.N13504": No space left on device (28)
(8TB in size)
UUID bytes Used Available Use% Mounted on
AC3-OST000a_UUID 14.3T 13.6T 46.1M 100% /data[OST:10]
AC3-OST0010_UUID 7.2T 6.8T 46.1M 100% /data[OST:16]
We can't run our processes because of the "No space left on device" errors. Help!
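Rather than scanning `lfs df -h` output by eye for full OSTs, the listing can be filtered. This is a minimal sketch. It assumes the usual `lfs df` column layout (UUID, bytes, Used, Available, Use%, Mounted on) and OST UUIDs of the form `<fsname>-OSTxxxx_UUID`. The function name `full_osts` and its 95% default threshold are hypothetical choices, not part of Lustre:

```shell
#!/usr/bin/env bash
# full_osts [THRESHOLD]
# Hypothetical helper: reads "lfs df" or "lfs df -h" output on stdin and
# prints the UUID of every OST whose Use% column is >= THRESHOLD (default 95).
# Assumes the column layout: UUID bytes Used Available Use% Mounted-on.
full_osts() {
    local threshold="${1:-95}"
    awk -v t="$threshold" '
        # Only OST rows with a numeric Use% value. The header, MDT rows and
        # the summary line are skipped.
        $1 ~ /-OST[0-9a-fA-F]+_UUID$/ && $5 ~ /^[0-9]+%$/ {
            pct = $5
            sub(/%$/, "", pct)
            if (pct + 0 >= t + 0) print $1
        }'
}

# Usage (on a client): lfs df -h /data | full_osts 95
```

Applied to the two rows above, this should print `AC3-OST000a_UUID` and `AC3-OST0010_UUID`.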
Capital Markets CRC Limited (CMCRC)
On 18/06/2013, at 9:26 AM, "Dilger, Andreas" <andreas.dilger at intel.com> wrote:
> On 2013/17/06 1:12 AM, "Alastair Ferguson" <aferguson at cmcrc.com> wrote:
>> OK, bit of a weird one: 3 OSTs are at 100%, but there is 30TB of free
>> space across the other OSTs, so I do:
>> lfs df -h
>> I get this line for one of the OSTs I need to deactivate:
>> AC3-OST000c_UUID 14.3T 13.6T 87.4M 100%
>> lctl dl
>> 19 UP osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>> lctl --device 19 deactivate
>> lctl dl:
>> 19 IN osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>> Should be read-only, right?
> Right, this is the MDS OSC device, so no new files should be allocated on
> that OST.
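Deactivation is done by device number, so a small filter over `lctl dl` avoids picking the wrong row by hand. This is a sketch. It assumes the `lctl dl` column layout quoted above (device number, state, type, name, UUID, refcount) and an MDS-side OSC name that starts with `<OSTNAME>-osc`. `osc_devno` and `osc_state` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read "lctl dl" output (run on the MDS) on stdin.
# Assumed columns: <devno> <state> <type> <name> <uuid> <refcount>
# The OSC name is matched by prefix "<OSTNAME>-osc", so a trailing suffix on
# the device name (if a release adds one) still matches.

# osc_devno OSTNAME: print the device number of the OSC for OSTNAME.
osc_devno() {
    awk -v ost="$1" '$3 == "osc" && index($4, ost "-osc") == 1 { print $1 }'
}

# osc_state OSTNAME: print the state column (e.g. UP or IN) for that OSC.
osc_state() {
    awk -v ost="$1" '$3 == "osc" && index($4, ost "-osc") == 1 { print $2 }'
}

# Usage on the MDS:
#   dev=$(lctl dl | osc_devno AC3-OST000c)
#   lctl --device "$dev" deactivate
#   lctl dl | osc_state AC3-OST000c     # expect IN after deactivation
```

After `deactivate`, `osc_state` should report `IN`, matching the `lctl dl` output quoted above.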
>> lfs getstripe -O AC3-OST000c_UUID -rv -d /data | grep /data >>
>> to find the files on that OST in the filesystem (/data) and strip out
>> everything I don't need. Then:
>> while read line; do
>>     cp -p "$line" "$line.___bak"; rm -f "$line"; mv "$line.___bak" "$line"
>> done < ost000c_raw.txt
>> This should move the data off the OST, but it doesn't. I have used this
>> procedure before to remove data from a whole server (which worked), and I
>> could see the OST emptying in "lfs df -h". In this case, though, usage goes
>> up and down, suggesting it is copying BACK to the same OST even though
>> "lctl dl" shows it as IN rather than UP.
> You should look at "lfs_migrate" and its man page for a more robust
> mechanism for doing the above migration. Your script is unsafe if
> interrupted after "rm -f" but before "mv" moves the old file into place.
> You can also use "lfs_migrate" in a pipeline, so that it only moves new
> files, while your script would re-move the same files repeatedly if
> interrupted and restarted.
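Both objections to the loop can be addressed in plain shell: the window after `rm -f` where the file does not exist, and repeated work after a restart. `lfs_migrate` remains the recommended tool. This is a minimal sketch, assuming one path per line in the list file. It copies each file to a temporary name in the same directory and renames the copy over the original, so the path is never missing. It also records finished paths in a log, so a restarted run skips them. The helper name `safe_recopy`, the `.done` log, and the `.___bak` suffix (borrowed from the loop above) are illustrative only:

```shell
#!/usr/bin/env bash
# safe_recopy LISTFILE [DONEFILE]
# Hypothetical helper. For each path in LISTFILE (one per line):
#   1. copy it to "<path>.___bak" in the same directory (new objects are
#      allocated on whichever OSTs are currently active), then
#   2. rename the copy over the original. A same-directory rename replaces
#      the old name atomically, so there is no window with the file missing.
# Paths already listed in DONEFILE (default LISTFILE.done) are skipped, so an
# interrupted run can be restarted without copying files a second time.
safe_recopy() {
    local list="$1" done_log="${2:-$1.done}" path tmp
    touch "$done_log"
    while IFS= read -r path; do
        [ -f "$path" ] || continue                     # vanished or not a regular file
        grep -qxF -- "$path" "$done_log" && continue   # finished on an earlier run
        tmp="$path.___bak"
        if cp -p -- "$path" "$tmp" && mv -f -- "$tmp" "$path"; then
            printf '%s\n' "$path" >> "$done_log"
        else
            rm -f -- "$tmp"                            # original is left untouched
            printf 'failed: %s\n' "$path" >&2
        fi
    done < "$list"
}

# Usage: safe_recopy ost000c_raw.txt
```

Like the original loop, this is not safe for files that are being written during the copy. It also replaces the inode, so other hard links keep pointing at the old data. The per-file `grep` against the log is linear, which is acceptable for thousands of paths but slow for millions.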
>> How can I get files off this OST when I keep getting "No space left on device" errors?
> Your process _should_ be working, but if you are moving small files the
> effects may be slow. As mentioned in the "lfs_migrate" man page, you should
> select large files to migrate, since you will get better IO performance
> and will free space more quickly.
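To act on the advice to migrate large files first, the candidate list can be ordered by size before it reaches `lfs_migrate`. This is a sketch, assuming GNU `stat` (standard on Linux clients). `largest_first` is a hypothetical helper. The `lfs find` line in the usage comment reuses the options from the command earlier in this thread:

```shell
#!/usr/bin/env bash
# largest_first
# Hypothetical helper: reads file paths (one per line) on stdin and writes the
# regular files back out sorted by size, largest first, so the files that
# free the most space on a full OST are migrated before the small ones.
# Paths that no longer exist or are not regular files are dropped.
largest_first() {
    local path size
    while IFS= read -r path; do
        [ -f "$path" ] || continue
        size=$(stat -c '%s' -- "$path" 2>/dev/null) || continue
        printf '%s\t%s\n' "$size" "$path"
    done | sort -t "$(printf '\t')" -k1,1nr | cut -f2-
}

# Usage on a client:
#   lfs find /data -O AC3-OST000a_UUID | largest_first | lfs_migrate -y
```

`sort` has to read the whole list before it emits anything, so migration does not start until the scan finishes. On a very large filesystem, a size filter like the `-size +20G` used earlier in this thread keeps the pipeline streaming instead.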
> Cheers, Andreas
> Andreas Dilger
> Lustre Software Architect
> Intel High Performance Data Division