[Lustre-discuss] lctl --device XX deactivate doesn't make OST read only

Alastair Ferguson aferguson at cmcrc.com
Tue Jun 18 16:14:45 PDT 2013


Sorry - final update.

It appears that two OSTs are both still at 100% (I don't know how I got that wrong), with only about 40 MB of free space each.

I tried:

 lfs find /data -O AC3-OST000a_UUID -size +20G | lfs_migrate -y

Now I'm getting this:

/data/smarts/ksc_mq/am/03723.am: llapi_semantic_traverse: Failed to open '/data/home/zzhao/workspace/topical_collocation_model/results/ir_evaluation/sjmn2k_tng/DocLDA/k050-alpha0.10-gamma0.01/GibbsRun-2': No such file or directory (2)
error: find failed for +20G.
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/data/smarts/ksc_mq/am/03723.am.tmp.N13504": No space left on device (28)


I'm also running:

lfs_migrate /data/workflow

(8TB in size)

and

lfs_migrate /data/raw

(15TB)

but both OSTs are still full:

AC3-OST000a_UUID           14.3T       13.6T       46.1M 100% /data[OST:10]
AC3-OST0010_UUID            7.2T        6.8T       46.1M 100% /data[OST:16]

We can't run our processes because of the "No space left on device" errors. Help!
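A minimal sketch, not from the thread, for finding the full OSTs and the device to deactivate without reading the output by eye. The two POSIX sh helpers, `full_osts` and `osc_dev_for_ost`, are hypothetical names, not Lustre tools. Both read saved command output on stdin, and the field positions are assumed from the `lfs df -h` and `lctl dl` output quoted in this thread; newer Lustre releases may name the MDS-side device differently.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lustre). Both read command output on
# stdin; field positions are assumed from the output quoted in this thread.

# full_osts THRESHOLD
# Print the UUID of every OST whose Use% column (field 5) in `lfs df -h`
# output is at or above THRESHOLD. awk converts "100%" to the number 100.
full_osts() {
    awk -v thr="$1" '$1 ~ /-OST[0-9a-f]+_UUID$/ && ($5 + 0) >= thr { print $1 }'
}

# osc_dev_for_ost OST_UUID
# Print the device number of the MDS-side OSC for OST_UUID, assuming
# `lctl dl` lines of the form "19 UP osc AC3-OST000c-osc AC3-mdtlov_UUID 5".
osc_dev_for_ost() {
    # ${1%_UUID} strips the suffix: AC3-OST000c_UUID -> AC3-OST000c
    awk -v name="${1%_UUID}-osc" '$3 == "osc" && $4 == name { print $1 }'
}
```

On a client, `lfs df -h /data | full_osts 100` should list AC3-OST000a_UUID and AC3-OST0010_UUID for the output above. On the MDS, `lctl dl | osc_dev_for_ost AC3-OST000a_UUID` should print the number to pass to `lctl --device N deactivate`. The two steps are kept separate because `lfs df` runs on a client while the deactivation happens on the MDS.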

Alastair Ferguson
IT Manager
Capital Markets CRC Limited (CMCRC)
Telephone: +61 2 8088 4222
Mobile: +61 424 235 159
Fax: +61 2 8088 4201
www.cmcrc.com 





On 18/06/2013, at 9:26 AM, "Dilger, Andreas" <andreas.dilger at intel.com> wrote:

> On 2013/17/06 1:12 AM, "Alastair Ferguson" <aferguson at cmcrc.com> wrote:
>> OK, bit of a weird one: 3 OSTs are at 100%, but there is 30TB of free
>> space across the other OSTs, so I run:
>> 
>> lfs df -h
>> 
>> and get this line for one of the OSTs I need to deactivate:
>> 
>> AC3-OST000c_UUID           14.3T       13.6T       87.4M 100% /data[OST:12]
>> 
>> then
>> 
>> lctl dl
>> 
>> 19 UP osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>> 
>> Then
>> 
>> lctl --device 19 deactivate
>> 
>> then
>> 
>> lctl dl:
>> 
>> 19 IN osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>> 
>> It should be read-only now, right?
> 
> Right, this is the MDS OSC device, so no new files should be allocated on
> that OST.
> 
>> Then
>> 
>> lfs getstripe -O AC3-OST000c_UUID -rv -d /data | grep /data >> ost000c_raw.txt
>> 
>> This finds the files on that OST in the filesystem (/data) and strips out
>> everything else. Then:
>> 
>> while read line; do cp -p "$line" "$line.___bak"; rm -f "$line"; mv "$line.___bak" "$line"; done < ost000c_raw.txt
>> 
>> This should move the data off the OST, but it doesn't. I have used this
>> procedure before to remove data from a whole server (which worked), and
>> I could see the OST emptying in "lfs df -h". In this case the usage goes
>> up and down, suggesting the data is being copied BACK to the same OST,
>> even though "lctl dl" shows it as IN rather than UP.
> 
> You should look at "lfs_migrate" and its man page for a more robust
> mechanism for doing the above migration.  Your script is unsafe if
> interrupted after "rm -f" but before "mv" moves the old file into place.
> You can also use "lfs_migrate" in a pipeline, so that it only moves new
> files, while your script would re-move the same files repeatedly if
> interrupted and restarted.
> 
>> How can I get files off this OST when I keep getting "No space left on
>> device" errors?
> 
> Your process _should_ be working, but if you are moving small files the
> effects may be slow.  As mentioned in the "lfs_migrate" man page, you
> should select large files to migrate, since you will get better IO
> performance and will free space more quickly.
> 
> Cheers, Andreas
> -- 
> Andreas Dilger
> 
> Lustre Software Architect
> Intel High Performance Data Division
> 
> 
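A minimal sketch of a safer variant of the quoted cp/rm/mv loop, in POSIX sh. `safe_rewrite` and `safe_rewrite_all` are hypothetical names, not Lustre tools. Instead of deleting the original and then moving the copy into place, it renames the copy over the original, so the path always names a complete file even if the loop is interrupted. It assumes files are not hard-linked and not open for writing during the copy; `lfs_migrate` checks for such cases and this sketch does not.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lustre): rewrite files in place so the
# MDS allocates new objects for them, ideally on OSTs that are still active.

# safe_rewrite FILE: copy FILE to a temporary name in the same directory,
# then rename the copy over FILE. A rename within one directory replaces
# the target atomically, so FILE is never missing or half-written.
safe_rewrite() {
    f=$1
    tmp="$f.___bak.$$"              # per-process temporary name
    if cp -p -- "$f" "$tmp"; then
        mv -f -- "$tmp" "$f"        # old file is unlinked; its objects should be freed
    else
        rm -f -- "$tmp"             # e.g. ENOSPC: drop the partial copy, keep the original
        return 1
    fi
}

# safe_rewrite_all: read one path per line on stdin. IFS= and -r keep
# leading spaces and backslashes in file names intact.
safe_rewrite_all() {
    rc=0
    while IFS= read -r line; do
        [ -f "$line" ] || continue  # skip directories and files that vanished
        safe_rewrite "$line" || rc=1
    done
    return $rc
}
```

Feeding it with `lfs find /data -O AC3-OST000c_UUID | safe_rewrite_all` means a restarted run should only see files that still have objects on that OST, which avoids re-moving files that were already done. `lfs_migrate` remains the better-tested option.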


