[Lustre-discuss] How to remove OST permanently?

D. Dante Lorenso dante at lorenso.com
Fri Nov 23 14:36:44 PST 2007


Brian J. Murrell wrote:
> On Fri, 2007-11-23 at 12:01 -0600, D. Dante Lorenso wrote:
>> I've added a new 2.2 TB OST to my cluster easily enough, but this new 
>> disk array is meant to replace several smaller OSTs that I used to have, 
>> which were only 120 GB, 500 GB, and 700 GB.
>>
>> Adding an OST is easy, but how do I REMOVE the small OSTs that I no 
>> longer want to be part of my cluster?  Is there a command to tell Lustre 
>> to move all the file stripes off one of the nodes?
> 
> I answered a very similar question not that long ago (i.e. last couple
> of weeks) on this list.  Check the archives linked off the main list
> page listed in the footer of this message.

Sorry about that.  I just joined the list and hadn't seen your other 
message here:

https://mail.clusterfs.com/pipermail/lustre-discuss/2007-November/004332.html

You recommend putting the OST in read-only mode, then hunting down all the 
files on the OST and copying/moving them to other OSTs.  Here is my attempt 
to do just that:

----------

I set up a small lab test with a couple of 20 GB OSTs and created some 
large files on them using:

   [lab01]/dante> dd if=/dev/zero of=/dante/BIG_001.out bs=50MB count=1
   [lab01]/dante> dd if=/dev/zero of=/dante/BIG_002.out bs=50MB count=1
   ...
   [lab01]/dante> dd if=/dev/zero of=/dante/BIG_020.out bs=50MB count=1

The 'lfs find' command needs to know the UUID of the OST that you are 
looking up.  I found that UUID with this command:

   [lab01]/dante> lfs df -h
   UUID                     bytes      Used Available  Use% Mounted on
   dante-MDT0000_UUID       16.3G      1.4G     14.9G    8% /dante[MDT:0]
   dante-OST0000_UUID       19.7G      1.5G     18.1G    7% /dante[OST:0]
   dante-OST0001_UUID       19.7G      1.5G     18.2G    7% /dante[OST:1]

   filesystem summary:      39.4G      3.1G     36.3G    7% /dante
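When there are many OSTs, it may help to pull just the OST UUIDs out of that table so they can be fed to 'lfs find'.  A minimal sketch, assuming the column layout shown above (UUID in the first column, OST UUIDs ending in -OSTnnnn_UUID); the function name ost_uuids is hypothetical, not a Lustre tool:

```shell
# ost_uuids (hypothetical helper): read `lfs df` output on stdin and print
# the UUID of every OST, one per line.
# Assumes the UUID is the first column and OST UUIDs end in -OSTnnnn_UUID
# (nnnn in hex); MDT rows and the summary line are skipped.
ost_uuids() {
    awk '$1 ~ /-OST[0-9a-fA-F]+_UUID$/ { print $1 }'
}

# Example (on a client):  lfs df -h /dante | ost_uuids
```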

Once I know the UUID, I can use the lfs find command to find all the 
files which have objects stored on a specific OST:

   [lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
   /dante/BIG_002.out
   /dante/BIG_004.out
   /dante/BIG_006.out
   /dante/BIG_008.out
   /dante/BIG_010.out
   /dante/BIG_012.out
   /dante/BIG_014.out
   /dante/BIG_015.out
   /dante/BIG_017.out

So, I think these files need to be copied and moved after making the OST 
read-only.  I can't figure out how to use the 'readonly' command 
listed in the lctl help output, though:

   [lab01]/dante> lctl help | less
   Available commands are:
   ...
           === testing (DANGEROUS) ==
   ...
           readonly
   ...
   For more help type: help command-name

Apparently, Brian was having trouble figuring it out also:
https://mail.clusterfs.com/pipermail/lustre-discuss/2006-October/002246.html

So, to make an OST read-only, do I need to deactivate it?  I need to know 
the device number to do that.  Here's how I found the device number from 
the MDS:

   [lab01]/dante> lctl device_list
   0 UP mgs MGS MGS 9
   1 UP mgc MGC192.168.200.51@tcp 690382da-9a5b-c548-a042-29b3c805494d  5
   2 UP mdt MDS MDS_uuid 3
   3 UP lov dante-mdtlov dante-mdtlov_UUID 4
   4 UP mds dante-MDT0000 dante-MDT0000_UUID 9
   5 UP osc dante-OST0000-osc dante-mdtlov_UUID 5
   6 UP osc dante-OST0001-osc dante-mdtlov_UUID 5
   7 UP lov dante-clilov-d87aba00 ...202d-b193-5bdc-077b7760b396 4
   8 UP mdc dante-MDT0000-mdc-d87aba00 .....2d-b193-5bdc-077b7760b396 5
   9 UP osc dante-OST0000-osc-d87aba00 ...202d-b193-5bdc-077b7760b396 5
  10 UP osc dante-OST0001-osc-d87aba00 ...202d-b193-5bdc-077b7760b396 5
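The listing has two OSC devices per OST: one whose UUID is the MDT's LOV (dante-mdtlov_UUID, devices 5 and 6) and one carrying a client instance suffix (-d87aba00, devices 9 and 10).  A hedged sketch for picking out the first kind by name, assuming the column order seen above (devno, status, type, name, uuid, refcount); the helper name mds_osc_devno and the matching rule are assumptions drawn only from this listing:

```shell
# mds_osc_devno OSTNAME (hypothetical helper): read `lctl device_list`
# output on stdin and print the device number of the OSC whose name is
# exactly OSTNAME-osc and whose UUID ends in -mdtlov_UUID (the MDS-side one).
# Client-side OSCs (names with an extra -xxxxxxxx suffix) do not match.
mds_osc_devno() {
    awk -v want="$1-osc" '$3 == "osc" && $4 == want && $5 ~ /-mdtlov_UUID$/ { print $1 }'
}

# Example (on the MDS):  lctl device_list | mds_osc_devno dante-OST0001
```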

If I'm trying to remove OST0001, do I need to deactivate device #6, or is 
it #10?  I'll try #6 and see what happens:

   [lab01]/dante> lctl --device 6 deactivate

OK, did that work?  I didn't get any output, so I checked:

   [lab01]/dante> cat /proc/fs/lustre/lov/dante-mdtlov/target_obd
   0: dante-OST0000_UUID ACTIVE
   1: dante-OST0001_UUID INACTIVE
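Rather than reading target_obd by eye after each activate/deactivate, a small sketch can print the state of one target.  It assumes each line has the form "index: UUID STATE" as shown above and defaults to the dante-mdtlov path used in this test; ost_state is a hypothetical helper name:

```shell
# ost_state UUID [FILE] (hypothetical helper): print the state column
# (e.g. ACTIVE or INACTIVE) for UUID from a LOV target_obd file.
# FILE defaults to the MDS-side LOV path used in this lab setup.
ost_state() {
    awk -v want="$1" '$2 == want { print $3 }' "${2:-/proc/fs/lustre/lov/dante-mdtlov/target_obd}"
}

# Example:  [ "$(ost_state dante-OST0001_UUID)" = INACTIVE ] && echo "dante-OST0001 is deactivated on the MDS"
```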

Well, it's now listed as INACTIVE, so I guess it must have worked?  Can 
I activate it again?

   [lab01]/dante> lctl --device 6 activate
   [lab01]/dante> cat /proc/fs/lustre/lov/dante-mdtlov/target_obd
   0: dante-OST0000_UUID ACTIVE
   1: dante-OST0001_UUID ACTIVE

OK, I guess that's how it's done, so here I go deactivating it again:

   [lab01]/dante> lctl --device 6 deactivate
   [lab01]/dante> cat /proc/fs/lustre/lov/dante-mdtlov/target_obd
   0: dante-OST0000_UUID ACTIVE
   1: dante-OST0001_UUID INACTIVE

Now, back to that list of files that I need to move:

   [lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
   /dante/BIG_002.out
   /dante/BIG_004.out
   /dante/BIG_006.out
   /dante/BIG_008.out
   /dante/BIG_010.out
   /dante/BIG_012.out
   /dante/BIG_014.out
   /dante/BIG_015.out
   /dante/BIG_017.out

Let's try moving one of the files OFF this device:

   [lab01]/dante> cp BIG_002.out BIG_002.out.tmp
   [lab01]/dante> mv BIG_002.out.tmp BIG_002.out
   [lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
   /dante/BIG_004.out
   /dante/BIG_006.out
   /dante/BIG_008.out
   /dante/BIG_010.out
   /dante/BIG_012.out
   /dante/BIG_014.out
   /dante/BIG_015.out
   /dante/BIG_017.out

I guess that's working.  Let me keep going to see if I can get them all 
moved over:

   [lab01]/dante> cp ... mv ... cp ... mv ... etc.
   [lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
   * nothing listed *
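The repeated cp/mv step can be scripted by feeding it the 'lfs find' output.  A minimal sketch, assuming the OST's MDS-side OSC has already been deactivated as above, that no other process has the files open, and that file names contain no newlines; the function name migrate_listed_files and the .tmp suffix are illustrative choices, not Lustre tools.  It compares each copy with cmp before renaming it over the original and stops at the first failure:

```shell
# migrate_listed_files (hypothetical helper): read file paths, one per line,
# on stdin.  For each file: copy it to FILE.tmp, verify the copy byte for
# byte, then rename the copy over the original.  Stops on the first failure.
migrate_listed_files() {
    while IFS= read -r f; do
        dst="$f.tmp"
        # cp -p also tries to keep mode and timestamps on the new copy.
        if cp -p -- "$f" "$dst" && cmp -s -- "$f" "$dst"; then
            mv -f -- "$dst" "$f"
        else
            echo "migration failed for $f" >&2
            rm -f -- "$dst"
            return 1
        fi
    done
}

# Example:  lfs find -r --obd dante-OST0001_UUID /dante | migrate_listed_files
```

Afterwards, rerunning the same 'lfs find' should list nothing, as in the transcript above.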

Well, it looks good so far.  How about that df command?

   [lab01]/dante> lfs df -h /dante
   UUID                     bytes      Used Available  Use% Mounted on
   dante-MDT0000_UUID       16.3G      1.4G     14.9G    8% /dante[MDT:0]
   dante-OST0000_UUID       19.7G      1.6G     18.1G    8% /dante[OST:0]
   dante-OST0001_UUID       19.7G      1.4G     18.3G    7% /dante[OST:1]
   filesystem summary:      39.4G      3.1G     36.3G    7% /dante
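To watch the Used figure for a single OST across several runs of that command, a hedged sketch that extracts just that column, assuming the column order shown above (UUID, bytes, Used, Available, Use%, Mounted on); ost_used is a hypothetical helper name:

```shell
# ost_used UUID (hypothetical helper): read `lfs df` output on stdin and
# print the Used column (third field) for the row whose UUID matches.
ost_used() {
    awk -v want="$1" '$1 == want { print $3 }'
}

# Example:  lfs df -h /dante | ost_used dante-OST0001_UUID
```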

What?  If there is nothing on dante-OST0001_UUID, then why does it still 
show 1.4G used?  Let's confirm that the files are all on OST0000 and not 
OST0001:

   [lab01]/dante> lfs find -r --obd dante-OST0000_UUID /dante
   /dante/BIG_001.out
   /dante/BIG_016.out
   /dante/BIG_002.out
   /dante/BIG_003.out
   /dante/BIG_004.out
   /dante/BIG_005.out
   /dante/BIG_006.out
   /dante/BIG_007.out
   /dante/BIG_008.out
   /dante/BIG_009.out
   /dante/BIG_010.out
   /dante/BIG_011.out
   /dante/BIG_012.out
   /dante/BIG_013.out
   /dante/BIG_014.out
   /dante/BIG_015.out
   /dante/BIG_017.out
   /dante/BIG_018.out
   /dante/BIG_019.out
   /dante/BIG_020.out
   [lab01]/dante> lfs find -r --obd dante-OST0001_UUID /dante
   [lab01]/dante> lfs df -h /dante
   UUID                     bytes      Used Available  Use% Mounted on
   dante-MDT0000_UUID       16.3G      1.4G     14.9G    8% /dante[MDT:0]
   dante-OST0000_UUID       19.7G      1.6G     18.1G    8% /dante[OST:0]
   dante-OST0001_UUID       19.7G      1.4G     18.3G    7% /dante[OST:1]
   filesystem summary:      39.4G      3.1G     36.3G    7% /dante

Well, sure enough, it looks like all 20 files are on OST0000 and no files 
are on OST0001, yet the df output doesn't reflect that.  Maybe the df 
output isn't correct unless I activate the OST again?  Let's try that:

   [lab01]/dante> lctl --device 6 activate
   [lab01]/dante> cat /proc/fs/lustre/lov/dante-mdtlov/target_obd
   0: dante-OST0000_UUID ACTIVE
   1: dante-OST0001_UUID ACTIVE
   [lab01]/dante> lfs df -h /dante
   UUID                     bytes      Used Available  Use% Mounted on
   dante-MDT0000_UUID       16.3G      1.4G     14.9G    8% /dante[MDT:0]
   dante-OST0000_UUID       19.7G      1.6G     18.1G    8% /dante[OST:0]
   dante-OST0001_UUID       19.7G      1.4G     18.3G    7% /dante[OST:1]
   filesystem summary:      39.4G      3.1G     36.3G    7% /dante

Nope, that doesn't work either.  What's wrong with this picture?

-- Dante



