[lustre-discuss] Data migration from one OST to another

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Tue Mar 5 09:33:45 PST 2019


Dear All,

We have found the answer. Starting from Lustre-2.4, an OST stops all
update actions once it is deactivated. Hence during data migration, if
we deactivate the OST chome-OST0028_UUID and copy the data out via:

 	cp -a <file> <file>.tmp
 	mv <file>.tmp <file>

The "junk" still leaves in chome-OST0028_UUID, unless we restart the
MDT. Restarting MDT will clean out the junks resides the previously
deactived OSTs.
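
To verify that the space is actually freed after the MDT restart, a
quick check from a client is (just a sketch, using the same lfs
commands as in the quoted message below, with /work being our mount
point):

 	root at client# /opt/lustre/bin/lfs df | grep OST0028
 	root at client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work | wc -l

The first command should show the used space of chome-OST0028_UUID
dropping, and the second should report (close to) zero files once the
migration is complete.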

Another way to perform the data migration for chome-OST0028_UUID is:

root at mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/max_create_count

Thus the OST is still active, it just does not create new objects. So
while doing the data migration we can see its space being released
continuously.
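
After the migration this has to be undone, e.g. (a sketch; 20000 is the
usual default of max_create_count, but please read the value back from
an OST that was never blocked instead of trusting this number):

 	root at mds# cat /opt/lustre/fs/osc/chome-OST002a-osc-MDT0000/max_create_count
 	root at mds# echo 20000 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/max_create_count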

But here we encounter another problem. Our Lustre file system has
41 OSTs, of which 8 are full and need the data migration. So we blocked
these 8 OSTs from creating new objects. But during the data migration,
suddenly the whole Lustre file system hangs, and the MDS server logs
a lot of the following dmesg messages:

---------------------------------------
[960570.287161] Lustre: chome-OST001a-osc-MDT0000: slow creates, last=[0x1001a0000:0x3ef241:0x0], next=[0x1001a0000:0x3ef241:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0
[960570.287244] Lustre: Skipped 2 previous similar messages
---------------------------------------

where chome-OST001a-osc-MDT0000 is one of the blocked OSTs. It looks like
the MDT still wants to create objects on the blocked OSTs, but since they
are blocked, the whole file system hangs.
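
For reference, the per-OST settings can be read back on the MDS like
this (a sketch using the same proc paths as above; on other Lustre
versions lctl get_param 'osc.*-osc-MDT0000.max_create_count' should
show the same information, and the entries may live under osp instead
of osc):

 	root at mds# grep . /opt/lustre/fs/osc/chome-OST*-osc-MDT0000/active
 	root at mds# grep . /opt/lustre/fs/osc/chome-OST*-osc-MDT0000/max_create_count

which lists, per OST, whether it is active and whether object creation
is currently blocked.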

Could anyone give us suggestions on how to solve this?

Best Regards,

T.H.Hsieh

On Sun, Mar 03, 2019 at 06:00:17PM +0800, Tung-Han Hsieh wrote:
> Dear All,
> 
> We have a problem with data migration from one OST to another.
> 
> We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8
> on the clients. We want to migrate some data from one OST to another in
> order to re-balance the data occupation among the OSTs. In the beginning
> we followed the old method (i.e., the method found in the Lustre-1.8.X
> manuals) for the data migration. Suppose we have two OSTs:
> 
> root at client# /opt/lustre/bin/lfs df
> UUID                   1K-blocks        Used   Available Use% Mounted on
> chome-OST0028_UUID    7692938224  7246709148    55450156  99% /work[OST:40]
> chome-OST002a_UUID   14640306852  7094037956  6813847024  51% /work[OST:42]
> 
> and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID.
> Our procedures are:
> 
> 1. We deactivate chome-OST0028_UUID:
>    root at mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/active
> 
> 2. We find all files located in chome-OST0028_UUID:
>    root at client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list
> 
> 3. In each file listed in the file "list", we did:
> 
> 	cp -a <file> <file>.tmp
> 	mv <file>.tmp <file>
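> 
> As a rough sketch, the loop over the file "list" can be written like
> this (assuming the file names contain no unusual characters):
> 
> 	while read -r f; do
> 		cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"
> 	done < list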
> 
> During the migration, we indeed saw more and more data being written into
> chome-OST002a_UUID, but we did not see any disk space released in
> chome-OST0028_UUID. In Lustre-1.8.X, doing it this way we did see that
> chome-OST002a_UUID had more data coming in while chome-OST0028_UUID gained
> more and more free space.
> 
> It looks like the data files referenced by the MDT have been copied to
> chome-OST002a_UUID, but the junk still remains in chome-OST0028_UUID.
> Even after we re-activate chome-OST0028_UUID following the migration, the
> situation is still the same:
> 
> root at mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/active
> 
> Is there any way to cure this problem?
> 
> 
> Thanks very much.
> 
> T.H.Hsieh
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

