[lustre-discuss] Data migration from one OST to another

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Tue Mar 5 09:33:45 PST 2019

Dear All,

We have found the answer. Starting from Lustre-2.4, an OST stops
accepting any update actions once it is deactivated. Hence during data
migration, if we deactivate the OST chome-OST0028_UUID and copy data out via:

 	cp -a <file> <file>.tmp
 	mv <file>.tmp <file>

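The copy-and-rename step above can be sketched as a shell loop over the
file list produced by lfs find. This is a minimal sketch, not the official
migration tool: it assumes one path per line in the list file and does not
handle filenames containing newlines.

```shell
# Minimal sketch of the per-file migration described above: the copy forces
# new objects to be allocated on the still-active OSTs, and the rename then
# replaces the original file. Assumes one path per line in the list file.
migrate_list() {
    while IFS= read -r f; do
        cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"
    done < "$1"
}
```

For example, after "lfs find --obd chome-OST0028_UUID /work > list", one
would run "migrate_list list".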
The "junk" still remains in chome-OST0028_UUID unless we restart the
MDT. Restarting the MDT cleans out the junk residing in the previously
deactivated OSTs.

Another way to perform the data migration for chome-OST0028_UUID is:

root at mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/max_create_count

This way the OST stays active but stops creating new objects, so during
the data migration we can see its space being released continuously.
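A hedged sketch of this step, using the proc path shown above (the path
and OST index are from our setup; we assume the current value of
max_create_count can be read back first, so it can be restored after the
migration instead of guessing the default):

```shell
# Sketch: block object creation on one full OST, remembering the old limit
# so it can be restored once the migration is done (path from this setup).
param=/opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/max_create_count
old=$(cat "$param")     # remember the current create limit
echo 0 > "$param"       # stop new object allocation on this OST
# ... run the copy/rename migration for files on this OST ...
echo "$old" > "$param"  # re-enable object creation afterwards
```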

But here we encounter another problem. Our Lustre file system has
41 OSTs, of which 8 are full, and we want to migrate their data. So
we blocked these OSTs from creating new objects. But during the data
migration the whole Lustre file system suddenly hangs, and the MDS
server logs a lot of the following dmesg messages:

[960570.287161] Lustre: chome-OST001a-osc-MDT0000: slow creates, last=[0x1001a0000:0x3ef241:0x0], next=[0x1001a0000:0x3ef241:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0
[960570.287244] Lustre: Skipped 2 previous similar messages

where chome-OST001a-osc-MDT0000 is one of the blocked OSTs. It looks as
if the MDT still wants to store data in the blocked OSTs, but since they
are blocked, the whole file system hangs.

Could anyone give us suggestions on how to solve this?

Best Regards,


On Sun, Mar 03, 2019 at 06:00:17PM +0800, Tung-Han Hsieh wrote:
> Dear All,
> We have a problem of data migration from one OST to another.
> We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8
> on the clients. We want to migrate some data from one OST to another in
> order to re-balance the data occupation among OSTs. In the beginning we
> follow the old method (i.e., method found in Lustre-1.8.X manuals) for
> the data migration. Suppose we have two OSTs:
> root at client# /opt/lustre/bin/lfs df
> UUID                   1K-blocks        Used   Available Use% Mounted on
> chome-OST0028_UUID    7692938224  7246709148    55450156  99% /work[OST:40]
> chome-OST002a_UUID   14640306852  7094037956  6813847024  51% /work[OST:42]
> and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID.
> Our procedures are:
> 1. We deactivate chome-OST0028_UUID:
>    root at mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/active
> 2. We find all files located in chome-OST0028_UUID:
>    root at client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list
> 3. In each file listed in the file "list", we did:
> 	cp -a <file> <file>.tmp
> 	mv <file>.tmp <file>
> During the migration, we really saw more and more data being written into
> chome-OST002a_UUID, but we did not see any disk space released in chome-OST0028_UUID.
> In Lustre-1.8.X, doing it this way we did see chome-OST002a_UUID receive
> more and more data, and chome-OST0028_UUID gain more and more free space.
> It looks as if the data files referenced by the MDT have been copied to
> chome-OST002a_UUID, but the junk still remains in chome-OST0028_UUID.
> Even though we activated chome-OST0028_UUID after the migration, the
> situation is still the same:
> root at mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/active
> Is there any way to cure this problem?
> Thanks very much.
> T.H.Hsieh
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
