[lustre-discuss] Data migration from one OST to another

Sun Mar 3 06:09:32 PST 2019

Hsieh,

This sounds similar to a bug seen with pre-2.5 servers and 2.7 (or newer) clients: the client and server disagree about which side performs the delete, so the delete never happens. Since you’re running 2.5 servers, I don’t think you should hit this, but the symptoms are the same. You can temporarily fix things by restarting/remounting your OST(s), which will trigger orphan cleanup. If that works, though, the only long-term fix is to upgrade your servers to a version that is expected to work with your clients. (The 2.10 maintenance release is a good choice if you are not interested in the newest features; otherwise, 2.12 is also an option.)

I would also recommend keeping clients and servers on the same version where possible. We do interop testing, but running the same version on both is much more widely used.

- Patrick
________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Tung-Han Hsieh <thhsieh at twcp1.phys.ntu.edu.tw>
Sent: Sunday, March 3, 2019 4:00:17 AM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Data migration from one OST to another

Dear All,

We have a problem with data migration from one OST to another.

We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8
on the clients. We want to migrate some data from one OST to another in
order to re-balance the space usage among OSTs. At first we followed the
old method (i.e., the method found in the Lustre-1.8.X manuals) for the
data migration. Suppose we have two OSTs:

root@client# /opt/lustre/bin/lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
chome-OST0028_UUID    7692938224  7246709148    55450156  99% /work[OST:40]
chome-OST002a_UUID   14640306852  7094037956  6813847024  51% /work[OST:42]

and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID.
Our procedures are:

1. We deactivate chome-OST0028_UUID:
   root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/active

2. We find all files located in chome-OST0028_UUID:
   root@client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list

3. For each file listed in the file "list", we did:

        cp -a <file> <file>.tmp
        mv <file>.tmp <file>
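
Step 3 can be scripted along the following lines. This is only a sketch under stated assumptions: "migrate_one" and "migrate_list" are hypothetical helper names (not Lustre commands), the list is assumed to hold one path per line, and the files are assumed not to be modified while they are being copied.

```shell
#!/bin/sh
# Sketch of step 3: rewrite one file so that its data is allocated anew
# (on an active OST, since the source OST was deactivated in step 1).
# migrate_one is a hypothetical helper name, not a Lustre command.
migrate_one() {
    f=$1
    # Copy with attributes preserved. If the copy fails, remove the
    # partial copy and leave the original untouched.
    cp -a -- "$f" "$f.tmp" || { rm -f -- "$f.tmp"; return 1; }
    # Rename the copy over the original.
    mv -- "$f.tmp" "$f"
}

# Process the list written in step 2, one path per line.
# Assumes no path contains a newline character.
migrate_list() {
    while IFS= read -r f; do
        # stdin is redirected so cp/mv cannot consume lines of the list.
        migrate_one "$f" </dev/null || echo "failed: $f" >&2
    done < "$1"
}
```

With the functions defined, "migrate_list list" would process every path in the file "list". Reading with "IFS= read -r" keeps paths with spaces or backslashes intact.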

During the migration, we did see more and more data being written into
chome-OST002a_UUID, but we did not see any disk space released on
chome-OST0028_UUID. In Lustre-1.8.X, doing it this way we did see that
chome-OST002a_UUID had more data coming in and chome-OST0028_UUID had
more and more free space.

It looks like the data files referenced by the MDT have been copied to
chome-OST002a_UUID, but the old copies (junk) still remain on
chome-OST0028_UUID. Even after we re-activate chome-OST0028_UUID following
the migration, the situation is still the same:

root@mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/active
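
One way to check whether space on the source OST is actually being released is to record the "Used" column of the lfs df output before and after the migration and compare the two values. A minimal sketch, assuming the column layout shown in the lfs df output earlier in this message; "ost_used" is a hypothetical helper name, not part of Lustre.

```shell
#!/bin/sh
# ost_used prints the "Used" value (in 1K-blocks) for one OST UUID.
# It reads `lfs df`-style text on stdin and assumes the column order
# UUID, 1K-blocks, Used, Available, Use%, Mounted on.
# ost_used is a hypothetical helper name, not a Lustre command.
ost_used() {
    awk -v uuid="$1" '$1 == uuid { print $3 }'
}

# Example usage (not run here):
#   before=$(/opt/lustre/bin/lfs df | ost_used chome-OST0028_UUID)
#   ... migrate files ...
#   after=$(/opt/lustre/bin/lfs df | ost_used chome-OST0028_UUID)
#   echo "released: $((before - after)) KB"
```

If the reported difference stays near zero while the target OST keeps filling up, the old objects are not being deleted.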

Is there any way to fix this problem?

Thanks very much.

T.H.Hsieh