[lustre-discuss] Cannot move data after upgrading to Lustre 2.12.6

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Thu Feb 25 22:12:50 PST 2021


Dear Cory,

Thank you very much for your reply, and sorry for my delayed report;
over the past few days we did some tests to try to narrow down the problem.

I am not sure whether the "mv" problem we found is related to the
bug report:

https://jira.whamcloud.com/browse/LU-13392

or not. But these days we tried running LFSCK, which helped a lot.
Here I report our test results:

1. Create Lustre file system 1.8.8 (ldiskfs based), and store some
   data trees in it.

2. Upgrade to Lustre file system 2.12.6 (still ldiskfs based).

3. On the client side, enter the Lustre file system mount point and
   run:

	mv dir1/file dir2/

   This fails with

	mv: cannot move 'dir1/file' to 'dir2/file': No data available

4. Run LFSCK in either of the following two ways:

   - LFSCK on all MDTs and OSTs:
     lctl lfsck_start -A

   - LFSCK on the MDT only:
     lctl lfsck_start -M <MDT-start>

   Either way, the above "mv" problem is fixed. In addition, after
   running LFSCK on all MDTs and OSTs, we checked the final LFSCK
   reports via:

   - On the MDT server:
	lctl get_param -n mdd.*.lfsck_namespace
	lctl get_param -n mdd.*.lfsck_layout

   - On the OST server:
	lctl get_param -n obdfilter.*.lfsck_layout

   and saw that only the MDT has repair records (in failed_phase1,
   dirent_repaired, and linkea_repaired); the OSTs do not. So we conclude
   that the "mv" problem only occurs on the MDT. As a result, running
   LFSCK at least on the MDT devices should be an important SOP after
   upgrading a Lustre file system.

5. However, one "mv" problem still remains. Suppose that on the client
   side the Lustre file system is mounted at /lustre; then

	mv /lustre/file /lustre/dir/

   still fails. That is, we still cannot move a file located directly
   under the "ROOT" of the Lustre file system into a sub-directory. It
   seems that the LFSCK repair overlooked the Lustre "ROOT" directory
   itself.
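
For illustration, the counters mentioned in step 4 could be pulled out of
an LFSCK report with a small filter like the one below. This is only a
sketch: the function name lfsck_nonzero is made up, and it assumes the
report consists of "key: value" lines, one counter per line.

```shell
# Print the LFSCK counters named in step 4 when they are nonzero.
# Input: "key: value" lines on stdin, e.g. piped from
#   lctl get_param -n mdd.*.lfsck_namespace
# (the "key: value" layout is an assumption about the report format).
lfsck_nonzero() {
    awk -F': *' '
        $1 ~ /^(failed_phase1|dirent_repaired|linkea_repaired)$/ && $2 + 0 > 0 {
            print $1 "=" $2
        }'
}
```

On the MDT server this would be used as
"lctl get_param -n mdd.*.lfsck_namespace | lfsck_nonzero"; empty output
means none of the three counters recorded anything.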

So far we have not found any way to fix the final problem described
in 5. Any ideas are very welcome. If there is no further input, we will
probably report this as a bug on the Lustre Jira bug tracker.
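
To check which directory pairs are still affected, a probe along the
following lines might help. It is a hedged sketch, not a tested tool:
probe_mv and the scratch file name are hypothetical, and it assumes a
newly created file fails to move in the same way as an existing one,
which has not been verified here.

```shell
# Probe whether a file can be moved from directory $1 into directory $2.
# Creates an empty scratch file (hypothetical name), tries the move,
# and removes the scratch file afterwards in either case.
probe_mv() {
    src=$1
    dst=$2
    name=".mvprobe.$$"
    : > "$src/$name" || return 2
    if mv -- "$src/$name" "$dst/" 2>/dev/null; then
        rm -f -- "$dst/$name"
        echo "ok: $src -> $dst"
    else
        rm -f -- "$src/$name"
        echo "FAIL: $src -> $dst"
        return 1
    fi
}
```

For the case in step 5 this would be called as "probe_mv /lustre /lustre/dir".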

Cheers,

T.H.Hsieh


On Mon, Feb 22, 2021 at 01:57:37PM +0000, Spitz, Cory James wrote:
> Hello, T.H.Hsieh.
> 
> Your report sounds familiar to me.  Although you are concerned about upgrades from 1.8.x, there were some other troubles reported when updating from earlier 2.x.  You might want to take a closer look at https://jira.whamcloud.com/browse/LU-13392.  I didn’t review it deeply and maybe it isn’t even closely related to your trouble, but you may find it helpful.  In any case since you seem so willing to experiment, I’m curious what happens if you run LFSCK.  LFSCK ought to be able to add and check FID-in-dirent and linkEA entries, both of which won’t exist in a 1.8.x filesystem.  I think Xyratex even released an upgrade tool to make these sorts of updates prior to mounting under 2.x for the first time.
> 
> -Cory
> 
> On 2/22/21, 1:22 AM, "lustre-discuss" <lustre-discuss-bounces at lists.lustre.org> wrote:
> 
> 
> Dear All,
> 
> After some tests over the past few days, I want to report in more detail
> what I have found about the "moving data" error.
> 
> The problem appears whenever a Lustre file system is upgraded from the very
> old version 1.8.8 to 2.12.6, where the MDT is ldiskfs based. Probably nobody
> cares about a version as old as 1.8.8, but in case someone encounters a
> similar scenario, this message may provide some useful information.
> 
> The problem I have found is this: for any directories A/ and B/ created under
> Lustre-1.8.8, after upgrading to Lustre-2.12.6, running the following
> "mv" command:
> 
>         mv A/file B/
> 
> i.e., moving a file from A/ to B/, fails with the error message:
> 
>          mv: cannot move 'A/file' to 'B/file': No data available
> 
> 
> I tested the following upgrade procedures:
> 
> 1. Lustre-1.8.8 -> Lustre-2.10.7 -> Lustre-2.12.6 (has problem)
>    - Lustre file system created with Lustre-1.8.8, and directories A/ and
>      B/ are stored in the Lustre file system (A/ and B/ have some files).
> 
>    - Lustre-1.8.8 -> Lustre-2.10.7:
>      After installing 2.10.7 of Lustre software and corresponding e2fsprogs:
>      $ tunefs.lustre --writeconf /dev/sda1      (the MDT partition)
>      $ tunefs.lustre --writeconf /dev/sda2      (the OST partition)
>      $ tune2fs -O dirdata /dev/sda1
>      $ tune2fs -O dirdata /dev/sda2
> 
>      Then, after mounting the Lustre file system on the client, there was no problem at all.
> 
>    - Lustre-2.10.7 -> Lustre-2.12.6:
>      After installing 2.12.6 of Lustre software and corresponding e2fsprogs:
>      $ tunefs.lustre --writeconf /dev/sda1      (the MDT partition)
>      $ tunefs.lustre --writeconf /dev/sda2      (the OST partition)
> 
>      Then, after mounting the Lustre file system on the client, the "mv" problem appeared.
> 
> 2. Lustre-1.8.8 -> Lustre-2.12.6 (has problem)
>    - Lustre file system created with Lustre-1.8.8, and directories A/ and
>      B/ are stored in the Lustre file system (A/ and B/ have some files).
> 
>    - Lustre-1.8.8 -> Lustre-2.12.6:
>      After installing 2.12.6 of Lustre software and corresponding e2fsprogs:
>      $ tunefs.lustre --writeconf /dev/sda1      (the MDT partition)
>      $ tunefs.lustre --writeconf /dev/sda2      (the OST partition)
>      $ tune2fs -O dirdata /dev/sda1
>      $ tune2fs -O dirdata /dev/sda2
> 
>      Then, after mounting the Lustre file system on the client, the "mv" problem appeared.
> 
> 3. Lustre-2.10.7 -> Lustre-2.12.6 (no problem)
>    - Lustre file system created with Lustre-2.10.7, and directories A/ and
>      B/ are stored in the Lustre file system.
> 
>    - Lustre-2.10.7 -> Lustre-2.12.6:
>      After installing 2.12.6 of Lustre software and corresponding e2fsprogs:
>      $ tunefs.lustre --writeconf /dev/sda1      (the MDT partition)
>      $ tunefs.lustre --writeconf /dev/sda2      (the OST partition)
> 
>      Then, after mounting the Lustre file system on the client, there was no problem at all.
> 
> 
> So something is missing after upgrading from 1.8.8; it does not cause a
> problem in 2.10.7, but causes the "mv" problem in 2.12.6. So far I cannot
> figure out what is missing.
> 
> The workaround for this problem is simple. We only need to rename the
> directories created in 1.8.8 and then rename them back, i.e.,
> 
>         mv A A.tmp
>         mv A.tmp A
>         mv B B.tmp
>         mv B.tmp B
> 
> Then the "mv" problem between A/ and B/ goes away. But that means we need to
> rename every directory in the whole Lustre tree in order to cure the problem
> for old directories created in 1.8.8. That would be a huge task. So far I
> have not found a better way to resolve this problem.
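
For reference, the rename-and-rename-back workaround quoted above could
be automated roughly as follows. This is a minimal sketch under stated
assumptions: rename_roundtrip and the ".renametmp" suffix are made up for
illustration, -mindepth is a GNU find extension, and nothing else should
be using the directories while it runs.

```shell
# Rename every directory below $1 to a temporary name and back again.
# The top directory $1 itself is not renamed (-mindepth 1).
# ".renametmp" is a hypothetical suffix chosen for this sketch.
rename_roundtrip() {
    root=$1
    # -depth lists a directory only after its contents, so every path
    # handed to the inner shell has already been fully traversed.
    find "$root" -mindepth 1 -depth -type d -exec sh -c '
        for d in "$@"; do
            tmp="$d.renametmp"
            if [ -e "$tmp" ]; then
                echo "skip $d: $tmp already exists" >&2
                continue
            fi
            mv -- "$d" "$tmp" && mv -- "$tmp" "$d"
        done' sh {} +
}
```

It would be called on a subtree of the mounted file system, for example
"rename_roundtrip /lustre/some/tree" (a hypothetical path), ideally on a
small test tree first.
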
> 
> Any comments are very welcome.
> 
> Best Regards,
> 
> T.H.Hsieh
> 
> On Fri, Feb 19, 2021 at 03:53:41AM +0800, Tung-Han Hsieh wrote:
> > Dear All,
> >
> > Recently we found a strange problem with our upgraded Lustre file systems.
> >
> > We have several very old Lustre file systems with version 1.8.8. We first
> > upgraded them to 2.10.6. That seemed OK. Then we upgraded them to 2.12.6.
> > Now we encounter a problem when moving a file from directory A to directory B:
> >
> >        mv A/file B/
> >
> > where both A/ and B/ are existing directories since version 1.8.8. The
> > error message says:
> >
> >        mv: cannot move 'A/file' to 'B/file': No data available
> >
> > But we have no problem opening A/file, copying A/file to B/file, or
> > running "mv A/file A/file1".
> >
> > Stranger still, if we rename both directories, the problem goes away:
> >
> >        mv A A1
> >        mv B B1
> >        mv A1/file B1/          # no problem at all
> >
> > But if we rename only one of them, the problem still remains.
> >
> > It seems that something is missing from the upgrade. Does anyone
> > know how to fix it?
> >
> > Thank you very much.
> >
> > T.H.Hsieh
> > _______________________________________________
> > lustre-discuss mailing list
> > lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org