[Lustre-discuss] lfs_migrate question

Jason Brooks brookjas at ohsu.edu
Thu Oct 18 15:11:28 PDT 2012


I had an OSS crash caused by a CPU fault on the server. It is running again, but I am trying to decommission it. I am migrating the data off of it onto other OSTs using lfs find together with lfs_migrate.

It's been nearly 36 hours and about 2 terabytes have been moved (roughly 15 MB/s), so I am about halfway. Is this a decent rate?
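As a rough check of the numbers above, here is a minimal Python sketch of the average throughput and the remaining time at the same pace. It assumes decimal units (1 TB = 10^12 bytes) and reads "about halfway" as roughly 2 TB still to move; the helper names are hypothetical and only for illustration.

```python
# Hypothetical helpers: average migration rate and time remaining.
# Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
TB = 10**12


def migration_rate(bytes_moved: float, hours: float) -> float:
    """Average rate in MB/s over the elapsed time."""
    return bytes_moved / (hours * 3600) / 10**6


def eta_hours(bytes_remaining: float, rate_mb_s: float) -> float:
    """Hours needed to move bytes_remaining at rate_mb_s."""
    return bytes_remaining / (rate_mb_s * 10**6) / 3600


rate = migration_rate(2 * TB, 36)   # ~2 TB moved in ~36 hours
remaining = 2 * TB                  # assumption: "about halfway" of ~4 TB
print(f"average rate: {rate:.1f} MB/s")
print(f"time remaining: {eta_hours(remaining, rate):.0f} h")
```

At about 15 MB/s, the remaining ~2 TB would take roughly another 36 hours if the pace holds.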

Here are the particulars, which are basically snags. I know they affect performance; I am just not certain to what degree:

 1.  I am running lfs_migrate on two systems, migrating different subdirectories of the same mount point.
 2.  All systems communicate using IP over InfiniBand (IPoIB).
 3.  None of my client-only systems have lfs or lfs_migrate. I think this is because they run Ubuntu and only the Lustre kernel modules are installed, so I can't run the migration there.
 4.  This also means the Lustre filesystem is mounted on the OSSs themselves.
 5.  lfs_migrate and lfs did not seem to work correctly on the OSSs running 1.8.6, though they work OK on 1.8.8.
 6.  AND the two systems I am running lfs_migrate on are probably the very systems with free OST space. In other words, file blocks are being written to the same systems that lfs_migrate is running on, and/or there is a lot of block write traffic between the two.

Lustre versions:
MDS/MGS: 1.8.6
OSSs (5 of 7): 1.8.6
OSSs (2 of 7): 1.8.8
Clients: 1.8.6, Ubuntu

