[lustre-discuss] File writes blocking on Lustre 2.7.0

Fri Jun 5 11:25:27 PDT 2015

OK, this was just odd to me.  We have a Lustre 2.7.0 system running now.
After setting up our first OSS and copying over all files from the old
system, we brought the new file system online.  The new backing store is
ZFS.  All was well with the world.

Meanwhile, all the old file servers were rebuilt with a ZFS backing
store, and I am now using lfs_migrate to balance out the files.

So, two questions for which I would like opinions.

1. A single disk failed on one of the old, rebuilt file servers, and ALL
lfs_migrate threads blocked on that failure at the same time.  This
behavior was unexpected.  Should I have been surprised?  We did not see
this with Lustre 2.1.6 (which had been upgraded from 1.8.x).  Is there a
configuration setting I can change to alter this behavior?  The
lfs_migrate threads resumed once the failure condition on that one
OSS/OST was cleared.

2. As I was starting to migrate 22M files off the original 2.7.0 server,
I deactivated its OSTs on the combined MGS/MDT machine.  At that point I
noticed that the occupied space was apparently not dropping on any OST
of that original OSS, while it was growing on the other OSSes/OSTs.  I
found LU-4825, "lfs migrate not freeing space on OST".  Re-activating
these OSTs brought the used space back to the correct values.  Is there
a way, other than deactivating them, to prevent new files from being
allocated on the OSTs I am migrating off of?  It just seems to me that
such a huge llog replay, assuming I leave the OSTs deactivated for the
whole migration, is not a good idea; or is that just my fear and
ignorance speaking?

Thanks for any advice on these two questions.
bob