[Lustre-discuss] two problems

Charles Taylor taylor at hpc.ufl.edu
Thu Jun 3 17:31:14 PDT 2010


On Jun 3, 2010, at 6:17 PM, Andreas Dilger wrote:

> On 2010-06-03, at 06:23, Stefano Elmopi wrote:
>> Of course, I did this in a test environment; in a production environment, I would have moved all the files elsewhere before deleting the OST1 server.
> 
> The main problem here is that you have completely erased all knowledge of the failed OST, while there are still files in the filesystem using it (i.e. using lctl --writeconf).
> 
> If the OST had simply failed and been marked inactive (which is what is normally done in such situations) it would still be possible to delete the files.  The problem being seen on the MDT now is simply one that cannot happen in any "normal" failure scenario.

I'm sure I'm speaking out of turn, but our recent experience contradicts this. We lost an OST, marked it as inactive, and *could not* remove the files until we actually replaced the OST with another one (using the same index). Once we did that and reactivated the OST, we could delete the files, which by then didn't really exist anywhere other than on the MDT.

It was kind of annoying. Our intent was not to replace the OST, but it became such a hassle for us and our users (recursive file operations would often encounter the "missing files" and error out) that we did so just to be able to remove the files that had been on the failed OST.
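
For anyone retracing the sequence above, here is a rough sketch of the commands involved, written as a small Python helper that only prints them and runs nothing. The filesystem name `lustre`, OST index 1, mount point `/mnt/lustre`, and device number 11 are all placeholders, not values from this thread; the real device number would have to be read from the `lctl dl` output on the MDS. Formatting the replacement OST at the same index is left out because its arguments are site-specific.

```python
import shlex


def ost_uuid(fsname: str, index: int) -> str:
    """Build the conventional OST UUID, e.g. lustre-OST0001_UUID.

    The index is written as four lowercase hex digits.
    """
    return f"{fsname}-OST{index:04x}_UUID"


def recovery_commands(fsname: str, index: int, mountpoint: str, osc_devno: int):
    """Return the command lines for the sequence, without running them.

    osc_devno is a placeholder for the device number that `lctl dl`
    reports on the MDS for the failed OST.
    """
    uuid = ost_uuid(fsname, index)
    return [
        # List configured devices to find the device number for the OST.
        ["lctl", "dl"],
        # Mark the failed OST inactive.
        ["lctl", "--device", str(osc_devno), "deactivate"],
        # List files that have objects on the failed OST.
        ["lfs", "find", "--obd", uuid, mountpoint],
        # After a replacement OST exists at the same index, reactivate it.
        ["lctl", "--device", str(osc_devno), "activate"],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for cmd in recovery_commands("lustre", 1, "/mnt/lustre", 11):
        print(shlex.join(cmd))
```

The file list from the `lfs find` step is what would be removed once the OST at that index is active again.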

Regards,

Charlie Taylor
UF HPC Center

