[Lustre-discuss] how to reuse OST indices (EADDRINUSE)

Craig Prescott prescott at hpc.ufl.edu
Tue Dec 21 06:11:36 PST 2010


Thanks for this.  You are right - we didn't back up and replace those
files.  We did this once before, and I don't recall doing a writeconf or
anything with magic files, but I guess we must have.

We no longer have magic files (last_rcvd, LAST_ID, CONFIG/*) from the
old OSTs.  According to bug 24128, this puts us in the "cold replace"
scenario.  Is there anything we can do to avoid quiescing the entire
filesystem?  If we can avoid unmounting all the clients, we'd prefer it.

Since our combo MGT/MDT is going to have to be unmounted for the
writeconf, we'd have the opportunity to mount it as ldiskfs and muck around.
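
For reference, the -98 in the log lines quoted below is a negated Linux
errno value, and 98 is EADDRINUSE, which is where the subject line comes
from.  A minimal Python sketch for pulling the return code out of such a
line follows.  The names decode_rc, LINUX_ERRNO and RC_PATTERN are made up
for illustration and are not part of Lustre; the table holds only a few
Linux errno values, and the pattern assumes the code is the last thing on
the line, as it is in the messages in this thread.

```python
import re

# A few Linux errno values (assumption: the servers run Linux,
# where 98 is EADDRINUSE). Not a complete table.
LINUX_ERRNO = {
    2: "ENOENT",
    17: "EEXIST",
    19: "ENODEV",
    22: "EINVAL",
    98: "EADDRINUSE",
    110: "ETIMEDOUT",
}

# Matches a trailing negative return code such as ": -98" or
# "failed with -98" at the end of a log line.
RC_PATTERN = re.compile(r"(?:failed with|:)\s*-(\d+)\s*$")


def decode_rc(line):
    """Return (code, name) for a trailing -N return code, or None."""
    m = RC_PATTERN.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return code, LINUX_ERRNO.get(code, "unknown")


if __name__ == "__main__":
    sample = ("LustreError: 6065:0:(obd_mount.c:1097:server_start_targets()) "
              "Required registration failed for cms-OST0006: -98")
    print(decode_rc(sample))
```

Lines that carry no return code, such as the "no obd cms-OST0006" message,
give None.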

Thanks again,
Craig Prescott
UF HPC Center

Wang Yibin wrote:
> Hello,
> 
> Did you back up the old magic files (last_rcvd, LAST_ID, CONFIG/*) from the original OSTs and put them back before trying to mount them?
> You probably didn't do that. So when you remount the OSTs with an existing index, the MGS will refuse to add them without being told to writeconf, hence -EADDRINUSE.
> The proper ways to replace an OST are described in bug 24128.
> 
> On 2010-12-21 at 8:33pm, Craig Prescott wrote:
> 
>> Hello list,
>>
>> We recently evacuated several OSTs on a single OSS, replaced RAID 
>> controllers, re-initialized RAIDs for new OSTs, and made new lustre 
>> filesystems for them, using the same OST indices as we had before.
>>
>> The filesystem and all its clients have been up and running the whole 
>> time.  We disabled the OSTs we were working on on all clients and our 
>> MGS/MDS (lctl dl shows them as "IN" everywhere).
>>
>> Now we want to bring the newly-formatted OSTs back online.  When we try 
>> to mount the "new" OSTs, we get this for each one in the syslog of the 
>> OSS that has been under maintenance:
>>
>>> Lustre: MGC10.13.28.210@o2ib: Reactivating import
>>> LustreError: 11-0: an error occurred while communicating with 10.13.28.210@o2ib. The mgs_target_reg operation failed with -98
>>> LustreError: 6065:0:(obd_mount.c:1097:server_start_targets()) Required registration failed for cms-OST0006: -98
>>> LustreError: 6065:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -98
>>> LustreError: 6065:0:(obd_mount.c:1438:server_put_super()) no obd cms-OST0006
>>> LustreError: 6065:0:(obd_mount.c:147:server_deregister_mount()) cms-OST0006 not registered
>> What do we need to do to get these OSTs back into the filesystem?
>>
>> We really want to reuse the original indices.
>>
>> This is Lustre 1.8.4, btw.
>>
>> Thanks,
>> Craig Prescott
>> UF HPC Center