[Lustre-discuss] how to reuse OST indices (EADDRINUSE)

Charles Taylor taylor at hpc.ufl.edu
Tue Dec 21 07:58:25 PST 2010


A little more info here....

We have 30 OSTs hosted on 5 OSSs using Areca 1680ix PCI-E RAID
controllers.  The long and short of it is that the Areca 1680ix's
have proven completely buggy and unreliable - not what you want in a
RAID card.  So we are evacuating all the OSTs, replacing the Areca
1680ix cards with Adaptec 51645s, re-initializing the LUNs,
reformatting the LUNs as OSTs (using the same OST indices as before),
and remounting them.  That is the plan, anyway.

We've already reformatted (mkfs.lustre) one set of 6 OSTs and did not
save the "magic" files, so we are getting the "Address already in
use" (-98, EADDRINUSE) error for those OSTs.  That being the case, I
assume we must:

1. Unmount the file system from all clients
2. Unmount the OSTs
3. Unmount the MDT
4. tunefs.lustre --writeconf /dev/mdt
5. Remount the MDT
6. Remount the OSTs (including the reformatted ones)
7. Remount the file system on the clients
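
Spelled out as commands, something like this - the device paths and
mount points here are placeholders, not our actual ones, and I'm
taking the fsname ("cms") and MGS NID from the log messages quoted
below:

    # on every client
    umount /lustre
    # on each OSS, for every OST
    umount /mnt/ost*
    # on the MDS
    umount /mnt/mdt
    tunefs.lustre --writeconf /dev/mdt_dev
    mount -t lustre /dev/mdt_dev /mnt/mdt
    # back on each OSS (reformatted OSTs included)
    mount -t lustre /dev/ostN_dev /mnt/ostN
    # and finally on every client
    mount -t lustre 10.13.28.210@o2ib:/cms /lustre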

1. Is this the correct sequence?
2. Will this leave all our data intact?
3. Must we do a writeconf on the OSTs too or just the MDT?

Also, for the remaining OSTs we will save the "magic" files and
restore them after reformatting, which should eliminate the need for
the procedure above.  With some of the OSTs mounted as ldiskfs, I see
the last_rcvd file and the CONFIGS directory but no LAST_ID file.
Should the LAST_ID file be there?
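
If it helps, here is roughly what we have in mind per OST - the
device, mount point, and backup paths are placeholders:

    # before reformatting: mount the OST read-only as ldiskfs
    mount -t ldiskfs -o ro /dev/ostN_dev /mnt/ldiskfs
    mkdir -p /root/magic/ostN
    cp -p  /mnt/ldiskfs/last_rcvd /root/magic/ostN/
    cp -pr /mnt/ldiskfs/CONFIGS   /root/magic/ostN/
    # (plus LAST_ID, if we can find it - hence the question above)
    umount /mnt/ldiskfs

    # after mkfs.lustre with the same --index: copy the files back
    mount -t ldiskfs /dev/ostN_dev /mnt/ldiskfs
    cp -p  /root/magic/ostN/last_rcvd /mnt/ldiskfs/
    cp -pr /root/magic/ostN/CONFIGS   /mnt/ldiskfs/
    umount /mnt/ldiskfs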

Regards,

Charlie Taylor
UF HPC Center

On Dec 20, 2010, at 11:18 PM, Wang Yibin wrote:

> Hello,
>
> Did you back up the old magic files (last_rcvd, LAST_ID, CONFIGS/*)
> from the original OSTs and put them back before trying to mount them?
> You probably didn't do that, so when you remount the OSTs with an
> existing index, the MGS refuses to add them without being told to
> writeconf - hence -EADDRINUSE.
> The proper way to replace an OST is described in bug 24128.
>
> On 2010-12-21, at 8:33 AM, Craig Prescott wrote:
>
>>
>> Hello list,
>>
>> We recently evacuated several OSTs on a single OSS, replaced RAID
>> controllers, re-initialized RAIDs for new OSTs, and made new lustre
>> filesystems for them, using the same OST indices as we had before.
>>
>> The filesystem and all its clients have been up and running the
>> whole time.  We disabled the OSTs we were working on across all
>> clients and on our MGS/MDS (lctl dl shows them as "IN" everywhere).
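>>
>> For reference, we disabled each one along these lines on every node
>> - the device number is whatever lctl dl reports for that OST's osc
>> on the node in question:
>>
>>    lctl dl | grep OST0006
>>    lctl --device <devno> deactivate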
>>
>> Now we want to bring the newly-formatted OSTs back online.  When we
>> try to mount the "new" OSTs, we get the following in the syslog of
>> the OSS that has been under maintenance, once for each OST:
>>
>>> Lustre: MGC10.13.28.210@o2ib: Reactivating import
>>> LustreError: 11-0: an error occurred while communicating with 10.13.28.210@o2ib. The mgs_target_reg operation failed with -98
>>> LustreError: 6065:0:(obd_mount.c:1097:server_start_targets()) Required registration failed for cms-OST0006: -98
>>> LustreError: 6065:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -98
>>> LustreError: 6065:0:(obd_mount.c:1438:server_put_super()) no obd cms-OST0006
>>> LustreError: 6065:0:(obd_mount.c:147:server_deregister_mount()) cms-OST0006 not registered
>>
>> What do we need to do to get these OSTs back into the filesystem?
>>
>> We really want to reuse the original indices.
>>
>> This is Lustre 1.8.4, btw.
>>
>> Thanks,
>> Craig Prescott
>> UF HPC Center