[Lustre-discuss] Cannot get an OST to activate

Bernd Schubert bs_lists at aakef.fastmail.fm
Fri Sep 3 14:35:41 PDT 2010


On Friday, September 03, 2010, Bernd Schubert wrote:
> On Friday, September 03, 2010, Bob Ball wrote:
> > We added a new OSS to our 1.8.4 Lustre installation.  It has 6 OST of
> > 8.9TB each.  Within a day of having these on-line, one OST stopped
> > accepting new files.  I cannot get it to activate.  The other 5 seem
> > fine.
> > 
> > On the MDS "lctl dl" shows it IN, but not UP, and files can be read from
> > it: 33 IN osc umt3-OST001d-osc umt3-mdtlov_UUID 5
> > 
> > However, I cannot get it to re-activate:
> > lctl --device umt3-OST001d-osc activate
> 
> [...]
> 
> > LustreError: 4697:0:(filter.c:3172:filter_handle_precreate())
> > umt3-OST001d: ignoring bogus orphan destroy request: obdid
> > 11309489156331498430 last_id 0
> > 
> > Can anyone tell me what must be done to recover this disk volume?
> 
> Check out section 23.3.9 in the Lustre manual ("How to Fix a Bad LAST_ID on
> an OST").
> 
> It is on my TODO list to write a tool to automatically correct the
> "lov_objid", but as of now I don't have it yet. Somehow your lov_objid
> file has a completely wrong value for this OST.
> Now, when you say "files can be read from it", are you sure there are
> already files on that OST? Because the error message says that the last_id
> is zero and so you should not have a single file on it. If that is also
> wrong, you will need to correct it as well. You can do that manually, or
> you can use a patched e2fsprogs version that will do that for you.
> 
> Patches are here:
> https://bugzilla.lustre.org/show_bug.cgi?id=22734
> 
> Packages can be found on my home page:
> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/e2fsprogs/
> 
> 
> If you want to do it automatically, you will need to create a lfsck mdsdb
> file (the hdr file is sufficient, see the lfsck section in the manual) and
> then you will need to run e2fsck for that OST as if you want to create an
> OSTDB file. That will start pass6, and if you then run e2fsck *without*
> "-n", i.e. in correcting mode, it will correct the LAST_ID file to what it
> finds on disk. With "-v" it will also tell you the old and the new value,
> and you will then need to put that value, properly encoded, into the MDS
> lov_objid file.

An update regarding the lov_objid file: if you rename or delete it (please
rename it, so that you have a backup), the MDS should be able to re-create it
from the OST LAST_ID data.
So if the troublesome OST has no data yet, this will be very easy. If it
already has data, you will need to correct the LAST_ID on that OST first.
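
To see what the MDS currently has recorded for a given OST before renaming
lov_objid, the file can be decoded offline. What follows is only a minimal
Python sketch. It assumes that lov_objid is a flat array of little-endian
unsigned 64-bit object IDs indexed by OST index, and that an OST's LAST_ID
file holds a single value in the same encoding. The function names are made
up for illustration and are not part of any Lustre tool. Run it on copies of
the files, not on the live ones.

```python
import struct

def ost_index(ost_name):
    """Return the OST index given as four hex digits after 'OST',
    e.g. 'umt3-OST001d' or 'umt3-OST001d-osc'."""
    suffix = ost_name.rsplit("OST", 1)[1]
    return int(suffix[:4], 16)

def decode_objids(raw):
    """Decode bytes as consecutive little-endian unsigned 64-bit integers.
    ASSUMPTION: this matches the on-disk layout of lov_objid and LAST_ID."""
    count = len(raw) // 8
    return list(struct.unpack("<%dQ" % count, raw[:count * 8]))

def recorded_objid(lov_objid_path, ost_name):
    """Hypothetical helper: return the entry for ost_name from a *copy*
    of the MDS lov_objid file."""
    with open(lov_objid_path, "rb") as f:
        objids = decode_objids(f.read())
    return objids[ost_index(ost_name)]
```

For the OST in this thread, ost_index("umt3-OST001d") gives 29, so
recorded_objid() on a copy of lov_objid returns the entry to compare against
the value decoded from that OST's LAST_ID file.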

Cheers,
Bernd


-- 
Bernd Schubert
DataDirect Networks


