[Lustre-discuss] Cannot get an OST to activate
bs_lists at aakef.fastmail.fm
Fri Sep 3 14:35:41 PDT 2010
On Friday, September 03, 2010, Bernd Schubert wrote:
> On Friday, September 03, 2010, Bob Ball wrote:
> > We added a new OSS to our 1.8.4 Lustre installation. It has 6 OST of
> > 8.9TB each. Within a day of having these on-line, one OST stopped
> > accepting new files. I cannot get it to activate. The other 5 seem
> > fine.
> > On the MDS, "lctl dl" shows it IN, but not UP, and files can be read from
> > it:
> >   33 IN osc umt3-OST001d-osc umt3-mdtlov_UUID 5
> > However, I cannot get it to re-activate:
> >   lctl --device umt3-OST001d-osc activate
> > LustreError: 4697:0:(filter.c:3172:filter_handle_precreate())
> > umt3-OST001d: ignoring bogus orphan destroy request: obdid
> > 11309489156331498430 last_id 0
> > Can anyone tell me what must be done to recover this disk volume?
> Check out section 23.3.9 in the Lustre manual ("How to Fix a Bad LAST_ID on
> an OST").
> It is on my TODO list to write a tool to automatically correct the
> "lov_objid" file, but I don't have it yet. Somehow your lov_objid
> file has a completely wrong value for this OST.
> Now, when you say "files can be read from it", are you sure there are
> already files on that OST? Because the error message says that the last_id
> is zero and so you should not have a single file on it. If that is also
> wrong, you will need to correct it as well. You can do that manually, or
> you can use a patched e2fsprogs version that will do that for you.
> Patches are here:
> Packages can be found on my home page:
> If you want to do it automatically, you will need to create an lfsck mdsdb
> file (the hdr file is sufficient; see the lfsck section in the manual) and
> then run e2fsck for that OST as if you wanted to create an OSTDB file. That
> will start pass6, and if you run e2fsck *without* "-n", i.e. in correcting
> mode, it will correct the LAST_ID file to what it finds on disk. With "-v"
> it will also tell you the old and the new value. You will then need to put
> that value, properly encoded, into the MDS lov_objid file.
Update on the lov_objid file: actually, if you rename or delete it (rename it,
please, so that you have a backup), the MDS should be able to re-create it
from the OST LAST_ID data.
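
Before renaming the file, it may help to look at what the MDS currently has
recorded for the OST. Below is a minimal Python sketch, assuming lov_objid is
a flat array of little-endian unsigned 64-bit object IDs with one slot per OST
index (which is how the manual's "od -td8" procedure reads it on x86). The
helper names are hypothetical, and the bytes in the demo are synthetic, not
taken from a real MDS.

```python
import re
import struct


def parse_lov_objid(raw):
    """Decode a lov_objid dump as consecutive little-endian unsigned 64-bit IDs.

    Assumption: one 8-byte slot per OST index; trailing partial bytes are ignored.
    """
    usable = len(raw) - len(raw) % 8
    return [value for (value,) in struct.iter_unpack("<Q", raw[:usable])]


def ost_index(name):
    """Extract the OST index from a name such as 'umt3-OST001d-osc'.

    The four characters after 'OST' are the index in hexadecimal.
    """
    match = re.search(r"OST([0-9a-fA-F]{4})", name)
    if match is None:
        raise ValueError("no OST index found in %r" % name)
    return int(match.group(1), 16)


if __name__ == "__main__":
    # Synthetic example: 30 slots, with a bogus value in the slot for OST001d.
    slots = [0] * 30
    slots[ost_index("umt3-OST001d-osc")] = 11309489156331498430
    raw = struct.pack("<30Q", *slots)

    ids = parse_lov_objid(raw)
    idx = ost_index("umt3-OST001d-osc")
    print("OST index %d: last object id recorded on MDS = %d" % (idx, ids[idx]))
```

To use it on real data, read a copy of the lov_objid file in binary mode and
pass the bytes to parse_lov_objid(); never work on the live file.
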
If the troublesome OST has no data yet, this will be very easy. If it already
has data, you will need to correct the LAST_ID on that OST first.
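
To check whether the OST already holds objects and whether its LAST_ID
matches them, something like the following Python sketch could be used. It
assumes the OST is mounted read-only as ldiskfs, that objects live under
O/0/d*/ with their decimal object ID as the file name, and that O/0/LAST_ID
holds one little-endian unsigned 64-bit value, as in the manual's LAST_ID
procedure. The function names and the mount point argument are hypothetical.

```python
import os
import struct
import sys


def read_last_id(ost_root):
    """Return the value stored in O/0/LAST_ID (assumed little-endian u64)."""
    with open(os.path.join(ost_root, "O", "0", "LAST_ID"), "rb") as f:
        return struct.unpack("<Q", f.read(8))[0]


def highest_object_id(ost_root):
    """Return the largest numeric object name under O/0/d*/, or 0 if none."""
    group_dir = os.path.join(ost_root, "O", "0")
    highest = 0
    for sub in os.listdir(group_dir):
        path = os.path.join(group_dir, sub)
        # Only descend into the hashed object directories (d0, d1, ...).
        if not (sub.startswith("d") and sub[1:].isdigit() and os.path.isdir(path)):
            continue
        for name in os.listdir(path):
            if name.isdigit():
                highest = max(highest, int(name))
    return highest


if __name__ == "__main__" and len(sys.argv) == 2:
    root = sys.argv[1]  # mount point of the ldiskfs-mounted OST
    last_id = read_last_id(root)
    on_disk = highest_object_id(root)
    print("LAST_ID on OST: %d, highest object found on disk: %d" % (last_id, on_disk))
    if last_id < on_disk:
        print("LAST_ID is lower than an existing object; it needs correcting.")
```

If the highest object found is 0, the OST has no data yet and only the MDS
lov_objid file needs attention.
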