[Lustre-discuss] Cannot get an OST to activate
Bernd Schubert
bs_lists at aakef.fastmail.fm
Fri Sep 3 14:22:24 PDT 2010
On Friday, September 03, 2010, Bob Ball wrote:
> We added a new OSS to our 1.8.4 Lustre installation. It has 6 OSTs of
> 8.9TB each. Within a day of having these on-line, one OST stopped
> accepting new files. I cannot get it to activate. The other 5 seem fine.
>
> On the MDS "lctl dl" shows it IN, but not UP, and files can be read from
> it: 33 IN osc umt3-OST001d-osc umt3-mdtlov_UUID 5
>
> However, I cannot get it to re-activate:
> lctl --device umt3-OST001d-osc activate
>
[...]
> LustreError: 4697:0:(filter.c:3172:filter_handle_precreate())
> umt3-OST001d: ignoring bogus orphan destroy request: obdid
> 11309489156331498430 last_id 0
>
> Can anyone tell me what must be done to recover this disk volume?
Check out section 23.3.9 in the Lustre manual ("How to Fix a Bad LAST_ID on an
OST").
It is on my TODO list to write a tool that automatically corrects the
"lov_objid" file, but I don't have it yet. Somehow your lov_objid file has a
completely wrong value for this OST.
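To see which lov_objid entry belongs to this OST, and to compare it with the OST's LAST_ID, something like the following minimal sketch could be run against copies of the two files taken off the unmounted targets. It assumes that lov_objid is a flat array of 64-bit little-endian object IDs indexed by OST index, that LAST_ID holds a single value of the same type, and that the four hex digits after "OST" in a target name are the index. The function names are hypothetical, not part of any Lustre tool.

```python
import struct

ENTRY_SIZE = 8  # assumption: one little-endian 64-bit object ID per OST index


def ost_index_from_name(target_name: str) -> int:
    """Return the OST index from a name such as 'umt3-OST001d' or 'umt3-OST001d-osc'.

    The four characters after "OST" are read as hexadecimal, so OST001d -> 0x1d.
    """
    pos = target_name.index("OST") + 3
    return int(target_name[pos:pos + 4], 16)


def lov_objid_entry(data: bytes, ost_index: int) -> int:
    """Decode the object ID the MDS has recorded for one OST in lov_objid."""
    start = ost_index * ENTRY_SIZE
    if start + ENTRY_SIZE > len(data):
        raise ValueError(f"no lov_objid entry for OST index {ost_index}")
    return struct.unpack_from("<Q", data, start)[0]


def last_id_value(data: bytes) -> int:
    """Decode an OST LAST_ID file (assumed: a single little-endian 64-bit integer)."""
    if len(data) < ENTRY_SIZE:
        raise ValueError("LAST_ID data is shorter than 8 bytes")
    return struct.unpack_from("<Q", data, 0)[0]
```

For umt3-OST001d the index would be 0x1d = 29. Presumably the obdid in the log above (11309489156331498430) is what the MDS holds for that index. An entry that far above the OST's LAST_ID (0 here) is the kind of mismatch described in this reply.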
Now, when you say "files can be read from it", are you sure there are already
files on that OST? The error message says that last_id is zero, so there
should not be a single file on it. If that value is also wrong, you will need
to correct it as well. You can do that manually, or you can use a patched
e2fsprogs version, which will do it for you.
Patches are here:
https://bugzilla.lustre.org/show_bug.cgi?id=22734
Packages can be found on my home page:
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/e2fsprogs/
If you want to do it automatically, you will need to create an lfsck mdsdb file
(the hdr file is sufficient; see the lfsck section in the manual) and then run
e2fsck on that OST as if you wanted to create an OSTDB file. That will start
pass 6, and if you then run e2fsck *without* "-n", i.e. in correcting mode, it
will correct the LAST_ID file to match what it finds on disk. With "-v" it will
also print the old and the new value, and you will then need to write that
value, properly encoded, into the MDS lov_objid file.
Be careful and create backups of the lov_objid and LAST_ID files.
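Writing the new value "properly coded" into lov_objid could look roughly like this minimal sketch. It makes the same assumptions as before: one little-endian 64-bit ID per OST index, and the value comes from the e2fsck "-v" output. `patch_lov_objid`, `encode_last_id`, and `apply_patch` are hypothetical helpers, and `apply_patch` keeps a `.bak` copy before rewriting. It is meant for a copy of the file taken from the unmounted MDT, which you would then copy back, not for editing the live device.

```python
import struct

ENTRY_SIZE = 8  # assumption: one little-endian 64-bit object ID per OST index


def patch_lov_objid(data: bytes, ost_index: int, new_id: int) -> bytes:
    """Return a copy of lov_objid data with the entry for ost_index set to new_id."""
    if not 0 <= new_id < 2**64:
        raise ValueError("object ID must fit in 64 bits")
    start = ost_index * ENTRY_SIZE
    if start + ENTRY_SIZE > len(data):
        raise ValueError(f"no lov_objid entry for OST index {ost_index}")
    buf = bytearray(data)
    struct.pack_into("<Q", buf, start, new_id)  # overwrite only this OST's slot
    return bytes(buf)


def encode_last_id(new_id: int) -> bytes:
    """Encode a LAST_ID value as a single little-endian 64-bit integer."""
    return struct.pack("<Q", new_id)


def apply_patch(path: str, ost_index: int, new_id: int) -> None:
    """Back up a copied-out lov_objid file, then rewrite one entry in it."""
    with open(path, "rb") as f:
        original = f.read()
    with open(path + ".bak", "wb") as f:  # untouched backup copy
        f.write(original)
    with open(path, "wb") as f:
        f.write(patch_lov_objid(original, ost_index, new_id))
```

Keeping the byte manipulation in pure functions means the result can be checked (for example with `od`) before anything is copied back to the MDT.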
Hope it helps,
Bernd
--
Bernd Schubert
DataDirect Networks