[Lustre-discuss] how do I deactivate a very wonky OST

Andrus, Brian Contractor bdandrus at nps.edu
Fri Jan 23 10:37:07 PST 2015


Thomas,

Thanks for the info.
Current state: the OST is unregistered. When I tried to start an lfsck, it kernel panicked all of the OSSes, and whenever I tried to bring up OST5, it kernel panicked the OSS it was on. So I brought everything back up except OST5, to keep it from registering.

Running lfs find directly does not work: it throws an error for every file that has anything on the bad OST, so I captured STDERR to get the info for those files.
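A minimal Python sketch of that extraction step. It assumes each captured error line names the affected file as a path under the client mountpoint; the exact lfs find error text is not shown in the thread. The function name paths_from_stderr and the /mnt/lustre mountpoint are hypothetical. Paths containing whitespace, colons or quotes are not handled.

```python
import re
import sys
from typing import Iterable, List


def paths_from_stderr(lines: Iterable[str], mountpoint: str) -> List[str]:
    """Return unique paths under `mountpoint` found in error lines, in first-seen order.

    Assumption: each error line contains the affected path as a token that
    starts with the mountpoint and ends at whitespace, a colon or a quote.
    """
    pattern = re.compile(re.escape(mountpoint.rstrip("/")) + r"/[^\s:'\"]+")
    seen = set()
    result: List[str] = []
    for line in lines:
        for path in pattern.findall(line):
            if path not in seen:  # one entry per file, even if it errored repeatedly
                seen.add(path)
                result.append(path)
    return result


if __name__ == "__main__":
    # usage (hypothetical file names): python3 paths_from_stderr.py /mnt/lustre < errors.txt
    for p in paths_from_stderr(sys.stdin, sys.argv[1]):
        print(p)
```

Redirecting the output to a file gives a plain list of affected paths that can be reviewed before anything is deleted.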

I haven't given up hope on the OST, since it passes e2fsck with no issues. I can mount it as ldiskfs and see everything, too. But as long as it is unavailable, I cannot even delete or unlink files that are on it.

Right now the filesystem is up and working, except that any file with data on the bad OST is inaccessible and cannot be removed. It would be nice to figure out what is wrong with the OST that makes the OSS panic when it is mounted as part of the filesystem.


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238





-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Thomas Roth
Sent: Thursday, January 22, 2015 11:28 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] how do I deactivate a very wonky OST

Brian,

the command to deactivate we are using is
> lctl set_param osc.FS_Name-ID-osc*.active=0
as in lctl set_param osc.lustre-OST0122-osc*.active=0
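The parameter name above can be assembled per OST. A hedged Python sketch, assuming the index in Lustre target names is written as at least four lowercase hex digits (so index 10 becomes OST000a); ost_name and deactivate_cmd are hypothetical helper names. The command is kept as an argument list so the `*` reaches lctl unexpanded instead of being globbed by a shell.

```python
import shlex
import subprocess
import sys
from typing import List


def ost_name(fsname: str, index: int) -> str:
    """Target name with the OST index as (at least) 4 lowercase hex digits (assumed)."""
    return f"{fsname}-OST{index:04x}"


def deactivate_cmd(fsname: str, index: int) -> List[str]:
    """Build the 'lctl set_param ...active=0' command quoted above for one OST."""
    return ["lctl", "set_param", f"osc.{ost_name(fsname, index)}-osc*.active=0"]


if __name__ == "__main__":
    # usage: python3 deactivate.py FSNAME INDEX [--run]
    # INDEX may be decimal (5) or 0x-prefixed hex (0x122).
    cmd = deactivate_cmd(sys.argv[1], int(sys.argv[2], 0))
    print(shlex.join(cmd))  # show the command before doing anything
    if "--run" in sys.argv[3:]:
        subprocess.run(cmd, check=True)
```

Printing first and running only with `--run` leaves room to check the name against the node's actual devices.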

(lctl --device has failed me a number of times; I probably got the wrong device number there.)
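One way to avoid picking the wrong device number is to look it up by name in the lctl dl output rather than by eye. A sketch, assuming the usual column layout (index, status, type, name, UUID, refcount) and that the OST's connection device has type osc (clients) or osp (assumed for the MDS on some server versions), with a name starting FS_Name-OSTxxxx-osc. parse_lctl_dl and find_osc_devices are hypothetical names.

```python
import subprocess
import sys
from typing import List, NamedTuple


class Device(NamedTuple):
    index: int
    status: str
    dev_type: str
    name: str


def parse_lctl_dl(output: str) -> List[Device]:
    """Parse 'lctl dl' text, assuming columns: index status type name uuid refcount."""
    devices: List[Device] = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip blank or unexpected lines
        devices.append(Device(int(fields[0]), fields[1], fields[2], fields[3]))
    return devices


def find_osc_devices(devices: List[Device], target: str) -> List[Device]:
    """Return osc/osp devices for a target such as 'lustre-OST0005'."""
    prefix = target + "-osc"
    return [d for d in devices
            if d.dev_type in ("osc", "osp") and d.name.startswith(prefix)]


if __name__ == "__main__":
    # usage: python3 find_dev.py lustre-OST0005
    out = subprocess.run(["lctl", "dl"], capture_output=True, text=True, check=True).stdout
    for dev in find_osc_devices(parse_lctl_dl(out), sys.argv[1]):
        print(dev.index, dev.status, dev.name)
```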

If your killer OST was once part of your Lustre, "unregistering" it for good would call for extreme measures (total shutdown, total writeconf everywhere, restart) - did you do that? If not, you should be able to start all your servers, omitting the bad OST, then 'lctl set_param' it on the MDT and all clients. Then Lustre should be usable, except for the file remnants of the objects on the missing OST.
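To confirm that the deactivation took effect on the MDT, the target_obd file Sean points to further down the thread (/proc/fs/lustre/lov/{yourmdt}/target_obd) can be checked. A sketch, assuming its lines look like `5: lustre-OST0005_UUID INACTIVE` and treating any state other than ACTIVE as not in use; parse_target_obd is a hypothetical name.

```python
import sys
from typing import Dict


def parse_target_obd(text: str) -> Dict[str, bool]:
    """Map each target UUID to True (ACTIVE) or False (any other state).

    Assumes lines shaped like '5: lustre-OST0005_UUID INACTIVE'.
    """
    status: Dict[str, bool] = {}
    for line in text.splitlines():
        fields = line.replace(":", " ", 1).split()  # split off the leading index
        if len(fields) < 3:
            continue
        uuid, state = fields[1], fields[2]
        status[uuid] = state == "ACTIVE"
    return status


if __name__ == "__main__":
    # usage: python3 target_obd.py /proc/fs/lustre/lov/{yourmdt}/target_obd
    with open(sys.argv[1]) as fh:
        for uuid, active in parse_target_obd(fh.read()).items():
            print(uuid, "ACTIVE" if active else "NOT ACTIVE")
```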

At this point I'd run
> lfs find  --obd FS-Name-#{ost}_UUID /mountpoint
writing the output to some files, which are later used to delete/unlink all these remnants.
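The delete/unlink step can be scripted over the saved list. A minimal sketch, assuming one path per line and that unlink only succeeds once the OST has been deactivated as described above (Brian reports it failing while the OST was merely unavailable). It defaults to a dry run and collects failures instead of stopping; unlink_listed is a hypothetical name.

```python
import os
from typing import Iterable, List


def unlink_listed(paths: Iterable[str], dry_run: bool = True) -> List[str]:
    """Unlink each listed path and return the ones that failed.

    With dry_run=True (the default) nothing is removed; paths are only printed.
    """
    failed: List[str] = []
    for raw in paths:
        path = raw.rstrip("\n")
        if not path:
            continue  # ignore blank lines in the list file
        if dry_run:
            print(f"would unlink {path}")
            continue
        try:
            os.unlink(path)
        except OSError as exc:
            print(f"failed: {path}: {exc}")
            failed.append(path)
    return failed
```

A typical run reviews the dry-run output first, then calls `unlink_listed(fh, dry_run=False)` on the same open list file and retries or inspects whatever comes back as failed.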


Cheers,
Thomas

On 01/15/2015 12:11 AM, Andrus, Brian Contractor wrote:
> Thanks Sean,
> 
> Right now neither helps me, as I had to bring the entire system up from scratch and NOT mount the bad OST.
> So now OST5 is not listed anywhere; the MDS only knows it is missing.
> Doing 'lctl dl' only shows the OSTs that have been brought up.
> If I try to bring it up, it registers, the MDS becomes aware, the OSS kernel panics and the MDS starts making everyone wait for it to come back.
> 
> Part of 'lfs df':
> OST0005             : Resource temporarily unavailable
> 
> The hassle is that I cannot really do an 'lfs find' for the files on the bad OST because the OST is not registered... stuck in a loop here...
> 
> If I could find a way to tag it as offline even though the MDS doesn't see it yet, that may help.
> 
> 
> From: Sean Brisbane [mailto:s.brisbane1 at physics.ox.ac.uk]
> Sent: Wednesday, January 14, 2015 3:04 PM
> To: Andrus, Brian Contractor; lustre-discuss at lists.lustre.org
> Subject: RE: how do I deactivate a very wonky OST
> 
> This caught me out in a recent upgrade:
> 
> cat /proc/fs/lustre/lov/{yourmdt}/target_obd
> 
> rather than
> 
> "lctl dl"
> 
> shows the state of the OST.
> 
> Cheers,
> Sean
> ________________________________
> From: lustre-discuss-bounces at lists.lustre.org [lustre-discuss-bounces at lists.lustre.org] on behalf of Andrus, Brian Contractor [bdandrus at nps.edu]
> Sent: 13 January 2015 17:28
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] how do I deactivate a very wonky OST
> 
> All,
> 
> We are still trying to move forward getting our filesystem at least partially up with a failed OST.
> 
> Currently the OST will kernel panic any device that mounts it. That seems to be a constant.
> 
> So, the plan is to bring the system up without that OST and find what data will be lost.
> Now, I am trying to deactivate the OST on the MGS, but it seems to have no effect.
> Running lctl --device 14 deactivate does not change anything. The OST still shows 'UP'.
> 
> Is there a way to force lustre to deactivate an OST altogether when it is showing 'UP' and the OST is not going to be happily mounted?
> 
> I can mount the filesystem, but many actions hang (ls, df, etc).
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


--
--------------------------------------------------------------------
Thomas Roth
Department: Information Technology
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de

Limited liability company (Gesellschaft mit beschränkter Haftung)
Registered office: Darmstadt
Commercial register: Darmstadt Local Court (Amtsgericht Darmstadt), HRB 1528

Managing directors: Professor Dr. Dr. h.c. Horst Stöcker, Dr.-Ing. Jürgen Henschel

Chair of the Supervisory Board: Dr. Beatrix Vierkorn-Rudolph
Deputy: Ministerialdirigent Dr. Rolf Bernhardt
