[Lustre-discuss] Lustre mount problem

Frank Mietke frank.mietke at informatik.tu-chemnitz.de
Wed Apr 30 08:47:55 PDT 2008


Hi Andrew,

On Wed, Apr 30, 2008 at 08:58:42AM -0600, Lundgren, Andrew wrote:
> From what I have been able to gather, this is not possible at the moment.
> 
> The dead OST will always be there.  There is no functioning way to actually remove it at the moment.
> 
> https://bugzilla.lustre.org/show_bug.cgi?id=15345

Thank you for pointing me to this bug report.

> 
> We are running in failout mode rather than failover.  When our new clients tried to reconnect to our test cluster, they appeared to block forever.  I would need to recreate the situation to confirm that this is the behavior, but I am dealing with another issue at the moment.
> 
> If you are on a production cluster, you may be in a bad way.  The only way I have found to recover from this is to wipe the cluster and start fresh.  (Not a good option.)

I could live with a "dead" OST in the configuration, but as I wrote in the
update, every access to a proc entry of this OST on the clients hangs forever.
Not really optimal.
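
A side note on the dmesg output quoted below: the four digits in an OST label
are hexadecimal, so OST0010 is index 16 in decimal. Here is a minimal Python
sketch for pulling the disabled targets out of such a log. It assumes log
lines of the form shown in the quoted output, with wrapped lines rejoined. The
names TARGET_RE, ost_index and disabled_targets are made up for illustration
and are not part of Lustre.

```python
import re

# Matches target labels such as "chicfs-OST0010"; the four digits are hexadecimal.
TARGET_RE = re.compile(r"\b([A-Za-z0-9_]+)-OST([0-9a-fA-F]{4})")


def ost_index(label):
    """Return the decimal index encoded in a label like 'chicfs-OST0010'."""
    match = TARGET_RE.search(label)
    if match is None:
        raise ValueError(f"no OST label found in {label!r}")
    return int(match.group(2), 16)


def disabled_targets(dmesg_text):
    """Collect OST labels that a client log reports as excluded or disabled."""
    found = set()
    for line in dmesg_text.splitlines():
        if "administratively disabled" in line or "on exclusion list" in line:
            match = TARGET_RE.search(line)
            if match is not None:
                found.add(match.group(0))
    return sorted(found)


# Two lines taken from the quoted dmesg output (wrapped lines rejoined).
SAMPLE = (
    "Lustre: 1837:0:(obd_mount.c:1685:lustre_check_exclusion()) "
    "Excluding chicfs-OST0010-osc (on exclusion list)\n"
    "LustreError: 1837:0:(lov_obd.c:140:lov_connect_obd()) not connecting OSC "
    "chicfs-OST0010_UUID; administratively disabled\n"
)

for target in disabled_targets(SAMPLE):
    print(target, "-> index", ost_index(target))
```

This only reads the log text. It does not touch any proc entry, so it should
be safe to run on a client where those entries hang.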

Thanks,
Frank

> 
> --
> Andrew
> 
> > -----Original Message-----
> > From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Frank Mietke
> > Sent: Wednesday, April 30, 2008 3:03 AM
> > To: lustre-discuss at lists.lustre.org
> > Subject: [Lustre-discuss] Lustre mount problem
> >
> > Hi,
> >
> > We're using Lustre 1.6.4.2 here. Five days ago we permanently lost an OST
> > block device. I deactivated this OST, recreated the block device, ran
> > "mkfs.lustre --reformat ..." on it, and gave it the same OST index (OST0010)
> > as before. Mounting this OST failed with the "already in use" message, so I
> > reformatted the device again without passing an index number. But now, when
> > I mount the file system on the clients, I get the following in dmesg:
> >
> > Lustre: 1837:0:(obd_mount.c:1685:lustre_check_exclusion()) Excluding chicfs-OST0010-osc (on exclusion list)
> > Lustre: setting import chicfs-OST0010_UUID INACTIVE by administrator request
> > Lustre: chicfs-OST0010-osc-00000100cfa0f800.osc: set parameter active=0
> > LustreError: 1837:0:(lov_obd.c:140:lov_connect_obd()) not connecting OSC chicfs-OST0010_UUID; administratively disabled
> > Lustre: Client chicfs-client has started
> > ib0: no IPv6 routers present
> > LustreError: 2596:0:(client.c:504:ptlrpc_import_delay_req()) @@@ Uninitialized import.  req@00000100cfb91600 x274/t0 o400->chicfs-OST0010_UUID@<NULL>:6 lens 64/64 ref 1 fl Rpc:N/0/0 rc 0/0
> > LustreError: 2596:0:(client.c:506:ptlrpc_import_delay_req()) LBUG
> > Lustre: 2596:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 2596
> > lfs           R  running task       0  2596   2594          (NOTLB)
> > 00000100715f9ca8 0000000000000000 0000000000000000 00000100718d6740
> >        00000100715f9e78 00000100715f9ca8 00000100715f9cb8 000001007e32c800
> >        ffffffffa031e990 00000000000001fa
> > Call Trace:<ffffffff80148b4b>{__kernel_text_address+26} <ffffffff801115c0>{show_trace+375}
> >        <ffffffff801116fc>{show_stack+241} <ffffffffa01e39c3>{:libcfs:lbug_with_loc+115}
> >        <ffffffffa02ebc4e>{:ptlrpc:ptlrpc_import_delay_req+238}
> >        <ffffffffa02f05e8>{:ptlrpc:ptlrpc_queue_wait+584} <ffffffffa02f9bb3>{:ptlrpc:lustre_pack_request+995}
> >        <ffffffffa02eb9f8>{:ptlrpc:ptlrpc_prep_req_pool+1832}
> >        <ffffffff80178f93>{file_move+27} <ffffffff80177540>{dentry_open_it+284}
> >        <ffffffffa031ceac>{:ptlrpc:lprocfs_wr_ping+444} <ffffffff803211ef>{__down_read+52}
> >        <ffffffffa0266a15>{:obdclass:lprocfs_fops_write+117}
> >        <ffffffff8017821a>{vfs_write+207} <ffffffff80178302>{sys_write+69}
> >        <ffffffff8011022a>{system_call+126}
> >
> > Any hints are appreciated. Is there a way to fully remove OST0010 from the
> > configuration?
> >
> > Thanks,
> > Frank
> >
> >
> >
> > --
> > Dipl.-Inf. Frank Mietke     |     Fakultätsrechen- und Informationszentrum
> > Tel.: 0371 - 531 - 35538    |     Fak. für Informatik
> > Fax:  0371 - 531 8 35538    |     TU-Chemnitz
> > Key-ID: 60F59599            |     frank.mietke at informatik.tu-chemnitz.de
> >
> 

-- 
Dipl.-Inf. Frank Mietke     |     Fakultätsrechen- und Informationszentrum
Tel.: 0371 - 531 - 35538    |     Fak. für Informatik
Fax:  0371 - 531 8 35538    |     TU-Chemnitz
Key-ID: 60F59599            |     frank.mietke at informatik.tu-chemnitz.de


