[Lustre-discuss] Lustre mount problem

Lundgren, Andrew Andrew.Lundgren at Level3.com
Wed Apr 30 07:58:42 PDT 2008


From what I have been able to gather, this is not possible at the moment.

The dead OST will always be there.  There is no functioning way to actually remove it at the moment.

https://bugzilla.lustre.org/show_bug.cgi?id=15345

We are running in failout mode rather than failover.  When our new clients tried to reconnect to our test cluster, they appeared to block forever.  I would need to recreate the situation to confirm that this is the behavior, but I am dealing with another issue at the moment.

If you are on a production cluster, you may be in a bad way.  The only way I have found to recover from this is to wipe the cluster and start fresh.  (Not a good option.)
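
In case it helps with triage while the OST is stuck in the configuration, here is a rough Python sketch that scans a saved dmesg dump for OSCs the client reports as administratively disabled and for LBUG sites. This is not a Lustre tool: the function name scan_dmesg and the regular expressions are my own assumptions, modelled only on the dmesg lines quoted below, and other Lustre versions may word these messages differently.

```python
import re

# Assumed patterns, modelled on the 1.6.4.2 dmesg lines quoted in this thread.
DISABLED_RE = re.compile(
    r"not\s+connecting\s+OSC\s+(\S+?)_UUID;\s+administratively\s+disabled"
)
LBUG_RE = re.compile(
    r"LustreError: \d+:\d+:\(([^:]+):(\d+):(\w+)\(\)\)\s+LBUG"
)


def scan_dmesg(text):
    """Return (disabled_osts, lbug_sites) found in a dmesg dump.

    disabled_osts: sorted list of target names such as 'chicfs-OST0010'.
    lbug_sites: list of (source_file, line_number, function) tuples.
    """
    # Collapse all whitespace so hard-wrapped log lines still match.
    flat = " ".join(text.split())
    disabled = sorted(set(DISABLED_RE.findall(flat)))
    lbugs = [(src, int(line), func) for src, line, func in LBUG_RE.findall(flat)]
    return disabled, lbugs


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg > dmesg.txt; python scan_dmesg.py dmesg.txt
    with open(sys.argv[1]) as handle:
        disabled_osts, lbug_sites = scan_dmesg(handle.read())
    print("administratively disabled:", disabled_osts)
    print("LBUG sites:", lbug_sites)
```

It only reports what the client logged; it does not change any Lustre state.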

--
Andrew

> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org
> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of
> Frank Mietke
> Sent: Wednesday, April 30, 2008 3:03 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] Lustre mount problem
>
> Hi,
>
> we're using 1.6.4.2 here. 5 days ago we lost an OST block device forever. I
> deactivated this OST, recreated the block device and started
> "mkfs.lustre --reformat ..." on this device and gave it the same OST id
> (OST0010) as before. A mount of this OST went wrong with the "already in use"
> message. Therefore I reformatted the device again with no index number passed.
> But now, when I try to mount the clients I've got the following in dmesg:
>
> Lustre: 1837:0:(obd_mount.c:1685:lustre_check_exclusion()) Excluding chicfs-OST0010-osc (on exclusion list)
> Lustre: setting import chicfs-OST0010_UUID INACTIVE by administrator request
> Lustre: chicfs-OST0010-osc-00000100cfa0f800.osc: set parameter active=0
> LustreError: 1837:0:(lov_obd.c:140:lov_connect_obd()) not connecting OSC chicfs-OST0010_UUID; administratively disabled
> Lustre: Client chicfs-client has started
> ib0: no IPv6 routers present
> LustreError: 2596:0:(client.c:504:ptlrpc_import_delay_req()) @@@ Uninitialized import.  req@00000100cfb91600 x274/t0 o400->chicfs-OST0010_UUID@<NULL>:6 lens 64/64 ref 1 fl Rpc:N/0/0 rc 0/0
> LustreError: 2596:0:(client.c:506:ptlrpc_import_delay_req()) LBUG
> Lustre: 2596:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 2596
> lfs           R  running task       0  2596   2594          (NOTLB)
> 00000100715f9ca8 0000000000000000 0000000000000000 00000100718d6740
>        00000100715f9e78 00000100715f9ca8 00000100715f9cb8 000001007e32c800
>        ffffffffa031e990 00000000000001fa
> Call Trace:<ffffffff80148b4b>{__kernel_text_address+26} <ffffffff801115c0>{show_trace+375}
>        <ffffffff801116fc>{show_stack+241} <ffffffffa01e39c3>{:libcfs:lbug_with_loc+115}
>        <ffffffffa02ebc4e>{:ptlrpc:ptlrpc_import_delay_req+238}
>        <ffffffffa02f05e8>{:ptlrpc:ptlrpc_queue_wait+584} <ffffffffa02f9bb3>{:ptlrpc:lustre_pack_request+995}
>        <ffffffffa02eb9f8>{:ptlrpc:ptlrpc_prep_req_pool+1832}
>        <ffffffff80178f93>{file_move+27} <ffffffff80177540>{dentry_open_it+284}
>        <ffffffffa031ceac>{:ptlrpc:lprocfs_wr_ping+444} <ffffffff803211ef>{__down_read+52}
>        <ffffffffa0266a15>{:obdclass:lprocfs_fops_write+117}
>        <ffffffff8017821a>{vfs_write+207} <ffffffff80178302>{sys_write+69}
>        <ffffffff8011022a>{system_call+126}
>
> Any hints are appreciated. Is there a way to fully remove the OST0010 from the
> configuration?
>
> Thanks,
> Frank
>
>
>
> --
> Dipl.-Inf. Frank Mietke     |     Fakultätsrechen- und Informationszentrum
> Tel.: 0371 - 531 - 35538    |     Fak. für Informatik
> Fax:  0371 - 531 8 35538    |     TU-Chemnitz
> Key-ID: 60F59599            |     frank.mietke at informatik.tu-chemnitz.de


