[Lustre-discuss] two problems

Andreas Dilger andreas.dilger at Oracle.com
Thu Jun 3 15:17:15 PDT 2010


On 2010-06-03, at 06:23, Stefano Elmopi wrote:
> Of course, what I did was in a test environment; in a production environment, I would have moved all the files before deleting the OST1 server.

The main problem here is that you have completely erased all knowledge of the failed OST (i.e. by using lctl --writeconf) while there are still files in the filesystem using it.

If the OST had simply failed and been marked inactive (which is what is normally done in such situations) it would still be possible to delete the files.  The problem being seen on the MDT now is simply one that cannot happen in any "normal" failure scenario.

That said, the checks in the MDS could/should probably be made more lenient.  I suspect, however, that there will be a follow-on chain of failures resulting from this, since the file layout is now broken and there are likely missing checks for this "impossible" case elsewhere in the code.
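
To make the distinction concrete, here is a minimal Python sketch (not Lustre code) of the lookup that fails. It reads a listing in the format of the target_obd output quoted further down and classifies a stripe's OST index as usable, inactive, or missing. The function names are made up for illustration, and the INACTIVE state string for a deactivated OST is an assumption; only the two ACTIVE lines come from this thread.

```python
import re

def parse_target_obd(text):
    """Map OST index -> (uuid, state) from lines like
    '0: lustre01-OST0000_UUID ACTIVE'."""
    targets = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s+(\S+)\s+(\S+)\s*$", line)
        if m:
            targets[int(m.group(1))] = (m.group(2), m.group(3))
    return targets

def classify_stripe(targets, ost_idx):
    """Say whether a stripe's OST index can still be resolved."""
    if ost_idx not in targets:
        return "missing"    # no record of the OST at all
    _uuid, state = targets[ost_idx]
    # Assumption: anything other than ACTIVE is treated as inactive.
    return "usable" if state == "ACTIVE" else "inactive"

# Listing as reported in this thread: index 1 is gone entirely.
TARGET_OBD = """\
0: lustre01-OST0000_UUID ACTIVE
2: lustre01-OST0002_UUID ACTIVE
"""

targets = parse_target_obd(TARGET_OBD)
print(classify_stripe(targets, 1))  # missing
print(classify_stripe(targets, 0))  # usable
```

With the listing from this thread, index 1 has no entry at all, which is the case lsm_unpackmd_v1() reports as "OST index 1 missing". Had the OST only been deactivated, the index would still resolve to a known target.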

> However, I tried to do:
> 
> unlink zero.dat
> 
> unlink: cannot unlink `zero.dat': Invalid argument
> 
> Jun  3 14:05:29 mdt02prdpom kernel: LustreError: 16265:0:(lov_ea.c:248:lsm_unpackmd_v1()) OST index 1 missing
> Jun  3 14:05:29 mdt02prdpom kernel: Lustre: 16265:0:(lov_pack.c:64:lov_dump_lmm_common()) objid 0x1b20017, magic 0x0bd10bd0, pattern 0x1
> Jun  3 14:05:29 mdt02prdpom kernel: Lustre: 16265:0:(lov_pack.c:67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
> Jun  3 14:05:29 mdt02prdpom kernel: Lustre: 16265:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x62
> 
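
The lov_dump_lmm lines above already identify what the file depends on. As a sketch, assuming only the message format visible in those three lines, a few lines of Python can pull the layout out of such a log. The helper name decode_layout is hypothetical, and the two subobj values are kept as raw numbers without interpretation.

```python
import re

# The three layout lines from the MDS log, with the syslog prefix trimmed.
LOG = """\
(lov_pack.c:64:lov_dump_lmm_common()) objid 0x1b20017, magic 0x0bd10bd0, pattern 0x1
(lov_pack.c:67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x62
"""

def decode_layout(log_text):
    """Extract stripe size, stripe count and per-stripe OST indices."""
    layout = {"stripe_size": None, "stripe_count": None, "stripes": []}
    m = re.search(r"stripe_size (\d+), stripe_count (\d+)", log_text)
    if m:
        layout["stripe_size"] = int(m.group(1))
        layout["stripe_count"] = int(m.group(2))
    # 'stripe N idx M subobj 0xA/0xB' describes one stripe object.
    pattern = r"stripe (\d+) idx (\d+) subobj (0x[0-9a-fA-F]+)/(0x[0-9a-fA-F]+)"
    for m in re.finditer(pattern, log_text):
        layout["stripes"].append({
            "stripe": int(m.group(1)),
            "ost_idx": int(m.group(2)),
            "subobj": (int(m.group(3), 16), int(m.group(4), 16)),
        })
    return layout

layout = decode_layout(LOG)
for s in layout["stripes"]:
    print("stripe", s["stripe"], "is on OST index", s["ost_idx"])
```

For zero.dat this gives a single stripe of 1048576 bytes whose only object lives on OST index 1, the OST that no longer exists in the configuration.
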
> As for the kernel panic console messages, I have them only as an image. Can I attach it to an email?
> 
> For the second problem:
> 
> While testing quotas, when I run the command:
> 
> lfs quotacheck -ug /LUSTRE/
> quotacheck failed: Input/output error
> 
> and the log says:
> 
> kernel: LustreError: 7103:0:(quota_check.c:251:lov_quota_check()) lov idx 1 inactive
> 
> Do you have any suggestions?
> 
> Thanks
> 
> Cheers, Stefano
> 
> 
> 
> 
> Ing. Stefano Elmopi
> Gruppo Darco - Resp. ICT Sistemi
> Via Ostiense 131/L Corpo B, 00154 Roma
> 
> cell. 3466147165
> tel.  0657060500
> email:stefano.elmopi at sociale.it
> 
> On 28 May 2010, at 21:34, Andreas Dilger wrote:
> 
>> On 2010-05-27, at 04:15, Stefano Elmopi wrote:
>>> A clarification on what I wrote: the command that sends the MGS/MDS server into a kernel panic is:
>>> 
>>>> My version of Lustre is 1.8.3
>>>> While testing, I tried to delete an OST and replace it with another OST,
>>>> and now the situation is this:
>>>> 
>>>> cat /proc/fs/lustre/lov/lustre01-mdtlov/target_obd 
>>>> 0: lustre01-OST0000_UUID ACTIVE
>>>> 2: lustre01-OST0002_UUID ACTIVE
>>>> 
>>>> - first problem
>>>> lustre01-OST0001_UUID is the OST that was deleted, and it had files,
>>>> which of course are no longer available:
>> 
>> Ideally, you should migrate files off the OST before deleting it.
>> 
>>>> ls -lrt
>>>> total 12475312
>>>> ?--------- ? ?    ?             ?            ? zero.dat
>>>> ?--------- ? ?    ?             ?            ? ubuntu-9.10-dvd-i386.iso
>>>> ?--------- ? ?    ?             ?            ? XXXXXXXXX_CentOS-5.4-x86_64-bin-DVD.iso
>>>> ?--------- ? ?    ?             ?            ? Windows_XP-Capodarco.iso
>>>> ?--------- ? ?    ?             ?            ? UBUNTU_CentOS-5.4-x86_64-bin-DVD.iso
>>>> ?--------- ? ?    ?             ?            ? KK_CentOS-5.4-x86_64-bin-DVD.iso
>>>> ?--------- ? ?    ?             ?            ? FFFFF_CentOS-5.4-x86_64-bin-DVD.iso
>>>> ?--------- ? ?    ?             ?            ? CentOS-5.3-i386-bin-DVD.iso
>>>> ?--------- ? ?    ?             ?            ? BBBBB_CentOS-5.4-x86_64-bin-DVD.iso
>>>> ?--------- ? ?    ?             ?            ? BAK_CentOS-5.4-x86_64-bin-DVD.iso
>>>> ?--------- ? ?    ?             ?            ? 2.iso
>>>> 
>>>> 
>>>> To delete them, I followed these steps:
>> 
>> You should be able to delete them from the client with "unlink zero.dat", which will return an ENOENT error, but the file should be gone.  No need to run lfsck at all.
>> 
>>>> and the MGS/MDS server goes into a kernel panic
>> 
>> What do the MDS console messages say?  That is the root of the problem.
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>> 
> 


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.



