[Lustre-discuss] Inactive OST

Katya Tutlyaeva ordi at xgl.pereslavl.ru
Mon Apr 19 23:23:31 PDT 2010


Greetings again!

Thank you very much. I have reformatted one OST with the filesystem name 
lustre00 (mkfs.lustre --ost --reformat --fsname=lustre00 
--mgsnode=192.168.11.12@o2ib 801), restarted the Lustre file system, and 
got all OSTs UP and active.
But when I tried to test the system, I met another error:
On the client, I created a directory, set striping (I chose small chunks 
so the file would be striped across both OSTs), and added files to the 
directory.
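(The commands were roughly like the following sketch; the directory name 
and stripe parameters here are illustrative, not the exact ones I used:)

```shell
# Stripe new files in the test directory across both OSTs with a
# small 64 KB stripe size, so even modest files land on both OSTs.
# The directory path is illustrative.
lfs setstripe -s 65536 -c 2 /mnt/lustre00/testdir
# Any file created or copied into the directory inherits the layout.
cp My_file.tif /mnt/lustre00/testdir/
```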
But when I ran the lfs getstripe command, I got the following output and 
stack trace:

[Katya at Client]$ sudo lfs getstripe My_file.tif
OBDS:
0: lustre00-OST0000_UUID ACTIVE
1: lustre00-OST0001_UUID ACTIVE
*** buffer overflow detected ***: lfs terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f577afeca27]
/lib64/libc.so.6(+0xdea40)[0x7f577afeaa40]
/lib64/libc.so.6(+0xddd04)[0x7f577afe9d04]
lfs[0x41872c]
lfs[0x418dcc]
lfs[0x4192fd]
lfs[0x403e70]
lfs[0x409678]
lfs[0x404947]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f577af2ab6d]
lfs[0x402f89]
======= Memory map: ========
00400000-0045c000 r-xp 00000000 08:01 7401524 /usr/bin/lfs
0065b000-0065c000 r--p 0005b000 08:01 7401524 /usr/bin/lfs
0065c000-0065d000 rw-p 0005c000 08:01 7401524 /usr/bin/lfs
0065d000-00691000 rw-p 0065d000 00:00 0 [heap]
7f577aad4000-7f577aaea000 r-xp 00000000 08:01 1941620 /lib64/libgcc_s.so.1
7f577aaea000-7f577ace9000 ---p 00016000 08:01 1941620 /lib64/libgcc_s.so.1
7f577ace9000-7f577acea000 r--p 00015000 08:01 1941620 /lib64/libgcc_s.so.1
7f577acea000-7f577aceb000 rw-p 00016000 08:01 1941620 /lib64/libgcc_s.so.1
7f577aceb000-7f577ad08000 r-xp 00000000 08:01 1941553 /lib64/libtinfo.so.5.7
7f577ad08000-7f577af07000 ---p 0001d000 08:01 1941553 /lib64/libtinfo.so.5.7
7f577af07000-7f577af0b000 r--p 0001c000 08:01 1941553 /lib64/libtinfo.so.5.7
7f577af0b000-7f577af0c000 rw-p 00020000 08:01 1941553 /lib64/libtinfo.so.5.7
7f577af0c000-7f577b056000 r-xp 00000000 08:01 1941507 /lib64/libc-2.11.1.so
7f577b056000-7f577b256000 ---p 0014a000 08:01 1941507 /lib64/libc-2.11.1.so
7f577b256000-7f577b25a000 r--p 0014a000 08:01 1941507 /lib64/libc-2.11.1.so
7f577b25a000-7f577b25b000 rw-p 0014e000 08:01 1941507 /lib64/libc-2.11.1.so
7f577b25b000-7f577b260000 rw-p 7f577b25b000 00:00 0
7f577b260000-7f577b298000 r-xp 00000000 08:01 1941558 
/lib64/libreadline.so.5.2
7f577b298000-7f577b497000 ---p 00038000 08:01 1941558 
/lib64/libreadline.so.5.2
7f577b497000-7f577b499000 r--p 00037000 08:01 1941558 
/lib64/libreadline.so.5.2
7f577b499000-7f577b49f000 rw-p 00039000 08:01 1941558 
/lib64/libreadline.so.5.2
7f577b49f000-7f577b4a1000 rw-p 7f577b49f000 00:00 0
7f577b4a1000-7f577b4bd000 r-xp 00000000 08:01 1941513 /lib64/ld-2.11.1.so
7f577b6af000-7f577b6b2000 rw-p 7f577b6af000 00:00 0
7f577b6b5000-7f577b6b6000 rw-p 7f577b6b5000 00:00 0
7f577b6b6000-7f577b6bb000 rw-s 00000000 00:08 1835008 /SYSV00000000 
(deleted)
7f577b6bb000-7f577b6bc000 rw-p 7f577b6bb000 00:00 0
7f577b6bc000-7f577b6bd000 r--p 0001b000 08:01 1941513 /lib64/ld-2.11.1.so
7f577b6bd000-7f577b6be000 rw-p 0001c000 08:01 1941513 /lib64/ld-2.11.1.so
7f577b6be000-7f577b6bf000 rw-p 7f577b6be000 00:00 0
7fff06a07000-7fff06a1c000 rw-p 7ffffffea000 00:00 0 [stack]
7fff06a41000-7fff06a42000 r-xp 7fff06a41000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted

Is there some way to repair it?

I've tried writeconf, but nothing changed.
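(What I did was roughly the standard writeconf procedure; the device 
paths below come from my haresources and will differ on other setups:)

```shell
# Regenerate the Lustre configuration logs. All targets must be
# unmounted first; run --writeconf on the MDT, then on each OST.
# Device paths are from my haresources file and are site-specific.
tunefs.lustre --writeconf /dev/disk/by-id/scsi-800   # MDT
tunefs.lustre --writeconf /dev/disk/by-id/scsi-801   # OST0000
tunefs.lustre --writeconf /dev/disk/by-id/scsi-802   # OST0001
# Afterwards, remount the MDT first, then the OSTs, then the clients.
```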
_________________
Thanks,
Katya

Andreas Dilger wrote:
> On 2010-04-19, at 01:41, xgl at xgl.pereslavl.ru wrote:
>> I have 1 OST that seems like inactive device on client:
>> [Client] lfs df -h
>> UUID bytes Used Available Use% Mounted on
>> lustre00-MDT0000_UUID 814.8G 471.8M 767.8G 0% /mnt/lustre00[MDT:0]
>> lustre00-OST0000_UUID: inactive device
>> lustre00-OST0001_UUID 7.2T 10.4G 6.8T 0% /mnt/lustre00[OST:1]
>> How can I activate this device?
>>
>> I have 2 OSSs theoretically configured as a failover pair using 
>> heartbeat, 1 MDS and 2 OSTs accessible from both OSS-es.
>> haresources:
>> my1.localdomain Filesystem::/dev/disk/by-id/scsi-801::/mnt/ost0::lustre
>> my2.localdomain 
>> Filesystem::/dev/disk/by-id/scsi-800::/mnt/mdt::lustre 
>> Filesystem::/dev/disk/by-id/scsi-802::/mnt/ost1::lustre
>>
>> On both OSS-es this device seems like active:
>> [my2.localdomain ~]# lctl dl
>> 5 UP osc lustre00-OST0001-osc lustre00-mdtlov_UUID 5
>> 6 UP osc lustre00-OST0000-osc lustre00-mdtlov_UUID 5
>> 8 UP obdfilter lustre00-OST0001 lustre00-OST0001_UUID 7
>>
>> 0 UP mgc MGC192.168.11.152@o2ib 89a7ffad-6d5e-8468-1b95-c694f35b8ad1 5
>> 1 UP ost OSS OSS_uuid 3
>> 2 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 3
>>
>> What am I missing?
>
>
> If, in fact, the OST is active on both OSSes, that would be VERY bad. 
> However, it seems like you have two different OSTs, one in the 
> "lustre" filesystem, one in the "lustre00" filesystem, so it seems you 
> have some sort of a configuration problem.
>
> Cheers, Andreas
> -- 
> Andreas Dilger
> Principal Engineer, Lustre Group
> Oracle Corporation Canada Inc.
>
>
>




More information about the lustre-discuss mailing list