[Lustre-discuss] Disappearing OSTs

jrs botemout at gmail.com
Fri May 2 09:09:49 PDT 2008


I just made an ext3 filesystem, mounted it (on both OSSes, though not
at the same time), unmounted it, and rebooted both servers, and it's
still there.  It appears that this destruction of filesystems is a
lustre-only thing.
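
For reference, the test was roughly the following (the mount point here
is made up for illustration; the device is the _bad_no_use one shown
below):

oss01:~ # mkfs.ext3 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use-part1
oss01:~ # mount /dev/mapper/ost_oss01_lustre0102_01_bad_no_use-part1 /mnt/test
oss01:~ # umount /mnt/test
oss01:~ # reboot

then the same mount/umount on the other OSS, another reboot, and the
filesystem was still intact.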

A difference between this and the lustre filesystem, of course, is that
a separate device name is created for the partition, e.g.,

oss01:~ # ls -l /dev/mapper/ost_oss01_lustre0102_01_bad_no_use*
brw------- 1 root root 253,  7 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use
brw------- 1 root root 253, 12 May  2 09:09 /dev/mapper/ost_oss01_lustre0102_01_bad_no_use-part1

while a lustre OST uses the whole disk/volume:

oss01:~ # ls -l /dev/mapper/ost_oss01_lustre0304_02
brw------- 1 root root 253, 5 May  2 09:45 /dev/mapper/ost_oss01_lustre0304_02

In the mailing lists some time back someone had talked about kpartx
(though I think it was in the context of having a consistent device name,
which I have no trouble with since I'm explicitly naming them in
/etc/multipathd.conf).
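
If it matters, my understanding of kpartx is that it just adds and
removes the partition mappings on top of a multipath device, e.g.
(device name reused from above purely as an example):

oss01:~ # kpartx -l /dev/mapper/ost_oss01_lustre0102_01_bad_no_use    (list partition mappings)
oss01:~ # kpartx -a /dev/mapper/ost_oss01_lustre0102_01_bad_no_use    (add the -part1 style mappings)
oss01:~ # kpartx -d /dev/mapper/ost_oss01_lustre0102_01_bad_no_use    (remove them again)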

Another issue, which appears to be a bug though it's probably not related
to my problem: when running mkfs.lustre with --failnode, mmp should be
set on the filesystem.  However, looking at the output of dumpe2fs, that
doesn't appear to be the case:

oss01:/net/lmd01/space/lustre # dumpe2fs -h /dev/mapper/ost_oss01_lustre0304_02|grep -A 1 feat
dumpe2fs 1.40.4.cfs1 (31-Dec-2007)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery extents sparse_super large_file
Filesystem flags:         signed directory hash

Of course, I can run tune2fs to add it, but in the past that has induced
the disappearance of the filesystem as well.
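
If I were to try it again, it would be something like the following,
with the OST unmounted first (this is the step that has eaten
filesystems on me before):

oss01:~ # tune2fs -O mmp /dev/mapper/ost_oss01_lustre0304_02
oss01:~ # dumpe2fs -h /dev/mapper/ost_oss01_lustre0304_02 | grep -A 1 feat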

thanks,
JR


Cliff White wrote:
> jrs wrote:
>> Greetings,
>>
>> I've posted before but no one responded. I'm reposting because I'm
>> really dead in the water here until I can get this fixed.
>>
>> The issue is that my OSTs don't survive a reboot of the OSS.
>>
>> In the below I'm dealing with two OSSes: quad-core Intel Xeon machines
>> with 8 GB of memory and dual-port Qlogic Fibre Channel cards.  They both
>> run SLES 10.1 and lustre 1.6.4.3.  My two MDSes (similar, though not
>> exactly the same hardware) don't have the same problem, though I'm only
>> accessing a single MDT from them.
>>
>> I've produced the problem by something as simple as running
>> umount /mnt/lustre/ost/ost_oss01_lustre0102_01
>> tune2fs -O +mmp /dev/mapper/ost_oss01_lustre0102_01
>> mount -t lustre /dev/mapper/ost_oss01_lustre0102_01 \
>>   /mnt/lustre/ost/ost_oss01_lustre0102_01
>>
> .....
> 
>> Any suggestions would be deeply appreciated.
> 
> It looks like something is really destroying your disks.  If you try this
> with ordinary ext3, does the filesystem survive a reboot?
> 
> Otherwise, you could try:
> - mkfs.lustre as before.
> # tunefs.lustre --print <device>
> reboot
> # tunefs.lustre --print <device>
> 
> Tunefs with --print is read-only; if it doesn't work the second time,
> you should be able to compare the results.
> cliffw
> 
>>
>>
>> Thanks much,
>> JR Smith
>>
>>
>>
> 


