[Lustre-discuss] Kernel panic on mounting an OST
Wojciech Turek
wjt27 at cam.ac.uk
Thu Dec 13 08:22:15 PST 2007
Hi,

Sorry for the long delay in responding, but I was away for Christmas lunch.

Changing the index may help, but I am not certain of that. It is
definitely odd that you have OST0030 but no OST0000; this may cause
problems later with quotas when you try to turn them off. I think
Lustre expects OST0000 to exist, and if it does not find it, Lustre
will complain and quotas will not work.

However, if you do change indexes, I think you need to do it in a
certain way. I suggest doing it as follows (a rough shell sketch of
the whole sequence follows the steps):
# Unmount all OSTs and MDTs, and run for each target:
tunefs.lustre --reformat --index=<index> --writeconf /dev/<block_device_name>
# This needs to be done on all OSSs and on the MDS.
# Then mount each target as an ldiskfs file system (again on all
# OSSs and on the MDS), for example:
mount -t ldiskfs /dev/dm-0 /mnt/mdt
# Delete the file /mnt/mdt/last_rcvd, then unmount the target.
After that you can do a writeconf for each target, then start the
MGS/MDT target, and then start the OST targets one by one, with
mpath0 as the first one.
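To make the ordering explicit, here is a minimal shell sketch of the
sequence above for a single OST. The device, index, and mount point
are placeholders of my own, not values from your setup, so adjust
them before running anything:

TARGET=/dev/mpath/mpath0    # placeholder: your OST block device
INDEX=0                     # placeholder: the index you want this target to have
MNT=/mnt/ost0               # placeholder: a scratch mount point

umount ${MNT}                                       # 1. target must be unmounted
tunefs.lustre --reformat --index=${INDEX} --writeconf ${TARGET}
mount -t ldiskfs ${TARGET} ${MNT}                   # 2. mount as plain ldiskfs
rm -f ${MNT}/last_rcvd                              # 3. remove the stale last_rcvd
umount ${MNT}
# 4. once every target is done: start the MGS/MDT first, then the
#    OSTs one by one, beginning with mpath0
mount -t lustre ${TARGET} ${MNT}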
Also have a look at our /etc/multipath.conf below. As you can see it
is very static, but this way we can be sure that each dm-<number>
device always points to the same LUN:
defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            tur
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_name      yes
        prio_callout            "/sbin/mpath_prio_my %n"
}

devnode_blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*"
        devnode "^hd[a-z]"
        devnode "^cciss!c[0-9]d[0-9]*"
}
multipaths {
        multipath {
                wwid                    360001ff007e6173300000800001d1c17
                alias                   dm-0
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173401000800001d1c17
                alias                   dm-1
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173502000800001d1c17
                alias                   dm-2
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173906000800001d1c17
                alias                   dm-3
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173a07000800001d1c17
                alias                   dm-4
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173b08000800001d1c17
                alias                   dm-5
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173603000800001d1c17
                alias                   dm-6
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173704000800001d1c17
                alias                   dm-7
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173805000800001d1c17
                alias                   dm-8
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173c09000800001d1c17
                alias                   dm-9
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173d0a000800001d1c17
                alias                   dm-10
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
        multipath {
                wwid                    360001ff007e6173e0b000800001d1c17
                alias                   dm-11
                path_grouping_policy    failover
                path_checker            tur
                path_selector           "round-robin 0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           5
        }
}
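Once multipath.conf is in place, a quick generic sanity check (my
suggestion, not something from this thread) is to reload multipathd
and confirm that each alias reports the expected WWID:

service multipathd reload   # pick up the edited /etc/multipath.conf
multipath -ll               # each alias line should show its WWID and paths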
I hope this helps
Wojciech
On 13 Dec 2007, at 15:18, Ludovic Francois wrote:
> On Dec 13, 3:12 pm, Ludovic Francois <lfranc... at gmail.com> wrote:
>> On Dec 13, 2:59 pm, "Ludovic Francois" <lfranc... at gmail.com> wrote:
>>
>>> Do you think it's possible someone overwrote the "label" with a
>>> tunefs command?
>>
>> or the system
>>
>>> I have already seen this with some other file systems.
>
> We recreated Target and Index with the tunefs.lustre command:
>
> --8<---------------cut here---------------start------------->8---
> [root at oss01 ~]# tunefs.lustre --writeconf --index 0 /dev/mpath/mpath0
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:
> Target: lustre-OST0030
> Index: 48
> Lustre FS: lustre
> Mount type: ldiskfs
> Flags: 0x2
> (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.143.0.5 at tcp mgsnode=10.143.0.6 at tcp
> failover.node=10.143.0.2 at tcp sys.timeout=80 mgsnode=10.143.0.5 at tcp
> mgsnode=10.143.0.6 at tcp failover.node=10.143.0.2 at tcp sys.timeout=80
>
>
> Permanent disk data:
> Target: lustre-OST0000
> Index: 0
> Lustre FS: lustre
> Mount type: ldiskfs
> Flags: 0x102
> (OST writeconf )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.143.0.5 at tcp mgsnode=10.143.0.6 at tcp
> failover.node=10.143.0.2 at tcp sys.timeout=80 mgsnode=10.143.0.5 at tcp
> mgsnode=10.143.0.6 at tcp failover.node=10.143.0.2 at tcp sys.timeout=80
>
> Writing CONFIGS/mountdata
> [root at oss01 ~]#
> --8<---------------cut here---------------end--------------->8---
>
> But now we have some problems remounting the file system. Could you
> confirm that this command just rewrites the index?
>
> Best Regards, Ludo
>
> --
> Ludovic Francois +33 (0)6 14 77 26 93
> System Engineer DataDirect Networks
>
Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: wjt27 at cam.ac.uk
tel. +441223763517