[Lustre-discuss] Fw: Re: Unable to activate OST

Wojciech Turek wjt27 at cam.ac.uk
Fri Jan 15 17:01:02 PST 2010


Can you check whether you can ping the MDS and the OSS using the normal ping command?


2010/1/16 Dusty Marks <dustynmarks at gmail.com>:
> the output of lctl list_nids on the oss is
>
> [root@oss ~]# lctl list_nids
> 192.168.0.3@tcp
>
> and from the mds
>
> [root@mds ~]# lctl list_nids
> 192.168.0.2@tcp
>
> Thanks,
> Dusty
>
> On Fri, Jan 15, 2010 at 5:39 PM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>>
>> Hi,
>>
>> Could you please post the output of the 'lctl list_nids' command on the
>> OSS and on the MDS? This will show us which network is configured to
>> work with Lustre.
>>
>> Regarding the entries in modprobe.conf: they tell the lnet module which
>> NIC (or NICs) to configure for Lustre. If your modprobe.conf doesn't
>> have an lnet options line, Lustre will by default configure the first
>> NIC, which is usually eth0.
>> Below is the modprobe.conf entry from my Lustre setup.
>> My OSS(s) and MDS(s) have two Ethernet NICs, eth0 and eth1, and an
>> InfiniBand NIC, ib0. The IB is set to work as IPoIB, so Lustre treats it
>> as an ordinary Ethernet NIC:
>> options lnet networks=tcp0(ib0),tcp1(eth1),tcp2(eth1:0)
>> The line above means that:
>>   the first Lustre network, tcp0, is configured on interface ib0
>>   the second Lustre network, tcp1, is configured on interface eth1
>>   the third Lustre network, tcp2, is configured on alias interface eth1:0
>>
>> eth0 is not mentioned on this line because I have chosen not to
>> configure it to work with Lustre.
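
To make the structure of that options line concrete, here is a minimal shell sketch that splits a networks= value into its network/interface pairs. The function name parse_lnet_networks is hypothetical (it is not a Lustre tool), and the sketch only assumes the comma-separated net(interface) layout shown above.

```shell
# Split an lnet "networks=" value into one "network interface" pair per line.
# parse_lnet_networks is a hypothetical helper, not part of Lustre.
parse_lnet_networks() {
    # Entries are comma-separated; each normally looks like net(interface).
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        case $entry in
            *\(*\))
                net=${entry%%\(*}      # text before the first "("
                iface=${entry#*\(}     # text after the first "("
                iface=${iface%\)}      # drop the trailing ")"
                ;;
            *)
                # No interface given, e.g. "networks=tcp".
                net=$entry
                iface='(default)'
                ;;
        esac
        printf '%s %s\n' "$net" "$iface"
    done
}

# The line from the message above, and a single-NIC variant.
parse_lnet_networks 'tcp0(ib0),tcp1(eth1),tcp2(eth1:0)'
parse_lnet_networks 'tcp0(eth0)'
```

For a machine with a single NIC, the second call corresponds to an entry of the form "options lnet networks=tcp0(eth0)".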
>>
>>
>> Once the lnet module is loaded, you can check which network or networks
>> are configured to work with Lustre using the 'lctl list_nids' command.
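
When comparing list_nids output from two nodes, the part that matters is the network name after the "@" in each NID (NIDs have the form address@network, for example 192.168.0.3@tcp). The sketch below is only an illustration: nid_addr, nid_net and same_lnet_network are hypothetical helpers, and it assumes, as the Lustre manual describes, that a network type without a number (tcp) means network 0 (tcp0).

```shell
# Hypothetical helpers for comparing NIDs; these are not Lustre commands.

# Address part of a NID: everything before the last "@".
nid_addr() { printf '%s\n' "${1%@*}"; }

# Network part of a NID: everything after the last "@".
# Assumption: a bare type such as "tcp" is shorthand for "tcp0".
nid_net() {
    net=${1##*@}
    case $net in
        *[0-9]) ;;              # already numbered, e.g. tcp1
        *) net=${net}0 ;;       # bare type, treat as network 0
    esac
    printf '%s\n' "$net"
}

# Succeeds when both NIDs name the same LNET network.
same_lnet_network() {
    [ "$(nid_net "$1")" = "$(nid_net "$2")" ]
}

# NIDs as reported by the OSS and the MDS in this thread.
oss_nid='192.168.0.3@tcp'
mds_nid='192.168.0.2@tcp'
if same_lnet_network "$oss_nid" "$mds_nid"; then
    echo "both nodes are on LNET network $(nid_net "$oss_nid")"
else
    echo "NIDs are on different LNET networks"
fi
```

If the two nodes report the same network and lctl ping still fails, the problem is more likely below LNET (IP routing, cabling, firewall) than in the networks= option itself.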
>>
>> Cheers
>>
>> Wojciech
>> 2010/1/15 Dusty Marks <dustynmarks at gmail.com>:
>> > I did some googling and found the command lctl ping, so I went on the
>> > OSS and typed in "lctl ping 192.168.0.2@tcp". This errored out with an
>> > I/O error.
>> >
>> > It is quite obvious that I've simply misconfigured the network. Could
>> > someone explain how to properly configure it?
>> >
>> > I don't understand what the entry in modprobe.conf actually means, so I
>> > cannot say what should be entered.
>> >
>> > Each one of my machines has one NIC (eth0). What do I enter in
>> > modprobe.conf to make this work correctly? If I update the entry in
>> > modprobe.conf, do I have to redo anything, or does Lustre pick up the
>> > changes without restarting anything?
>> >
>> > Thanks all for the help so far.
>> >
>> > - Dusty
>> >
>> > On Fri, Jan 15, 2010 at 10:36 AM, Dusty Marks <dustynmarks at gmail.com>
>> > wrote:
>> >>
>> >> I searched through the manual, and the only section I could find
>> >> dealing with networking configuration is section 4.1.0.2, titled
>> >> "Module Setup", in the Lustre 1.8 operations manual.
>> >>
>> >> It tells me to run the command modprobe -v lustre "networks=tcp0(eth0)",
>> >> and I did so on the MDS; however, it errored out with:
>> >>
>> >> [root@mds ~]# modprobe -v lustre "networks=tcp0(eth0)"
>> >> insmod /lib/modules/2.6.18-128.7.1.el5_lustre.1.8.1.1.20091003130007/kernel/fs/lustre/lustre.ko networks=tcp0(eth0)
>> >> FATAL: Error inserting lustre (/lib/modules/2.6.18-128.7.1.el5_lustre.1.8.1.1.20091003130007/kernel/fs/lustre/lustre.ko): Unknown symbol in module, or unknown parameter (see dmesg)
>> >>
>> >> dmesg says nothing, but the messages log says this:
>> >> Jan 15 10:27:48 mds kernel: lustre: Unknown parameter `networks'
>> >>
>> >> I even tried adding "options lnet networks=tcp0(eth0)"; however, that
>> >> didn't work either.
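
The kernel message above says that the lustre module has no parameter called networks; elsewhere in this thread the option is described as belonging to the lnet module. A quick way to confirm what the configuration file actually contains is sketched below. The helper name lnet_options_line is hypothetical, the default path is the /etc/modprobe.conf file discussed in this thread, and the demonstration uses a temporary file with made-up contents rather than a real configuration.

```shell
# Print the first active "options lnet ..." line that sets networks=.
# lnet_options_line is a hypothetical helper, not part of Lustre.
lnet_options_line() {
    conf=${1:-/etc/modprobe.conf}
    # Lines starting with "#" do not match, so commented-out entries are skipped.
    grep -E '^[[:space:]]*options[[:space:]]+lnet[[:space:]].*networks=' "$conf" | head -n 1
}

# Demonstration against a temporary file (illustrative contents only).
tmp=$(mktemp)
printf '%s\n' '# options lnet networks=tcp1(eth1)' \
              'options lnet networks=tcp0(eth0)' > "$tmp"
lnet_options_line "$tmp"
rm -f "$tmp"
```

Module options are read when a module is loaded, so an edited line would presumably only take effect once lnet has been unloaded and loaded again.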
>> >>
>> >> I'm terribly sorry for my incompetence, but I'm having a difficult time
>> >> understanding Lustre's abstractions.
>> >>
>> >> Each one of my nodes has a single Ethernet card (eth0).
>> >>
>> >>
>> >> On Thu, Jan 14, 2010 at 11:32 PM, Andreas Dilger <adilger at sun.com>
>> >> wrote:
>> >>>
>> >>> On 2010-01-15, at 00:21, Arden Wiebe wrote:
>> >>>>
>> >>>> Your mount command is wrong - try this format.
>> >>>>
>> >>>> mount -t lustre 192.168.0.7@tcp0:/ioio /mnt/ioio
>> >>>>
>> >>>> So, substituting your values, your mount line should read:
>> >>>>
>> >>>> mount -t lustre 192.168.0.2@tcp0:/datafs /mnt/datafs
>> >>>
>> >>> No, that isn't correct.  You are showing the mount command for a
>> >>> client.  It is the OST that is failing to mount, likely because
>> >>> the network is not configured correctly, and the OST needs to
>> >>> contact the MGS node always on the first mount in order to join
>> >>> the filesystem.
>> >>>
>> >>>> Enjoy the required reading and testing.  I found that
>> >>>> naming things uniquely helped me clarify what was actually
>> >>>> required.  Try calling your filesystem "Dusty" or
>> >>>> "Mark" and that should make things clearer for you.
>> >>>>
>> >>>> --- On Thu, 1/14/10, Andreas Dilger <adilger at sun.com> wrote:
>> >>>>>
>> >>>>> On 2010-01-14, at 23:51, Dusty Marks wrote:
>> >>>>>>
>> >>>>>> You are correct, there is information in messages.  Following are
>> >>>>>> the entries related to Lustre. The line that says 192.168.0.2@tcp is
>> >>>>>> unreachable makes sense, but what exactly is the problem? I entered
>> >>>>>> the line "options lnet networks=tcp" in modprobe.conf on the OSS
>> >>>>>> and MDS. The only difference was that I entered that line AFTER I
>> >>>>>> set up Lustre on the OSS. Could that be the problem? I don't see why
>> >>>>>> it would be, as the OSS is trying to reach the MDS/MGS, which is
>> >>>>>> 192.168.0.2.
>> >>>>>>
>> >>>>>> ---------------------------------------
>> >>>>>> Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(linux-tcpip.c:688:libcfs_sock_connect()) Error -113 connecting 0.0.0.0/1023 -> 192.168.0.2/988
>> >>>>>> Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(acceptor.c:95:lnet_connect_console_error()) Connection to 192.168.0.2@tcp at host 192.168.0.2 was unreachable: the network or that node may be down, or Lustre may be misconfigured.
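
The "Error -113" in the first log line is a negated Linux errno value. The sketch below decodes a few connection-related values; describe_connect_errno is a hypothetical helper, and the numbers are the errno values used on common Linux architectures such as x86. It is meant as a reading aid for the log, not as a diagnosis.

```shell
# Map a (possibly negated) errno from a connect failure to its name.
# describe_connect_errno is a hypothetical helper, not a Lustre tool.
describe_connect_errno() {
    # The kernel logs errno values negated, so strip a leading "-".
    case ${1#-} in
        101) echo 'ENETUNREACH: network is unreachable' ;;
        110) echo 'ETIMEDOUT: connection timed out' ;;
        111) echo 'ECONNREFUSED: connection refused' ;;
        113) echo 'EHOSTUNREACH: no route to host' ;;
        *)   echo "errno ${1#-}: not covered by this sketch" ;;
    esac
}

# The value from the log line above.
describe_connect_errno -113
```

EHOSTUNREACH is reported at the IP level, before any Lustre protocol traffic is exchanged, which is why checking plain IP connectivity between the nodes is a sensible next step. A firewall that rejects the connection can also surface as this error.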
>> >>>>>
>> >>>>>
>> >>>>> Please read the chapter in the manual about network configuration.
>> >>>>> I suspect the .0.2 network is not your eth0 network interface, and
>> >>>>> your modprobe.conf needs to be fixed.
>> >>>
>> >>>
>> >>> Cheers, Andreas
>> >>> --
>> >>> Andreas Dilger
>> >>> Sr. Staff Engineer, Lustre Group
>> >>> Sun Microsystems of Canada, Inc.
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> The graduate with a Science degree asks, "Why does it work?" The
>> >> graduate with an Engineering degree asks, "How does it work?" The
>> >> graduate with an Accounting degree asks, "How much will it cost?" The
>> >> graduate with an Arts degree asks, "Do you want fries with that?"



-- 
--
Wojciech Turek

Assistant System Manager

High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517


