[Lustre-discuss] Fw: Re: Unable to activate OST

Dusty Marks dustynmarks at gmail.com
Fri Jan 15 16:15:27 PST 2010


The output of lctl list_nids on the OSS is:

[root@oss ~]# lctl list_nids
192.168.0.3@tcp

and on the MDS:

[root@mds ~]# lctl list_nids
192.168.0.2@tcp

Thanks,
Dusty
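
For reference, a NID has the form address@network (the list archive may render "@" as " at "), and a network name without a number, such as tcp, refers to network 0, i.e. tcp0. The two NIDs above are therefore on the same LNet network. Below is a minimal shell sketch of that comparison; the helper names nid_network and same_lnet_network are hypothetical and are not part of Lustre.

```shell
#!/bin/sh
# Hypothetical helpers for comparing LNet NIDs; not part of Lustre itself.

# Print the LNet network part of a NID, normalised to carry a number.
# A network name without a number means network 0, so "tcp" becomes "tcp0".
nid_network() {
    net=${1#*@}                      # strip everything up to and including "@"
    case $net in
        *[0-9]) printf '%s\n' "$net" ;;
        *)      printf '%s0\n' "$net" ;;
    esac
}

# Succeed (exit status 0) if two NIDs are on the same LNet network.
same_lnet_network() {
    [ "$(nid_network "$1")" = "$(nid_network "$2")" ]
}

# Example (commented out so the file can be sourced safely):
#   same_lnet_network 192.168.0.3@tcp 192.168.0.2@tcp && echo "same network"
```

On a node with the lnet module loaded, lctl list_nids prints the local NIDs and lctl ping <nid> tests whether a peer is reachable over LNet, as used elsewhere in this thread.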

On Fri, Jan 15, 2010 at 5:39 PM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:

> Hi,
>
> Could you please post the output of the 'lctl list_nids' command on the
> OSS and on the MDS? This will show us which network is
> configured to work with Lustre.
>
> The lnet options entry in modprobe.conf tells the lnet module which
> NIC or NICs will be configured to work with Lustre. If your
> modprobe.conf doesn't have an lnet options line, Lustre will by default
> configure the first NIC, which is usually eth0.
> Below is a modprobe.conf entry from my Lustre setup.
> My OSS(s) and MDS(s) have two Ethernet NICs, eth0 and eth1, and an
> InfiniBand NIC, ib0. The IB interface is set to work as IPoIB, so Lustre
> treats it as an ordinary Ethernet NIC:
> options lnet networks=tcp0(ib0),tcp1(eth1),tcp2(eth1:0)
> So the line above means that:
>   first lustre network tcp0 is configured on interface ib0
>   second lustre network tcp1 is configured on interface eth1
>   third lustre network tcp2 is configured on alias interface eth1:0
>
> eth0 is not mentioned on this line because I have chosen not to
> configure it to work with lustre.
>
>
> Once the lnet module is loaded, you can check which network or networks are
> configured to work with Lustre using the 'lctl list_nids' command.
>
> Cheers
>
> Wojciech
> 2010/1/15 Dusty Marks <dustynmarks at gmail.com>:
> > I did some googling and found the command lctl ping, so I went on the OSS
> > and typed in "lctl ping 192.168.0.2@tcp". This failed with an I/O
> > error.
> >
> > It is quite obvious that I've simply misconfigured the network. Could
> > someone explain how to properly configure it?
> >
> > I don't understand what the entry in modprobe.conf actually means, so I
> > cannot say what should be entered.
> >
> > Each one of my machines has one NIC (eth0). What do I enter in
> > modprobe.conf to make this work correctly? If I update the entry in
> > modprobe.conf, do I have to redo anything, or does Lustre pick up the
> > changes without restarting anything?
> >
> > Thanks all for the help so far.
> >
> > - Dusty
> >
> > On Fri, Jan 15, 2010 at 10:36 AM, Dusty Marks <dustynmarks at gmail.com> wrote:
> >>
> >> I searched through the manual, and the only section I could find dealing
> >> with networking configuration is section 4.1.0.2, titled "Module Setup",
> >> in the Lustre 1.8 operations manual.
> >>
> >> It tells me to run the command modprobe -v lustre "networks=tcp0(eth0)",
> >> and I did so on the MDS; however, it errored out with:
> >>
> >> [root@mds ~]# modprobe -v lustre "networks=tcp0(eth0)"
> >> insmod /lib/modules/2.6.18-128.7.1.el5_lustre.1.8.1.1.20091003130007/kernel/fs/lustre/lustre.ko networks=tcp0(eth0)
> >> FATAL: Error inserting lustre (/lib/modules/2.6.18-128.7.1.el5_lustre.1.8.1.1.20091003130007/kernel/fs/lustre/lustre.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> >>
> >> dmesg says nothing, but messages says this:
> >> Jan 15 10:27:48 mds kernel: lustre: Unknown parameter `networks'
> >>
> >> I even tried adding "options lnet networks=tcp0(eth0)"; however, that
> >> didn't work either.
> >>
> >> I'm terribly sorry for my incompetence, but I'm having a difficult time
> >> understanding Lustre's abstractions.
> >>
> >> Each of my nodes has a single Ethernet card (eth0).
> >>
> >>
> >> On Thu, Jan 14, 2010 at 11:32 PM, Andreas Dilger <adilger at sun.com> wrote:
> >>>
> >>> On 2010-01-15, at 00:21, Arden Wiebe wrote:
> >>>>
> >>>> Your mount command is wrong - try this format.
> >>>>
> >>>> mount -t lustre 192.168.0.7@tcp0:/ioio /mnt/ioio
> >>>>
> >>>> So, substituting the values you supplied, your mount line should
> >>>> read:
> >>>>
> >>>> mount -t lustre 192.168.0.2@tcp0:/datafs /mnt/datafs
> >>>
> >>> No, that isn't correct.  You are showing the mount command for a
> >>> client.  It is the OST that is failing to mount, likely because
> >>> the network is not configured correctly, and the OST always needs to
> >>> contact the MGS node on its first mount in order to join
> >>> the filesystem.
> >>>
> >>>> Enjoy the required reading and testing.  I found that
> >>>> naming things uniquely helped me clarify what was actually
> >>>> required.  Try calling your filesystem "Dusty" or
> >>>> "Mark" and that should make things clearer for you.
> >>>>
> >>>> --- On Thu, 1/14/10, Andreas Dilger <adilger at sun.com> wrote:
> >>>>>
> >>>>> On 2010-01-14, at 23:51, Dusty Marks wrote:
> >>>>>>
> >>>>>> You are correct, there is information in messages.  Following are the
> >>>>>> entries related to Lustre. The line that says 192.168.0.2@tcp is
> >>>>>> unreachable makes sense, but what exactly is the problem? I entered
> >>>>>> the line "options lnet networks=tcp" in modprobe.conf on the OSS and
> >>>>>> MDS. The only difference was that I entered that line AFTER I set up
> >>>>>> Lustre on the OSS. Could that be the problem? I don't see why that
> >>>>>> would be the problem, as the OSS is trying to reach the MDS/MGS,
> >>>>>> which is 192.168.0.2.
> >>>>>>
> >>>>>> ---------------------------------------
> >>>>>> Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(linux-tcpip.c:688:libcfs_sock_connect()) Error -113 connecting 0.0.0.0/1023 -> 192.168.0.2/988
> >>>>>> Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(acceptor.c:95:lnet_connect_console_error()) Connection to 192.168.0.2@tcp at host 192.168.0.2 was unreachable: the network or that node may be down, or Lustre may be misconfigured.
> >>>>>
> >>>>>
> >>>>> Please read the chapter in the manual about network configuration.  I
> >>>>> suspect the .0.2 network is not your eth0 network interface, and your
> >>>>> modprobe.conf needs to be fixed.
> >>>
> >>>
> >>> Cheers, Andreas
> >>> --
> >>> Andreas Dilger
> >>> Sr. Staff Engineer, Lustre Group
> >>> Sun Microsystems of Canada, Inc.
> >>>
> >>
> >>
> >>
> >> --
> >> The graduate with a Science degree asks, "Why does it work?" The graduate
> >> with an Engineering degree asks, "How does it work?" The graduate with an
> >> Accounting degree asks, "How much will it cost?" The graduate with an Arts
> >> degree asks, "Do you want fries with that?"
> >
> >
> >
>
>
>
> --
> --
> Wojciech Turek
>
> Assistant System Manager
>
> High Performance Computing Service
> University of Cambridge
> Email: wjt27 at cam.ac.uk
> Tel: (+)44 1223 763517
>
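
As a sketch of how the networks= value in the quoted "options lnet" line maps LNet networks to interfaces, the following shell function splits such a value into network/interface pairs. The function name parse_lnet_networks is hypothetical, and the sketch assumes exactly one interface per network, as in the example quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list "network interface" pairs from an lnet
# networks= value such as "tcp0(ib0),tcp1(eth1),tcp2(eth1:0)".
# Assumption: one interface per network; entries without "(...)" are skipped.
parse_lnet_networks() {
    printf '%s\n' "$1" |
        tr ',' '\n' |                                   # one entry per line
        sed -n 's/^\([^()]*\)(\([^()]*\))$/\1 \2/p'     # "name(if)" -> "name if"
}

# Example (commented out so the file can be sourced safely):
#   parse_lnet_networks 'tcp0(ib0),tcp1(eth1),tcp2(eth1:0)'
```

For a node with a single NIC, eth0, the corresponding value would be tcp0(eth0), as in the "options lnet networks=tcp0(eth0)" line tried earlier in the thread. Note that the log quoted in the thread shows the lustre module rejecting networks= as an unknown parameter; the option line sets it on the lnet module instead.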


