[Lustre-discuss] Fw: Re: Unable to activate OST

Dusty Marks dustynmarks at gmail.com
Fri Jan 15 14:42:43 PST 2010


I did some googling and found the command lctl ping. So I went on the OSS
and typed in "lctl ping 192.168.0.2@tcp". This failed with an I/O
error.
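
A minimal sketch of the checks involved here, assuming the MGS NID is the
192.168.0.2@tcp address used in this thread. The helper names (nid_addr,
nid_net, check_nid) are hypothetical and only for illustration; lctl
list_nids, lctl ping and ping are the real commands. For reference, the
"Error -113" in the log quoted further down is EHOSTUNREACH (no route to
host) on Linux, so the failure may well be below the Lustre level.

```shell
#!/bin/sh
# Sketch only: the NID below is the MGS address quoted in this thread.
MGS_NID="192.168.0.2@tcp"

# nid_addr (hypothetical helper): print the address part of a NID,
# i.e. everything before the first "@".
nid_addr() {
    printf '%s\n' "${1%%@*}"
}

# nid_net (hypothetical helper): print the network part of a NID,
# i.e. everything after the first "@".
nid_net() {
    printf '%s\n' "${1#*@}"
}

# check_nid (hypothetical helper): run the reachability checks in order,
# from the local LNET configuration up to an LNET-level ping of the peer.
check_nid() {
    nid=$1
    # 1. NIDs that LNET has configured on this node; the local address
    #    on the expected network should be listed here.
    lctl list_nids
    # 2. Plain IP reachability of the peer address.
    ping -c 3 "$(nid_addr "$nid")"
    # 3. LNET-level reachability of the peer NID.
    lctl ping "$nid"
}

# Usage (as root on the OSS): check_nid "$MGS_NID"
```

If step 2 already fails, the problem is presumably in the IP setup or a
firewall rather than in the Lustre configuration itself.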

It is quite obvious that I've simply misconfigured the network. Could
someone explain how to configure it properly?

I don't understand what the entry in modprobe.conf actually means, so I
cannot say what should be entered.

Each one of my machines has one NIC (eth0). What do I enter in
modprobe.conf to make this work correctly? If I update the entry in
modprobe.conf, do I have to redo anything, or does Lustre pick up the
changes without restarting anything?
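
For context, a sketch of the kind of entry in question, assuming the
192.168.0.x addresses in this thread are configured on eth0 (nothing in
the thread confirms this). In the LNET networks option, tcp0 names the
network (type tcp, number 0; plain "tcp" refers to the same network) and
the name in parentheses is the interface that network should use.

```
# /etc/modprobe.conf fragment (sketch; assumes eth0 carries 192.168.0.x)
# "networks" is a parameter of the lnet module, not of the lustre module.
options lnet networks=tcp0(eth0)
```

Module options are only read when a module is loaded, so an edit here
would not be expected to affect an lnet module that is already loaded;
the Lustre modules would have to be unloaded and reloaded (or the node
rebooted) before the new value is used.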

Thanks all for the help so far.

- Dusty

On Fri, Jan 15, 2010 at 10:36 AM, Dusty Marks <dustynmarks at gmail.com> wrote:

> I searched through the manual, and the only section I could find dealing
> with networking configuration is section 4.1.0.2 titled "Module Setup" in
> the Lustre 1.8 operations manual.
>
> It tells me to run the command modprobe -v lustre "networks=tcp0(eth0)",
> and I did so on the MDS; however, it errored out with:
>
> [root@mds ~]# modprobe -v lustre "networks=tcp0(eth0)"
> insmod
> /lib/modules/2.6.18-128.7.1.el5_lustre.1.8.1.1.20091003130007/kernel/fs/lustre/lustre.ko
> networks=tcp0(eth0)
> FATAL: Error inserting lustre
> (/lib/modules/2.6.18-128.7.1.el5_lustre.1.8.1.1.20091003130007/kernel/fs/lustre/lustre.ko):
> Unknown symbol in module, or unknown parameter (see dmesg)
>
> dmesg says nothing, but the messages log says this:
> Jan 15 10:27:48 mds kernel: lustre: Unknown parameter `networks'
>
> I even tried adding "options lnet networks=tcp0(eth0)"; however, that didn't
> work either.
>
> I'm terribly sorry for my incompetence, but I'm having a difficult time
> understanding Lustre's abstractions.
>
> Each one of my nodes has a single Ethernet card (eth0).
>
>
>
> On Thu, Jan 14, 2010 at 11:32 PM, Andreas Dilger <adilger at sun.com> wrote:
>
>>
>> On 2010-01-15, at 00:21, Arden Wiebe wrote:
>>
>>> Your mount command is wrong - try this format.
>>>
>>> mount -t lustre 192.168.0.7@tcp0:/ioio /mnt/ioio
>>>
>>> So, substituting the values you supplied, your mount line should
>>> read:
>>>
>>> mount -t lustre 192.168.0.2@tcp0:/datafs /mnt/datafs
>>>
>>
>> No, that isn't correct.  You are showing the mount command for a
>> client.  It is the OST that is failing to mount, likely because
>> the network is not configured correctly, and the OST needs to
>> contact the MGS node always on the first mount in order to join
>> the filesystem.
>>
>>> Enjoy the required reading and testing.  I found that
>>> naming things uniquely helped me clarify what was actually
>>> required.  Try calling your filesystem "Dusty" or
>>> "Mark" and that should make things clearer for you.
>>>
>>>
>>> --- On Thu, 1/14/10, Andreas Dilger <adilger at sun.com> wrote:
>>>
>>>> On 2010-01-14, at 23:51, Dusty Marks wrote:
>>>>
>>>>> You are correct, there is information in messages.  Following are the
>>>>> entries related to Lustre. The line that says 192.168.0.2@tcp is
>>>>> unreachable makes sense, but what exactly is the problem? I entered
>>>>> the line "options lnet networks=tcp" in modprobe.conf on the OSS and
>>>>> MDS. The only difference was that I entered that line AFTER I set up
>>>>> Lustre on the OSS. Could that be the problem? I don't see why that
>>>>> would be the problem, as the OSS is trying to reach the MDS/MGS,
>>>>> which is 192.168.0.2.
>>>>>
>>>>> ---------------------------------------
>>>>> Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(linux-tcpip.c:688:libcfs_sock_connect()) Error -113 connecting 0.0.0.0/1023 -> 192.168.0.2/988
>>>>> Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(acceptor.c:95:lnet_connect_console_error()) Connection to 192.168.0.2@tcp at host 192.168.0.2 was unreachable: the network or that node may be down, or Lustre may be misconfigured.
>>>>>
>>>>
>>>>
>>>> Please read the chapter in the manual about network configuration.  I
>>>> suspect the .0.2 network is not your eth0 network interface, and your
>>>> modprobe.conf needs to be fixed.
>>>>
>>>
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
>>
>>
>
>
> --
> The graduate with a Science degree asks, "Why does it work?" The graduate
> with an Engineering degree asks, "How does it work?" The graduate with an
> Accounting degree asks, "How much will it cost?" The graduate with an Arts
> degree asks, "Do you want fries with that?"
>


