[Lustre-discuss] Lustre installation and configuration problems

Carlos Santana neubyr at gmail.com
Wed Jun 17 17:10:07 PDT 2009


Folks,

It has been unsuccessful so far.

I made a fresh CentOS 5.2 minimal install (2.6.18-92.el5). Later, I
updated the kernel to version 2.6.18-92.1.17. Here is the output from uname
and an rpm query:

[root@localhost ~]# rpm -qa | grep lustre
lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.18-92.1.17.el5 #1 SMP Tue Nov 4 13:45:01 EST 2008 i686 i686 i386 GNU/Linux

Other details:
--- --- ---
[root@localhost ~]# ls -l /lib/modules | grep 2.6
drwxr-xr-x 6 root root 4096 Jun 17 18:47 2.6.18-92.1.17.el5
drwxr-xr-x 6 root root 4096 Jun 17 17:38 2.6.18-92.el5


[root@localhost modules]# find . | grep lustre
./2.6.18-92.1.17.el5/kernel/net/lustre
./2.6.18-92.1.17.el5/kernel/net/lustre/libcfs.ko
./2.6.18-92.1.17.el5/kernel/net/lustre/lnet.ko
./2.6.18-92.1.17.el5/kernel/net/lustre/ksocklnd.ko
./2.6.18-92.1.17.el5/kernel/net/lustre/ko2iblnd.ko
./2.6.18-92.1.17.el5/kernel/net/lustre/lnet_selftest.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre
./2.6.18-92.1.17.el5/kernel/fs/lustre/osc.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/ptlrpc.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/obdecho.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/lvfs.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/mgc.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/llite_lloop.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/lov.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/mdc.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/lquota.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/lustre.ko
./2.6.18-92.1.17.el5/kernel/fs/lustre/obdclass.ko
--- --- ---


I am still having the same problem, and I suspect I am missing something.
I also tried a source install of the 'patchless client', but the
results were the same.

Are there any configuration steps needed after the rpm (or source)
installation? The only one I know of is restricting interfaces in
modprobe.conf, and I have tried both with and without it, with no success.
Could anyone please suggest any debugging steps or tests? What
additional output would be most useful for me to provide? Any insights?
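
For reference, the interface restriction mentioned above is normally a single lnet options line in /etc/modprobe.conf. This is a minimal sketch, assuming the client should reach the servers over eth0; the interface name is an assumption and is not taken from this machine:

```
# /etc/modprobe.conf
# Assumption: eth0 is the interface that reaches the Lustre servers.
options lnet networks=tcp0(eth0)
```

The Lustre modules have to be unloaded and loaded again for a changed option to take effect.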

Also, I have a suggestion here. It might be a good idea for the RPM
installation to compare 'uname -r' against the kernel version the
package was built for and, if they do not match, suggest a source install.
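
A rough sketch of the kind of check suggested above, written as a shell function. The function name check_lustre_modules and its arguments are hypothetical; it only assumes that lustre.ko is installed under kernel/fs/lustre in the per-kernel module directory, as in the find output earlier in this message:

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre or its RPMs): report whether
# lustre.ko is installed for a given kernel release.
#   $1 = kernel release to check (default: the running kernel, `uname -r`)
#   $2 = modules root directory  (default: /lib/modules)
check_lustre_modules() {
    release="${1:-$(uname -r)}"
    modroot="${2:-/lib/modules}"
    if [ -f "$modroot/$release/kernel/fs/lustre/lustre.ko" ]; then
        echo "match: lustre.ko found for $release"
        return 0
    fi
    echo "mismatch: no lustre.ko under $modroot/$release"
    return 1
}
```

Calling check_lustre_modules with no arguments checks the running kernel. A non-zero return status means the modules were installed for a different kernel release, or not at all.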

Thanks for the help. I really appreciate your patience..

-
Thanks,
CS.


On Wed, Jun 17, 2009 at 10:40 AM, Jerome, Ron<Ron.Jerome at nrc-cnrc.gc.ca> wrote:
> I think the problem you have, as Cliff alluded to, is a mismatch between
> your kernel version and the kernel version of the Lustre modules.
>
>
>
> You have kernel “2.6.18-92.el5” and are installing Lustre
> “2.6.18_92.1.17.el5”. Note that the “.1.17” is significant, as the modules will
> end up in the wrong directory. There is an update to CentOS that brings the
> kernel to the matching 2.6.18_92.1.17.el5 version; you can pull it off the
> CentOS mirror site in the updates directory.
>
>
>
>
>
> Ron.
>
>
>
> From: lustre-discuss-bounces at lists.lustre.org
> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Carlos Santana
> Sent: June 17, 2009 11:21 AM
> To: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Lustre installation and configuration problems
>
>
>
> And is there any specific installation order for patchless client? Could
> someone please share it with me?
>
> -
> CS.
>
> On Wed, Jun 17, 2009 at 10:18 AM, Carlos Santana <neubyr at gmail.com> wrote:
>
> Huh... :( Sorry to bug you guys again...
>
> I am planning to make a fresh start now as nothing seems to have worked for
> me. If you have any comments/feedback please share them.
>
> I would like to confirm the installation order before I make a fresh start. From
> Arden's experience:
> http://lists.lustre.org/pipermail/lustre-discuss/2009-June/010710.html , the
> lustre-modules package is installed last. As I was installing Lustre 1.8, I was
> referring to the 1.8 operations manual
> http://manual.lustre.org/index.php?title=Main_Page . The installation order
> in the manual is different from what Arden has suggested.
>
> Will it make a difference in configuration at a later stage? Which one should
> I follow now?
> Any comments?
>
> Thanks,
> CS.
>
>
>
> On Wed, Jun 17, 2009 at 12:35 AM, Carlos Santana <neubyr at gmail.com> wrote:
>
> Thanks Cliff.
>
> The depmod -a was successful before as well. I am using a CentOS 5.2
> box. The following packages are installed:
> [root@localhost tmp]# rpm -qa | grep -i lustre
> lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> [root@localhost tmp]# uname -a
> Linux localhost.localdomain 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686 i686 i386 GNU/Linux
>
> And here is the output from strace for mount:
> http://www.heypasteit.com/clip/8WT
>
> Any further debugging hints?
>
> Thanks,
> CS.
>
> On 6/16/09, Cliff White <Cliff.White at sun.com> wrote:
>> Carlos Santana wrote:
>>> The '$ modprobe -l lustre*' did not show any module on a patchless
>>> client. modprobe -v returns 'FATAL: Module lustre not found'.
>>>
>>> How do I install a patchless client?
>>> I have tried lustre-client-modules and lustre-client-ver rpm packages in
>>> both sequences. Am I missing anything?
>>>
>>
>> Make sure the lustre-client-modules package matches your running kernel.
>> Run depmod -a to be sure
>> cliffw
>>
>>> Thanks,
>>> CS.
>>>
>>>
>>>
>>> On Tue, Jun 16, 2009 at 2:28 PM, Cliff White <Cliff.White at sun.com> wrote:
>>>
>>>     Carlos Santana wrote:
>>>
>>>         The lctl ping and 'network up' failed with the following messages:
>>>         --- ---
>>>         [root@localhost ~]# lctl ping 10.0.0.42
>>>         opening /dev/lnet failed: No such device
>>>         hint: the kernel modules may not be loaded
>>>         failed to ping 10.0.0.42@tcp: No such device
>>>
>>>         [root@localhost ~]# lctl network up
>>>         opening /dev/lnet failed: No such device
>>>         hint: the kernel modules may not be loaded
>>>         LNET configure error 19: No such device
>>>
>>>
>>>     Make sure modules are unloaded, then try modprobe -v.
>>>     Looks like you have lnet mis-configured, if your module options are
>>>     wrong, you will see an error during the modprobe.
>>>     cliffw
>>>
>>>         --- ---
>>>
>>>
>>>         I tried lustre_rmmod and depmod commands and it did not return
>>>         any error messages. Any further clues? Reinstall patchless
>>>         client again?
>>>
>>>         -
>>>         CS.
>>>
>>>
>>>         On Tue, Jun 16, 2009 at 1:32 PM, Cliff White <Cliff.White at sun.com> wrote:
>>>
>>>            Carlos Santana wrote:
>>>
>>>                I was able to run lustre_rmmod and depmod successfully. The
>>>                '$lctl list_nids' returned the server ip address and interface
>>>                (tcp0).
>>>
>>>                I tried to mount the file system on a remote client, but it
>>>                failed with the following message.
>>>                --- ---
>>>                [root@localhost ~]# mount -t lustre 10.0.0.42@tcp0:/lustre /mnt/lustre
>>>                mount.lustre: mount 10.0.0.42@tcp0:/lustre at /mnt/lustre failed: No such device
>>>                Are the lustre modules loaded?
>>>                Check /etc/modprobe.conf and /proc/filesystems
>>>                Note 'alias lustre llite' should be removed from modprobe.conf
>>>                --- ---
>>>
>>>                However, the mounting is successful on a single node
>>>                configuration - with client on the same machine as MDS and OST.
>>>                Any clues? Where to look for logs and debug messages?
>>>
>>>
>>>            Syslog || /var/log/messages is the normal place.
>>>
>>>            You can use 'lctl ping' to verify that the client can reach the server.
>>>            Usually in these cases, it's a network/name misconfiguration.
>>>
>>>            Run 'tunefs.lustre --print' on your servers, and verify that mgsnode=
>>>            is correct.
>>>
>>>            cliffw
>>>
>>>
>>>                Thanks,
>>>                CS.
>>>
>>>
>>>
>>>
>>>
>>>                On Tue, Jun 16, 2009 at 12:16 PM, Cliff White <Cliff.White at sun.com> wrote:
>>>
>>>                   Carlos Santana wrote:
>>>
>>>                       Thanks Kevin..
>>>
>>>                   Please read:
>>>
>>>
>>>
>>> http://manual.lustre.org/manual/LustreManual16_HTML/ConfiguringLustre.html#50401328_pgfId-1289529
>>>
>>>                   Those instructions are identical for 1.6 and 1.8.
>>>
>>>                   For current lustre, only two commands are used for configuration:
>>>                   mkfs.lustre and mount.
>>>
>>>
>>>                   Usually when lustre_rmmod returns that error, you run it a second
>>>                   time, and it will clear things. Unless you have live mounts or
>>>                   network connections.
>>>
>>>                   cliffw
>>>
>>>
>>>                       I am referring to the 1.8 manual, but I was also referring to the
>>>                       HowTo page on the wiki, which seems to be for 1.6. The HowTo page
>>>                       http://wiki.lustre.org/index.php/Lustre_Howto#Using_Supplied_Configuration_Tools
>>>                       mentions lmc, lconf, and lctl.
>>>
>>>                       The modules are installed in the right place. The
>>>                       '$ lustre_rmmod' resulted in the following output:
>>>                       [root@localhost 2.6.18-92.1.17.el5_lustre.1.8.0smp]# lustre_rmmod
>>>                       ERROR: Module obdfilter is in use
>>>                       ERROR: Module ost is in use
>>>                       ERROR: Module mds is in use
>>>                       ERROR: Module fsfilt_ldiskfs is in use
>>>                       ERROR: Module mgs is in use
>>>                       ERROR: Module mgc is in use by mgs
>>>                       ERROR: Module ldiskfs is in use by fsfilt_ldiskfs
>>>                       ERROR: Module lov is in use
>>>                       ERROR: Module lquota is in use by obdfilter,mds
>>>                       ERROR: Module osc is in use
>>>                       ERROR: Module ksocklnd is in use
>>>                       ERROR: Module ptlrpc is in use by obdfilter,ost,mds,mgs,mgc,lov,lquota,osc
>>>                       ERROR: Module obdclass is in use by obdfilter,ost,mds,fsfilt_ldiskfs,mgs,mgc,lov,lquota,osc,ptlrpc
>>>                       ERROR: Module lnet is in use by ksocklnd,ptlrpc,obdclass
>>>                       ERROR: Module lvfs is in use by obdfilter,ost,mds,fsfilt_ldiskfs,mgs,mgc,lov,lquota,osc,ptlrpc,obdclass
>>>                       ERROR: Module libcfs is in use by obdfilter,ost,mds,fsfilt_ldiskfs,mgs,mgc,lov,lquota,osc,ksocklnd,ptlrpc,obdclass,lnet,lvfs
>>>
>>>                       Do I need to shut down these services? How can I do that?
>>>
>>>                       Thanks,
>>>                       CS.
>>>


