[Lustre-discuss] Lustre installation and configuration problems
Cliff White
Cliff.White at Sun.COM
Wed Jun 17 17:27:32 PDT 2009
Carlos Santana wrote:
> Folks,
>
> It has been unsuccessful so far.
>
> I made a fresh CentOS 5.2 minimum install (2.6.18-92.el5). Later, I
> updated the kernel to version 2.6.18-92.1.17. Here is the output from uname
> and the rpm query:
>
> [root at localhost ~]# rpm -qa | grep lustre
> lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
> [root at localhost ~]# uname -a
> Linux localhost.localdomain 2.6.18-92.1.17.el5 #1 SMP Tue Nov 4
> 13:45:01 EST 2008 i686 i686 i386 GNU/Linux
I think you are missing a basic point here. It's been mentioned a few times.
You don't have a lustre-patched kernel installed.
Here's what a proper system looks like - it's 1.6.7.2, but that doesn't
matter, 1.8.0 is the same.
# rpm -qa |grep lustre
lustre-1.6.7-2.6.18_92.1.17.el5_lustre.1.6.7smp
kernel-lustre-smp-2.6.18-92.1.17.el5_lustre.1.6.7
lustre-ldiskfs-3.0.7-2.6.18_92.1.17.el5_lustre.1.6.7smp
lustre-modules-1.6.7-2.6.18_92.1.17.el5_lustre.1.6.7smp
# uname -a
Linux bun2 2.6.18-92.1.17.el5_lustre.1.6.7smp #1 SMP Tue Feb 24 19:59:12
MST 2009 i686 i686 i386 GNU/Linux
Notice the difference? Two additional RPMs, and the version strings of
modules and kernel match _exactly_.
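A minimal sketch of that comparison, assuming the RPM naming seen in this thread: the kernel release from 'uname -r', with '-' replaced by '_', appears as the trailing part of the lustre-modules package name. The function names are hypothetical helpers, not part of Lustre.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Lustre) for comparing a kernel
# release string with a lustre-modules package name.

# Convert a kernel release (as printed by `uname -r`) into the form used
# inside the RPM names in this thread: every '-' becomes '_'.
kernel_to_rpm_suffix() {
    printf '%s\n' "$1" | sed 's/-/_/g'
}

# Succeed (exit status 0) only if the package name ends in "-<suffix>",
# where <suffix> is the converted kernel release.
modules_match_kernel() {
    release=$1   # e.g. 2.6.18-92.1.17.el5_lustre.1.6.7smp
    package=$2   # e.g. lustre-modules-1.6.7-2.6.18_92.1.17.el5_lustre.1.6.7smp
    suffix=$(kernel_to_rpm_suffix "$release")
    case $package in
        *-"$suffix") return 0 ;;
        *)           return 1 ;;
    esac
}

# Example with the strings quoted in this thread.
if modules_match_kernel "2.6.18-92.1.17.el5" \
    "lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp"; then
    echo "modules match the running kernel"
else
    echo "modules do NOT match the running kernel"
fi
```

On a live system it could be called as modules_match_kernel "$(uname -r)" "$(rpm -q lustre-modules)". Some rpm versions append an architecture suffix to the package name, which this sketch does not handle.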
cliffw
>
> Other details:
> --- --- ---
> [root at localhost ~]# ls -l /lib/modules | grep 2.6
> drwxr-xr-x 6 root root 4096 Jun 17 18:47 2.6.18-92.1.17.el5
> drwxr-xr-x 6 root root 4096 Jun 17 17:38 2.6.18-92.el5
>
>
> [root at localhost modules]# find . | grep lustre
> ./2.6.18-92.1.17.el5/kernel/net/lustre
> ./2.6.18-92.1.17.el5/kernel/net/lustre/libcfs.ko
> ./2.6.18-92.1.17.el5/kernel/net/lustre/lnet.ko
> ./2.6.18-92.1.17.el5/kernel/net/lustre/ksocklnd.ko
> ./2.6.18-92.1.17.el5/kernel/net/lustre/ko2iblnd.ko
> ./2.6.18-92.1.17.el5/kernel/net/lustre/lnet_selftest.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/osc.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/ptlrpc.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/obdecho.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/lvfs.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/mgc.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/llite_lloop.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/lov.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/mdc.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/lquota.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/lustre.ko
> ./2.6.18-92.1.17.el5/kernel/fs/lustre/obdclass.ko
> --- --- ---
>
>
> I am still having the same problem. Am I missing anything?
> I also tried a source install for the 'patchless client', but it
> gave the same results.
>
> Are there any configuration steps needed after the rpm (or source)
> installation? The one that I know of is restricting interfaces in
> modprobe.conf; however, I have tried with and without it, with no success.
> Could anyone please suggest any debugging steps or tests for this? What
> further output can I provide to help diagnose it? Any insights?
>
> Also, I have a suggestion here. It might be a good idea for the RPM
> installation to check 'uname -r' against the package's kernel version
> and, if they do not match, suggest a source install.
>
> Thanks for the help. I really appreciate your patience..
>
> -
> Thanks,
> CS.
>
>
> On Wed, Jun 17, 2009 at 10:40 AM, Jerome, Ron<Ron.Jerome at nrc-cnrc.gc.ca> wrote:
>> I think the problem you have, as Cliff alluded to, is a mismatch between
>> your kernel version and the version of the Lustre kernel modules.
>>
>>
>>
>> You have kernel “2.6.18-92.el5” and are installing Lustre
>> “2.6.18_92.1.17.el5”. Note that the “.1.17” is significant, as the modules
>> will end up in the wrong directory. There is an update to CentOS to bring the
>> kernel to the matching 2.6.18_92.1.17.el5 version; you can pull it off the
>> CentOS mirror site in the updates directory.
>>
>>
>>
>>
>>
>> Ron.
>>
>>
>>
>> From: lustre-discuss-bounces at lists.lustre.org
>> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Carlos Santana
>> Sent: June 17, 2009 11:21 AM
>> To: lustre-discuss at lists.lustre.org
>> Subject: Re: [Lustre-discuss] Lustre installation and configuration problems
>>
>>
>>
>> And is there any specific installation order for patchless client? Could
>> someone please share it with me?
>>
>> -
>> CS.
>>
>> On Wed, Jun 17, 2009 at 10:18 AM, Carlos Santana <neubyr at gmail.com> wrote:
>>
>> Huh... :( Sorry to bug you guys again...
>>
>> I am planning to make a fresh start now as nothing seems to have worked for
>> me. If you have any comments/feedback please share them.
>>
>> I would like to confirm the installation order before I make a fresh start. From
>> Arden's experience:
>> http://lists.lustre.org/pipermail/lustre-discuss/2009-June/010710.html , the
>> lustre-modules package is installed last. As I was installing Lustre 1.8, I was
>> referring to the 1.8 operations manual
>> http://manual.lustre.org/index.php?title=Main_Page . The installation order
>> in the manual is different from what Arden has suggested.
>>
>> Will it make a difference in configuration at later stage? Which one should
>> I follow now?
>> Any comments?
>>
>> Thanks,
>> CS.
>>
>>
>>
>> On Wed, Jun 17, 2009 at 12:35 AM, Carlos Santana <neubyr at gmail.com> wrote:
>>
>> Thanks Cliff.
>>
>> The depmod -a was successful before as well. I am using CentOS 5.2
>> box. Following are the packages installed:
>> [root at localhost tmp]# rpm -qa | grep -i lustre
>> lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
>>
>> lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
>>
>> [root at localhost tmp]# uname -a
>>
>> Linux localhost.localdomain 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47
>> EDT 2008 i686 i686 i386 GNU/Linux
>>
>> And here is the output from strace for mount:
>> http://www.heypasteit.com/clip/8WT
>>
>> Any further debugging hints?
>>
>> Thanks,
>> CS.
>>
>> On 6/16/09, Cliff White <Cliff.White at sun.com> wrote:
>>> Carlos Santana wrote:
>>>> The '$ modprobe -l lustre*' did not show any module on a patchless
>>>> client. modprobe -v returns 'FATAL: Module lustre not found'.
>>>>
>>>> How do I install a patchless client?
>>>> I have tried lustre-client-modules and lustre-client-ver rpm packages in
>>>> both sequences. Am I missing anything?
>>>>
>>> Make sure the lustre-client-modules package matches your running kernel.
>>> Run depmod -a to be sure
>>> cliffw
>>>
>>>> Thanks,
>>>> CS.
>>>>
>>>>
>>>>
>>>> On Tue, Jun 16, 2009 at 2:28 PM, Cliff White <Cliff.White at sun.com
>>>> <mailto:Cliff.White at sun.com>> wrote:
>>>>
>>>> Carlos Santana wrote:
>>>>
>>>> The lctl ping and 'net up' failed with the following messages:
>>>> --- ---
>>>> [root at localhost ~]# lctl ping 10.0.0.42
>>>> opening /dev/lnet failed: No such device
>>>> hint: the kernel modules may not be loaded
>>>> failed to ping 10.0.0.42@tcp: No such device
>>>>
>>>> [root at localhost ~]# lctl network up
>>>> opening /dev/lnet failed: No such device
>>>> hint: the kernel modules may not be loaded
>>>> LNET configure error 19: No such device
>>>>
>>>>
>>>> Make sure modules are unloaded, then try modprobe -v.
>>>> It looks like you have lnet mis-configured. If your module options are
>>>> wrong, you will see an error during the modprobe.
>>>> cliffw
>>>>
>>>> --- ---
>>>>
>>>>
>>>> I tried lustre_rmmod and depmod commands and it did not return
>>>> any error messages. Any further clues? Reinstall patchless
>>>> client again?
>>>>
>>>> -
>>>> CS.
>>>>
>>>>
>>>> On Tue, Jun 16, 2009 at 1:32 PM, Cliff White
>>>> <Cliff.White at sun.com <mailto:Cliff.White at sun.com>
>>>> <mailto:Cliff.White at sun.com <mailto:Cliff.White at sun.com>>> wrote:
>>>>
>>>> Carlos Santana wrote:
>>>>
>>>> I was able to run lustre_rmmod and depmod successfully. The
>>>> '$lctl list_nids' returned the server ip address and interface
>>>> (tcp0).
>>>>
>>>> I tried to mount the file system on a remote client, but it
>>>> failed with the following message.
>>>> --- ---
>>>> [root at localhost ~]# mount -t lustre 10.0.0.42@tcp0:/lustre /mnt/lustre
>>>> mount.lustre: mount 10.0.0.42@tcp0:/lustre at /mnt/lustre
>>>> failed: No such device
>>>> Are the lustre modules loaded?
>>>> Check /etc/modprobe.conf and /proc/filesystems
>>>> Note 'alias lustre llite' should be removed from
>>>> modprobe.conf
>>>> --- ---
>>>>
>>>> However, the mounting is successful on a single node
>>>> configuration - with client on the same machine as MDS
>>>> and OST.
>>>> Any clues? Where to look for logs and debug messages?
>>>>
>>>>
>>>> Syslog || /var/log/messages is the normal place.
>>>>
>>>> You can use 'lctl ping' to verify that the client can reach
>>>> the server.
>>>> Usually in these cases, it's a network/name misconfiguration.
>>>>
>>>> Run 'tunefs.lustre --print' on your servers, and verify that
>>>> mgsnode=
>>>> is correct.
>>>>
>>>> cliffw
>>>>
>>>>
>>>> Thanks,
>>>> CS.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jun 16, 2009 at 12:16 PM, Cliff White
>>>> <Cliff.White at sun.com <mailto:Cliff.White at sun.com>
>>>> <mailto:Cliff.White at sun.com <mailto:Cliff.White at sun.com>>
>>>> <mailto:Cliff.White at sun.com <mailto:Cliff.White at sun.com>
>>>> <mailto:Cliff.White at sun.com <mailto:Cliff.White at sun.com>>>>
>>>> wrote:
>>>>
>>>> Carlos Santana wrote:
>>>>
>>>> Thanks Kevin..
>>>>
>>>> Please read:
>>>>
>>>>
>>>>
>>>> http://manual.lustre.org/manual/LustreManual16_HTML/ConfiguringLustre.html#50401328_pgfId-1289529
>>>>
>>>> Those instructions are identical for 1.6 and 1.8.
>>>>
>>>> For current lustre, only two commands are used for
>>>> configuration.
>>>> mkfs.lustre and mount.
>>>>
>>>>
>>>> Usually when lustre_rmmod returns that error, you run it a second
>>>> time, and it will clear things. Unless you have live mounts or
>>>> network connections.
>>>>
>>>> cliffw
>>>>
>>>>
>>>> I am referring to the 1.8 manual, but I was also referring to the
>>>> HowTo page on the wiki, which seems to be for 1.6. The HowTo page
>>>> http://wiki.lustre.org/index.php/Lustre_Howto#Using_Supplied_Configuration_Tools
>>>> mentions lmc, lconf, and lctl.
>>>>
>>>> The modules are installed in the right place. The '$ lustre_rmmod'
>>>> resulted in the following output:
>>>> [root at localhost 2.6.18-92.1.17.el5_lustre.1.8.0smp]# lustre_rmmod
>>>> ERROR: Module obdfilter is in use
>>>> ERROR: Module ost is in use
>>>> ERROR: Module mds is in use
>>>> ERROR: Module fsfilt_ldiskfs is in use
>>>> ERROR: Module mgs is in use
>>>> ERROR: Module mgc is in use by mgs
>>>> ERROR: Module ldiskfs is in use by fsfilt_ldiskfs
>>>> ERROR: Module lov is in use
>>>> ERROR: Module lquota is in use by obdfilter,mds
>>>> ERROR: Module osc is in use
>>>> ERROR: Module ksocklnd is in use
>>>> ERROR: Module ptlrpc is in use by obdfilter,ost,mds,mgs,mgc,lov,lquota,osc
>>>> ERROR: Module obdclass is in use by obdfilter,ost,mds,fsfilt_ldiskfs,mgs,mgc,lov,lquota,osc,ptlrpc
>>>> ERROR: Module lnet is in use by ksocklnd,ptlrpc,obdclass
>>>> ERROR: Module lvfs is in use by obdfilter,ost,mds,fsfilt_ldiskfs,mgs,mgc,lov,lquota,osc,ptlrpc,obdclass
>>>> ERROR: Module libcfs is in use by obdfilter,ost,mds,fsfilt_ldiskfs,mgs,mgc,lov,lquota,osc,ksocklnd,ptlrpc,obdclass,lnet,lvfs
>>>>
>>>> Do I need to shut down these services? How can I do that?
>>>>
>>>> Thanks,
>>>> CS.
>>>>