[Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)

Abhay Dandekar dandekar.abhay at gmail.com
Wed Aug 20 10:19:15 PDT 2014


Hi Andreas,

Sorry to bother you again, but modifying /etc/hosts still does not solve the
problem.

Just to give some more context: I am trying to set up a virtual cluster of
Lustre nodes.

Here is my /etc/hosts

[root@mgs-new-test ~]# cat /etc/hosts
192.168.122.50        mgs-new-test
192.168.122.50       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1                 localhost localhost.localdomain localhost6 localhost6.localdomain6
::0            mgs-new-test
[root@mgs-new-test ~]#
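
For comparison, my understanding of the layout Andreas suggested (quoted below)
would be roughly the following. This is only a sketch, reusing the addresses
above: the hostname maps only to the real interface address, and the localhost
lines carry only loopback addresses.

127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.122.50  mgs-new-test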

And here is the latest /var/log/messages

Aug 20 11:32:32 mgs-new-test kernel: EXT4-fs (vda1): mounted filesystem
with ordered data mode. Opts:
Aug 20 11:32:32 mgs-new-test kernel: Adding 417784k swap on
/dev/mapper/vg_mgsnewtest-lv_swap.  Priority:-1 extents:1 across:417784k
Aug 20 11:32:32 mgs-new-test kernel: NET: Registered protocol family 10
Aug 20 11:32:32 mgs-new-test kernel: lo: Disabled Privacy Extensions
Aug 20 11:33:25 mgs-new-test kernel: LNet: HW CPU cores: 1, npartitions: 1
Aug 20 11:33:25 mgs-new-test kernel: alg: No test for adler32 (adler32-zlib)
Aug 20 11:33:25 mgs-new-test kernel: alg: No test for crc32 (crc32-table)
Aug 20 11:33:29 mgs-new-test modprobe: FATAL: Error inserting padlock_sha
(/lib/modules/2.6.32-431.20.3.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
No such device
Aug 20 11:33:29 mgs-new-test kernel: padlock: VIA PadLock Hash Engine not
detected.
Aug 20 11:33:33 mgs-new-test kernel: Lustre: Lustre: Build Version:
2.6.0-RC2--PRISTINE-2.6.32-431.20.3.el6_lustre.x86_64
Aug 20 11:33:33 mgs-new-test kernel: LNet: Added LNI 192.168.122.50@tcp
[8/256/0/180]
Aug 20 11:33:33 mgs-new-test kernel: LNet: Accept secure, port 988
Aug 20 11:34:41 mgs-new-test kernel: LDISKFS-fs (vdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug 20 11:34:53 mgs-new-test kernel: LDISKFS-fs (vdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug 20 11:34:54 mgs-new-test kernel: LDISKFS-fs (vdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug 20 11:34:54 mgs-new-test kernel: Lustre: ctl-mylustre-MDT0000: No data
found on store. Initialize space
Aug 20 11:34:54 mgs-new-test kernel: Lustre: mylustre-MDT0000: new disk,
initializing
Aug 20 11:34:54 mgs-new-test kernel: LustreError: 11-0:
mylustre-MDT0000-lwp-MDT0000:
Communicating with 0@lo, operation mds_connect failed with -11.


Any pointers on where else I need to make changes?
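
In case it helps, this is how I have been sanity-checking the LNet state on the
node (a sketch; the NID and hostname are the ones shown above):

lctl list_nids                  # which NIDs LNet actually configured
getent hosts mgs-new-test       # what the hostname resolves to
lctl ping 192.168.122.50@tcp    # whether the local NID answers over tcp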

Thanks in advance.


Warm Regards,
Abhay Dandekar



> On Wed, Aug 20, 2014 at 3:23 AM, Andreas Dilger <adilger at dilger.ca> wrote:
>
>> Often this problem is because the hostname in /etc/hosts is actually
>> mapped to localhost on the node itself.
>>
>> Unfortunately, this is how some systems are set up by default.
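>>
>> For example, a default install sometimes ends up with the node's own hostname
>> on the loopback line, something like:
>>
>>     127.0.0.1   lfs-server localhost localhost.localdomain localhost4
>>
>> so the hostname resolves to the loopback address rather than the real
>> interface.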
>>
>> Cheers, Andreas
>>
>> On Aug 19, 2014, at 12:39, "Abhay Dandekar" <dandekar.abhay at gmail.com>
>> wrote:
>>
>> I came across a similar situation.
>>
>> Below is a log of the machine state. These steps worked on some setups,
>> while on others they didn't.
>>
>> Armaan,
>>
>> Were you able to get past the problem? Is there any workaround?
>>
>> Thanks in advance for all your help.
>>
>>
>> Warm Regards,
>> Abhay Dandekar
>>
>>
>> ---------- Forwarded message ----------
>> From: Abhay Dandekar <dandekar.abhay at gmail.com>
>> Date: Wed, Aug 6, 2014 at 12:18 AM
>> Subject: Lustre configuration failure : lwp-MDT0000: Communicating with
>> 0@lo, operation mds_connect failed with -11.
>> To: lustre-discuss at lists.lustre.org
>>
>>
>>
>> Hi All,
>>
>> I have come across a Lustre installation failure where the MGS always tries
>> to communicate over the "lo" interface instead of the configured Ethernet
>> interface.
>>
>> These same steps worked on a different machine; somehow they are failing
>> here.
>>
>> Here are the logs
>>
>> The Lustre installation succeeded, with all packages installed without any
>> errors.
>>
>> 0. Lustre version
>>
>> Aug  5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
>> Aug  5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel
>> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
>> No such device
>> Aug  5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
>> Aug  5 23:07:37 lfs-server kernel: alg: No test for adler32 (adler32-zlib)
>> Aug  5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha
>> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
>> No such device
>> Aug  5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not
>> detected.
>> Aug  5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version:
>> 2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
>> Aug  5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50@tcp
>> [8/256/0/180]
>> Aug  5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988
>>
>>
>> 1. Mkfs
>>
>> [root@lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0
>> /dev/sdb
>>
>>    Permanent disk data:
>> Target:     lustre:MDT0000
>> Index:      0
>> Lustre FS:  lustre
>> Mount type: ldiskfs
>> Flags:      0x65
>>               (MDT MGS first_time update )
>> Persistent mount opts: user_xattr,errors=remount-ro
>> Parameters:
>>
>> checking for existing Lustre data: not found
>> device size = 10240MB
>> formatting backing filesystem ldiskfs on /dev/sdb
>>     target name  lustre:MDT0000
>>     4k blocks     2621440
>>     options        -J size=400 -I 512 -i 2048 -q -O
>> dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
>> lazy_journal_init -F
>> mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000  -J size=400 -I 512 -i
>> 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
>> lazy_journal_init -F /dev/sdb 2621440
>> Aug  5 17:16:47 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
>> with ordered data mode. quota=on. Opts:
>> Writing CONFIGS/mountdata
>> [root@lfs-server ~]#
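>>
>> As a side note, my understanding is that the data recorded on the target can
>> be re-checked later without reformatting, along these lines (a sketch):
>>
>>     tunefs.lustre --dryrun /dev/sdb    # print the permanent disk data, change nothing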
>>
>> 2. Mount
>>
>> [root@lfs-server ~]# mount -t lustre /dev/sdb /mnt/mgs
>> Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
>> with ordered data mode. quota=on. Opts:
>> Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
>> with ordered data mode. quota=on. Opts:
>> Aug  5 17:18:02 lfs-server kernel: Lustre: ctl-lustre-MDT0000: No data
>> found on store. Initialize space
>> Aug  5 17:18:02 lfs-server kernel: Lustre: lustre-MDT0000: new disk,
>> initializing
>> Aug  5 17:18:02 lfs-server kernel: Lustre: MGS: non-config logname
>> received: params
>> Aug  5 17:18:02 lfs-server kernel: LustreError: 11-0:
>> lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation
>> mds_connect failed with -11.
>> [root@lfs-server ~]#
>>
>>
>> 3. Unmount
>> [root@lfs-server ~]# umount /dev/sdb
>> Aug  5 17:19:46 lfs-server kernel: Lustre: Failing over lustre-MDT0000
>> Aug  5 17:19:52 lfs-server kernel: Lustre:
>> 1338:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has
>> timed out for slow reply: [sent 1407239386/real 1407239386]
>> req@ffff88003d795c00 x1475596948340888/t0(0) o251->MGC192.168.122.50@tcp
>> @0@lo:26/25 lens 224/224 e 0 to 1 dl 1407239392 ref 2 fl
>> Rpc:XN/0/ffffffff rc 0/-1
>> [root@lfs-server ~]# Aug  5 17:19:53 lfs-server kernel: Lustre: server
>> umount lustre-MDT0000 complete
>>
>> [root@lfs-server ~]#
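>>
>> With the target unmounted like this, my understanding is that the
>> configuration logs (including a wrongly recorded NID) can be regenerated on
>> the next mount along these lines; a sketch, not something I have verified
>> here:
>>
>>     tunefs.lustre --writeconf /dev/sdb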
>>
>>
>> 4. [root@mgs ~]# cat /etc/modprobe.d/lustre.conf
>> options lnet networks=tcp(eth0)
>> [root@mgs ~]#
>>
>> 5. Even though the LNet configuration is in place, it does not pick up the
>> required eth0 (see the reload-and-verify sketch after the lctl dl output
>> below).
>>
>> [root@mgs ~]# lctl dl
>>   0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 8
>>   1 UP mgs MGS MGS 5
>>   2 UP mgc MGC192.168.122.50@tcp c6ea84c0-b3b2-9d25-8126-32d85956ae4d 5
>>   3 UP mds MDS MDS_uuid 3
>>   4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
>>   5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 5
>>   6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
>>   7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
>>   8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
>> [root@mgs ~]#
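>>
>> Since the networks= option only takes effect when the lnet module is loaded,
>> this is the reload-and-verify sequence I have been trying (a sketch, assuming
>> the target is unmounted first):
>>
>>     lustre_rmmod       # unload the lustre/lnet modules
>>     modprobe lustre    # reload; lnet should pick up networks=tcp(eth0)
>>     lctl list_nids     # expect 192.168.122.50@tcp here, not 0@lo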
>>
>> Any pointers on how to proceed?
>>
>>
>> Warm Regards,
>> Abhay Dandekar
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>