[Lustre-discuss] RE: Lustre-2.4 VMs (EL6.4)

Andreas Dilger adilger at dilger.ca
Tue Aug 19 14:53:06 PDT 2014


Often this problem occurs because the hostname in /etc/hosts is mapped to localhost on the node itself, so the MGS NID resolves to the loopback interface (0@lo) instead of the real network interface. 

Unfortunately, this is how some systems are set up by default. 
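For example, a default EL6 install often puts the node's own hostname on the loopback line. A minimal sketch of the broken and corrected entries, borrowing the hostname and address from the logs below (your values will differ):

    # Broken: the hostname resolves to 127.0.0.1, so the MGS NID
    # is recorded as 0@lo instead of 192.168.122.50@tcp.
    127.0.0.1   localhost localhost.localdomain lfs-server

    # Fixed: keep localhost on the loopback line and map the
    # hostname to the real interface address instead.
    127.0.0.1       localhost localhost.localdomain
    192.168.122.50  lfs-server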

Cheers, Andreas

> On Aug 19, 2014, at 12:39, "Abhay Dandekar" <dandekar.abhay at gmail.com> wrote:
> 
> I came across a similar situation.
> 
> Below is a log of the machine state. These steps worked on some setups, while on others they didn't.
> 
> Armaan,
> 
> Were you able to get past the problem? Is there any workaround?
> 
> Thanks in advance for all your help.
> 
> 
> Warm Regards,
> Abhay Dandekar
> 
> 
> ---------- Forwarded message ----------
> From: Abhay Dandekar <dandekar.abhay at gmail.com>
> Date: Wed, Aug 6, 2014 at 12:18 AM
> Subject: Lustre configuration failure: lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
> To: lustre-discuss at lists.lustre.org
> 
> 
> 
> Hi All,
> 
> I have come across a Lustre installation failure where the MGS is always reached via the "lo" interface instead of the configured Ethernet interface.
> 
> The same steps worked on a different machine; somehow they are failing here.
> 
> Here are the logs:
> 
> The Lustre installation succeeds, with all packages installed without any error.
> 
> 0. Lustre version 
> 
> Aug  5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
> Aug  5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko): No such device
> Aug  5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
> Aug  5 23:07:37 lfs-server kernel: alg: No test for adler32 (adler32-zlib)
> Aug  5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko): No such device
> Aug  5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not detected.
> Aug  5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version: 2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
> Aug  5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50@tcp [8/256/0/180]
> Aug  5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988
> 
> 
> 1. Mkfs
> 
> [root@lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0 /dev/sdb 
> 
>    Permanent disk data:
> Target:     lustre:MDT0000
> Index:      0
> Lustre FS:  lustre
> Mount type: ldiskfs
> Flags:      0x65
>               (MDT MGS first_time update )
> Persistent mount opts: user_xattr,errors=remount-ro
> Parameters:
> 
> checking for existing Lustre data: not found
> device size = 10240MB
> formatting backing filesystem ldiskfs on /dev/sdb
>     target name  lustre:MDT0000
>     4k blocks     2621440
>     options        -J size=400 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
> mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000  -J size=400 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb 2621440
> Aug  5 17:16:47 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts: 
> Writing CONFIGS/mountdata
> [root@lfs-server ~]#
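> 
> (For reference, the configuration that mkfs.lustre wrote can be printed back without touching the device; a sketch, assuming the same device:)
> 
> # Re-reads and prints the "Permanent disk data" shown above, read-only:
> [root@lfs-server ~]# tunefs.lustre --dryrun /dev/sdb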
> 
> 2. Mount
> 
> [root@lfs-server ~]# mount -t lustre /dev/sdb /mnt/mgs 
> Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts: 
> Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts: 
> Aug  5 17:18:02 lfs-server kernel: Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
> Aug  5 17:18:02 lfs-server kernel: Lustre: lustre-MDT0000: new disk, initializing
> Aug  5 17:18:02 lfs-server kernel: Lustre: MGS: non-config logname received: params
> Aug  5 17:18:02 lfs-server kernel: LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
> [root@lfs-server ~]# 
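> 
> (Note that -11 is -EAGAIN, and the lwp device is trying to reach the MGS at the loopback NID 0@lo. A quick check is how the node resolves its own hostname; a sketch, assuming standard tools:)
> 
> # If this prints 127.0.0.1, the hostname is mapped to localhost in
> # /etc/hosts and the MGS NID gets recorded as 0@lo.
> [root@lfs-server ~]# getent hosts lfs-server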
> 
> 
> 3. Unmount
> [root@lfs-server ~]# umount /dev/sdb 
> Aug  5 17:19:46 lfs-server kernel: Lustre: Failing over lustre-MDT0000
> Aug  5 17:19:52 lfs-server kernel: Lustre: 1338:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1407239386/real 1407239386]  req@ffff88003d795c00 x1475596948340888/t0(0) o251->MGC192.168.122.50@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1407239392 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
> [root@lfs-server ~]# Aug  5 17:19:53 lfs-server kernel: Lustre: server umount lustre-MDT0000 complete
> 
> [root@lfs-server ~]# 
> 
> 
> 4. [root@mgs ~]# cat /etc/modprobe.d/lustre.conf 
> options lnet networks=tcp(eth0)
> [root@mgs ~]# 
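> 
> (Once the lnet module is loaded with that option, the NIDs it actually configured can be listed; a sketch:)
> 
> # Expected to print 192.168.122.50@tcp for eth0, not just 0@lo:
> [root@mgs ~]# lctl list_nids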
> 
> 5. Even though the LNet configuration is in place, it does not pick up the required eth0.
> 
> [root@mgs ~]# lctl dl 
>   0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 8
>   1 UP mgs MGS MGS 5
>   2 UP mgc MGC192.168.122.50@tcp c6ea84c0-b3b2-9d25-8126-32d85956ae4d 5
>   3 UP mds MDS MDS_uuid 3
>   4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
>   5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 5
>   6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
>   7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
>   8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
> [root@mgs ~]# 
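> 
> (An LNet-level ping of the node's own tcp NID is one more sanity check; if name resolution is the culprit, as suggested in the reply above, fixing /etc/hosts and remounting should clear the mds_connect failure:)
> 
> # Pings the node's own NID over the configured tcp network:
> [root@mgs ~]# lctl ping 192.168.122.50@tcp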
> 
> Any pointers on how to proceed?
> 
> 
> Warm Regards,
> Abhay Dandekar
> 