[Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)

Abhay Dandekar dandekar.abhay at gmail.com
Mon Sep 1 01:19:46 PDT 2014


Thanks for the reply, Arman.


/var/log/messages still complains with the error below:
Aug 29 15:01:59 MGS-1 kernel: LustreError: 11-0:
lustre-MDT0000-lwp-MDT0000: Communicating with 0 at lo, operation mds_connect
failed with -11.

But adding a mapping in /etc/hosts allows the other nodes to connect to the MGS now.

It feels like a workaround, but things are working as of now. It still fails
if you try to configure the MDT with an IP.
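
For reference, the workaround boils down to a hosts entry for the MGS node and
then using the hostname rather than the bare IP when pointing targets at the
MGS. Roughly like this (the hostname and address are just the ones from the
logs in this thread, and the OST device is only an example):

# /etc/hosts on the nodes that need to reach the MGS
192.168.122.50   MGS-1

# e.g. format an OST against the hostname instead of the raw IP
mkfs.lustre --fsname=lustre --mgsnode=MGS-1@tcp --ost --index=0 /dev/sdc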

Thanks again.



Warm Regards,
Abhay Dandekar


On Mon, Aug 25, 2014 at 5:00 PM, Arman Khalatyan <arm2arm at gmail.com> wrote:

> Hi Abhay,
> Could you please check the lnet status?
> lctl list_nids, or pings..
> Is your firewall enabled?
> BTW, I moved all my servers to the 2.5.x branch, which fixed most of my
> troubles...
> a.
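
(For anyone finding this in the archives: the checks suggested here amount to
roughly the following on the MDS node; the NID is just the one from my logs.

lctl list_nids                  # NIDs LNet actually configured
lctl ping 192.168.122.50@tcp    # LNet-level ping of the MGS NID
service iptables status         # is a firewall in the way?
)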
>
>
> On Tue, Aug 19, 2014 at 12:38 PM, Abhay Dandekar
> <dandekar.abhay at gmail.com> wrote:
> > I came across a similar situation.
> >
> > Below is a log of the machine state. These steps worked on some setups,
> > while on others they didn't.
> >
> > Arman,
> >
> > Were you able to get past the problem? Any workaround?
> >
> > Thanks in advance for all your help.
> >
> >
> > Warm Regards,
> > Abhay Dandekar
> >
> >
> > ---------- Forwarded message ----------
> > From: Abhay Dandekar <dandekar.abhay at gmail.com>
> > Date: Wed, Aug 6, 2014 at 12:18 AM
> > Subject: Lustre configuration failure : lwp-MDT0000: Communicating with
> > 0 at lo, operation mds_connect failed with -11.
> > To: lustre-discuss at lists.lustre.org
> >
> >
> >
> > Hi All,
> >
> > I have come across a Lustre installation failure where the MGS always
> > tries to communicate over the "lo" interface instead of the configured
> > Ethernet interface.
> >
> > The same steps worked on a different machine; somehow they are failing
> > here.
> >
> > Here are the logs:
> >
> > The Lustre installation succeeded, with all the packages installed
> > without any error.
> >
> > 0. Lustre version
> >
> > Aug  5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
> > Aug  5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel
> >
> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
> > No such device
> > Aug  5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
> > Aug  5 23:07:37 lfs-server kernel: alg: No test for adler32
> (adler32-zlib)
> > Aug  5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha
> >
> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
> > No such device
> > Aug  5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not
> > detected.
> > Aug  5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version:
> > 2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
> > Aug  5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50 at tcp
> > [8/256/0/180]
> > Aug  5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988
> >
> >
> > 1. Mkfs
> >
> > [root at lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0
> > /dev/sdb
> >
> >    Permanent disk data:
> > Target:     lustre:MDT0000
> > Index:      0
> > Lustre FS:  lustre
> > Mount type: ldiskfs
> > Flags:      0x65
> >               (MDT MGS first_time update )
> > Persistent mount opts: user_xattr,errors=remount-ro
> > Parameters:
> >
> > checking for existing Lustre data: not found
> > device size = 10240MB
> > formatting backing filesystem ldiskfs on /dev/sdb
> >     target name  lustre:MDT0000
> >     4k blocks     2621440
> >     options        -J size=400 -I 512 -i 2048 -q -O
> > dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
> > lazy_journal_init -F
> > mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000  -J size=400 -I 512 -i
> 2048
> > -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
> > lazy_journal_init -F /dev/sdb 2621440
> > Aug  5 17:16:47 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
> with
> > ordered data mode. quota=on. Opts:
> > Writing CONFIGS/mountdata
> > [root at lfs-server ~]#
> >
> > 2. Mount
> >
> > [root at lfs-server ~]# mount -t lustre /dev/sdb /mnt/mgs
> > Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
> with
> > ordered data mode. quota=on. Opts:
> > Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
> with
> > ordered data mode. quota=on. Opts:
> > Aug  5 17:18:02 lfs-server kernel: Lustre: ctl-lustre-MDT0000: No data
> found
> > on store. Initialize space
> > Aug  5 17:18:02 lfs-server kernel: Lustre: lustre-MDT0000: new disk,
> > initializing
> > Aug  5 17:18:02 lfs-server kernel: Lustre: MGS: non-config logname
> received:
> > params
> > Aug  5 17:18:02 lfs-server kernel: LustreError: 11-0:
> > lustre-MDT0000-lwp-MDT0000: Communicating with 0 at lo, operation
> mds_connect
> > failed with -11.
> > [root at lfs-server ~]#
> >
> >
> > 3. Unmount
> > [root at lfs-server ~]# umount /dev/sdb
> > Aug  5 17:19:46 lfs-server kernel: Lustre: Failing over lustre-MDT0000
> > Aug  5 17:19:52 lfs-server kernel: Lustre:
> > 1338:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has
> > timed out for slow reply: [sent 1407239386/real 1407239386]
> > req at ffff88003d795c00 x1475596948340888/t0(0)
> > o251->MGC192.168.122.50 at tcp@0 at lo:26/25 lens 224/224 e 0 to 1 dl
> 1407239392
> > ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
> > [root at lfs-server ~]# Aug  5 17:19:53 lfs-server kernel: Lustre: server
> > umount lustre-MDT0000 complete
> >
> > [root at lfs-server ~]#
> >
> >
> > 4. [root at mgs ~]# cat /etc/modprobe.d/lustre.conf
> > options lnet networks=tcp(eth0)
> > [root at mgs ~]#
> >
> > 5. Even though the LNet configuration is in place, it does not pick up
> > the required eth0.
> >
> > [root at mgs ~]# lctl dl
> >   0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 8
> >   1 UP mgs MGS MGS 5
> >   2 UP mgc MGC192.168.122.50 at tcp c6ea84c0-b3b2-9d25-8126-32d85956ae4d 5
> >   3 UP mds MDS MDS_uuid 3
> >   4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
> >   5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 5
> >   6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
> >   7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
> >   8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
> > [root at mgs ~]#
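
(Side note from re-reading this thread: the quick way to see which NID LNet
actually configured, and to make it re-read the networks= option, is roughly
the following -- just a sketch, run with all Lustre targets unmounted:

lctl list_nids      # should show 192.168.122.50@tcp, not 0@lo
lustre_rmmod        # unload the Lustre/LNet modules
modprobe lustre
lctl network up
)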
> >
> > Any pointers on how to proceed?
> >
> >
> > Warm Regards,
> > Abhay Dandekar
> >
>