[Lustre-devel] Fwd: Lustre configuration failure : lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.

Abhay Dandekar dandekar.abhay at gmail.com
Fri Aug 8 22:17:23 PDT 2014


Forwarding to lustre-devel. Requesting some pointers on how to proceed.


Warm Regards,
Abhay Dandekar


---------- Forwarded message ----------
From: Abhay Dandekar <dandekar.abhay at gmail.com>
Date: Wed, Aug 6, 2014 at 12:18 AM
Subject: Lustre configuration failure : lwp-MDT0000: Communicating with 0@lo,
operation mds_connect failed with -11.
To: lustre-discuss at lists.lustre.org



Hi All,

I have come across a Lustre installation failure where the MGS always tries
to reach the "lo" interface instead of the configured Ethernet interface.

The same steps worked on a different machine; somehow they are failing
here.

Here are the logs.

The Lustre installation succeeded, with all packages installed without any
error.

0. Lustre version

Aug  5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
Aug  5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel
(/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
No such device
Aug  5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
Aug  5 23:07:37 lfs-server kernel: alg: No test for adler32 (adler32-zlib)
Aug  5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha
(/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
No such device
Aug  5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not
detected.
Aug  5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version:
2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
Aug  5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50@tcp
[8/256/0/180]
Aug  5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988

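As a sanity check at this stage, it may help to confirm which NIDs LNet actually configured; a diagnostic sketch, with the address taken from the "Added LNI" log line above:

```shell
# List the NIDs LNet has configured on this node; the tcp NID from the
# "Added LNI" log line should appear alongside 0@lo.
lctl list_nids

# Verify the tcp NID answers (address taken from the logs above).
lctl ping 192.168.122.50@tcp
```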

1. Mkfs

[root@lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0
/dev/sdb

   Permanent disk data:
Target:     lustre:MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

checking for existing Lustre data: not found
device size = 10240MB
formatting backing filesystem ldiskfs on /dev/sdb
    target name  lustre:MDT0000
    4k blocks     2621440
    options        -J size=400 -I 512 -i 2048 -q -O
dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000  -J size=400 -I 512 -i 2048
-q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
lazy_journal_init -F /dev/sdb 2621440
Aug  5 17:16:47 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Writing CONFIGS/mountdata
[root@lfs-server ~]#

2. Mount

[root@lfs-server ~]# mount -t lustre /dev/sdb /mnt/mgs
Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug  5 17:18:02 lfs-server kernel: Lustre: ctl-lustre-MDT0000: No data
found on store. Initialize space
Aug  5 17:18:02 lfs-server kernel: Lustre: lustre-MDT0000: new disk,
initializing
Aug  5 17:18:02 lfs-server kernel: Lustre: MGS: non-config logname
received: params
Aug  5 17:18:02 lfs-server kernel: LustreError: 11-0:
lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect
failed with -11.
[root@lfs-server ~]#
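Since the MGC appears to resolve the MGS as 0@lo, one thing worth checking is what was actually written into the target's on-disk configuration at format time; a read-only sketch:

```shell
# Print the target's stored configuration without modifying anything;
# the "Parameters:" line shows whether any mgsnode NID was recorded.
tunefs.lustre --dryrun /dev/sdb
```

For a combined MGS/MDT such as this one, no mgsnode NID is recorded at format time, and the local loopback connection may be used, which may be why the lwp shows up as talking to 0@lo.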


3. Unmount
[root@lfs-server ~]# umount /dev/sdb
Aug  5 17:19:46 lfs-server kernel: Lustre: Failing over lustre-MDT0000
Aug  5 17:19:52 lfs-server kernel: Lustre:
1338:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has
timed out for slow reply: [sent 1407239386/real 1407239386]
req@ffff88003d795c00 x1475596948340888/t0(0) o251->MGC192.168.122.50@tcp
@0@lo:26/25 lens 224/224 e 0 to 1 dl 1407239392 ref 2 fl Rpc:XN/0/ffffffff
rc 0/-1
[root@lfs-server ~]# Aug  5 17:19:53 lfs-server kernel: Lustre: server
umount lustre-MDT0000 complete

[root@lfs-server ~]#


4. [root@mgs ~]# cat /etc/modprobe.d/lustre.conf
options lnet networks=tcp(eth0)
[root@mgs ~]#
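One thing to note: options in /etc/modprobe.d/ are only read when the module is loaded, so if lnet was already in memory when lustre.conf was created or edited, the networks=tcp(eth0) setting will not have been applied. A hedged sketch of forcing a clean reload, assuming all targets are unmounted first:

```shell
# Unmount any Lustre targets, then remove all Lustre/LNet modules so that
# /etc/modprobe.d/lustre.conf is re-read on the next load.
umount /mnt/mgs
lustre_rmmod

# Reload the stack and confirm the tcp NID on eth0 is configured.
modprobe lustre
lctl network up
lctl list_nids
```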

5. Even though the LNet configuration is in place, it does not pick up the
required eth0.

[root@mgs ~]# lctl dl
  0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 8
  1 UP mgs MGS MGS 5
  2 UP mgc MGC192.168.122.50@tcp c6ea84c0-b3b2-9d25-8126-32d85956ae4d 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
  5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 5
  6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
  7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
  8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
[root@mgs ~]#

Any pointers on how to proceed?


Warm Regards,
Abhay Dandekar

