[lustre-discuss] Lustre Client hanging on mount

Jeff Slapp Jeff.Slapp at DataCore.com
Thu Jan 12 06:32:47 PST 2017


Throwing the issue back out to the team since it may have gotten lost in the end-of-year festivities.

I have seen references to this issue in the past, but nothing definitive as to the cause (or a potential fix).


________________________________
From: Jeff Slapp
Sent: Friday, December 30, 2016 4:45 PM
To: lustre-discuss at lists.lustre.org
Subject: Lustre Client hanging on mount

Any thoughts on why an error -11 is being generated? See below for logs and configuration details.

MGS/MDT on one machine (172.20.24.227)
OST on one machine (172.20.24.228)
Lustre client on one machine (172.20.24.217)

MGS/MDT ‘lctl dl’ Output:
  0 UP osd-zfs mgsZFS-MDT0000-osd mgsZFS-MDT0000-osd_UUID 8
  1 UP mgs MGS MGS 7
  2 UP mgc MGC172.20.24.227@tcp e681e692-c8aa-5c2b-b88b-3dacbcb2d20a 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod mgsZFS-MDT0000-mdtlov mgsZFS-MDT0000-mdtlov_UUID 4
  5 UP mdt mgsZFS-MDT0000 mgsZFS-MDT0000_UUID 5
  6 UP mdd mgsZFS-MDD0000 mgsZFS-MDD0000_UUID 4
  7 UP qmt mgsZFS-QMT0000 mgsZFS-QMT0000_UUID 4
  8 UP lwp mgsZFS-MDT0000-lwp-MDT0000 mgsZFS-MDT0000-lwp-MDT0000_UUID 5

OST ‘lctl dl’ Output:
  0 UP osd-zfs ossZFS-OST0000-osd ossZFS-OST0000-osd_UUID 5
  1 UP mgc MGC172.20.24.227@tcp 441d1cfe-e390-8e3a-d88c-bd9d5d74d0e0 5
  2 UP ost OSS OSS_uuid 3
  3 UP obdfilter ossZFS-OST0000 ossZFS-OST0000_UUID 3
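
Both listings show every device as UP. For longer device lists, a small filter can flag anything that is not. This is only a sketch: `not_up_devices` is a hypothetical helper name, and it assumes the column layout shown above (index, status, type, name, UUID, refcount).

```shell
# Hypothetical helper: reads `lctl dl` output on stdin and prints the name and
# status of any device whose status column is not UP.
# Prints nothing when every device is UP.
not_up_devices() {
    awk 'NF >= 4 && $2 != "UP" { print $4 " is " $2 }'
}
```

On a server this would be run as `lctl dl | not_up_devices`; empty output means all devices report UP.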

The client just hangs forever once the following command is executed: mount -t lustre 172.20.24.227@tcp0:/mgsZFS /mnt/lustre. The connection list shows both the OST and the client connected.

MGS/MDT Connection list:
12345-172.20.24.228@tcp I[0]mgs01.localdomain ->oss01.localdomain:1021 87040/369280 nonagle
12345-172.20.24.228@tcp O[0]mgs01.localdomain ->oss01.localdomain:1022 87040/369280 nonagle
12345-172.20.24.228@tcp C[0]mgs01.localdomain ->oss01.localdomain:1023 87040/369280 nonagle
12345-172.20.24.217@tcp I[0]mgs01.Localdomain ->lc01.localdomain:1021 87040/369280 nonagle
12345-172.20.24.217@tcp O[0]mgs01.localdomain ->lc01.localdomain:1022 87040/369280 nonagle
12345-172.20.24.217@tcp C[0]mgs01.localdomain ->lc01.localdomain:1023 87040/369280 nonagle

MGS/MDT /var/log/messages entries upon mount from client:
Dec 29 09:44:51 mgs01 kernel: Lustre: MGS: Connection restored to 1557f55a-2cae-e908-6533-58d54eb4a274 (at 172.20.24.217@tcp)

Lustre Client /var/log/messages entries upon mount to MGS/MDT:
Dec 29 09:44:55 lc01 kernel: LustreError: 6996:0:(lmv_obd.c:1402:lmv_statfs()) can't stat MDS #0 (mgsZFS-MDT0000-mdc-ffff8800ede07800), error -11
Dec 29 09:44:55 lc01 kernel: Lustre: Unmounted mgsZFS-client
Dec 29 09:44:55 lc01 kernel: LustreError: 6996:0:(obd_mount.c:1449:lustre_fill_super()) Unable to mount  (-11)
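
For reference, the -11 in these messages is a negated Linux errno value, and 11 is EAGAIN ("Resource temporarily unavailable"). The sketch below decodes a few such codes; `errno_name` is a hypothetical helper, and its table covers only a handful of standard Linux values, not everything Lustre can return.

```shell
# Hypothetical helper: maps a (possibly negative) kernel return code to its
# Linux errno name. Only a few common values are listed.
errno_name() {
    case "${1#-}" in            # strip a leading minus sign, if any
        2)   echo ENOENT ;;
        5)   echo EIO ;;
        11)  echo EAGAIN ;;
        19)  echo ENODEV ;;
        107) echo ENOTCONN ;;
        110) echo ETIMEDOUT ;;
        *)   echo "unknown ($1)" ;;
    esac
}

errno_name -11   # prints EAGAIN
```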

Lustre Client response to lctl ping 172.20.24.227:
12345-0@lo
12345-172.20.24.227@tcp
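
The ping above only exercises the client-to-MGS/MDT path. It may also be worth confirming LNet reachability from the client to the OSS. A minimal sketch, assuming the addresses listed earlier and the tcp network; `print_ping_cmds` is a hypothetical helper that only prints the commands rather than running them.

```shell
# Hypothetical helper: prints (does not run) one `lctl ping` command per
# server IP, using the tcp LNet network from this setup.
print_ping_cmds() {
    for ip in "$@"; do
        printf 'lctl ping %s@tcp\n' "$ip"
    done
}

# MGS/MDT and OSS addresses from the configuration described above
print_ping_cmds 172.20.24.227 172.20.24.228
```

Piping the output to `sh` on the client would run the pings; `lctl list_nids` on each node shows the NIDs it has actually configured.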

Below are the steps I followed to get the Lustre systems up and running:
Using CentOS 7.3.1611 with the following roles enabled:
               File and Storage Server
               Guest Agents (if in a VM)
               Large System Performance
               Network File System Client
               Performance Tools
               Compatibility Libraries
               Development Tools

hostname {YOUR SERVER NAME}
systemctl stop firewalld
systemctl disable firewalld
vi /etc/selinux/config
               SELINUX=disabled

yum -y install http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm
yum clean all
vi /etc/yum.repos.d/lustre_server.repo
               [lustre-server]
               name=CentOS-$releasever - Lustre server
               baseurl=https://downloads.hpdd.intel.com/public/lustre/lustre-2.9.0/el7.3.1611/server/
               gpgcheck=0
kernel_version=`yum list --showduplicates kernel | grep lustre-server | awk '{print $2}'`
kernel_firmware_version=`yum list --showduplicates kernel-firmware | grep lustre-server | awk '{print $2}'`
yum -y install --nogpgcheck --setopt=protected_multilib=false kernel-${kernel_version} kernel-firmware-${kernel_firmware_version} kernel-devel-${kernel_version} kernel-headers-${kernel_version}
yum clean all
yum -y install yum-plugin-versionlock
yum versionlock add kernel
yum versionlock add kernel-firmware
yum versionlock add kernel-devel
yum versionlock add kernel-headers
yum clean all
yum-config-manager --disable lustre-server
yum -y install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
yum clean all
yum-config-manager --disable zfs
yum-config-manager --enable zfs-kmod
reboot
yum -y install wget
yum -y install rpm-build
yum -y install kmod-zfs-devel libzfs2-devel
yum -y install libselinux-devel libtool
rm -f lustre-2.9.0-1.src.rpm && wget -q https://downloads.hpdd.intel.com/public/lustre/lustre-2.9.0/el7/server/SRPMS/lustre-2.9.0-1.src.rpm
rm -rf ~/rpmbuild && rpmbuild --rebuild --with zfs lustre-2.9.0-1.src.rpm
cd ~/rpmbuild/RPMS/`uname -m`/ && yum -y install kmod-lustre-osd-zfs-2.9.0* kmod-lustre-2.9.0* lustre-osd-zfs-mount-2.9.0* lustre-2.9.0* lustre-iokit*
modprobe zfs
echo "options lnet networks=tcp0(eth0)" > /etc/modprobe.d/lustre.conf
zpool create -f mgs01-pool mirror /dev/sd[b-c]
mkfs.lustre --mdt --mgs --backfstype=zfs --fsname=mgsZFS --index=0 --mgsnode=[MGS IP ADDRESS]@tcp0 mgs01-pool/mgsZFS
mkdir /mnt/mgsZFS
mount -t lustre mgs01-pool/mgsZFS /mnt/mgsZFS




Jeff Slapp | Director, Systems Engineering and Solution Architecture

DataCore Software Corporation
Corporate Park
6300 NW 5th Way
Ft. Lauderdale, FL 33309
http://www.datacore.com
THE DATA INFRASTRUCTURE SOFTWARE COMPANY


