[lustre-discuss] Lustre Client hanging on mount

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Fri Jan 13 06:51:00 PST 2017


Glad you were able to get it up and running.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

> On Jan 12, 2017, at 9:52 PM, Jeff Slapp <Jeff.Slapp at DataCore.com> wrote:
> 
> That was the solution! Thank you for your support on this. 
> 
> -----Original Message-----
> From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rmohr at utk.edu] 
> Sent: Thursday, January 12, 2017 10:51 AM
> To: Jeff Slapp <Jeff.Slapp at DataCore.com>
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre Client hanging on mount
> 
> I noticed that you appear to have formatted the MDT with the file system name “mgsZFS” while the OST was formatted with the file system name “ossZFS”.  The same name needs to be used on all MDTs/OSTs in the same file system.  Until that is fixed, your file system won’t work properly.
> 
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
> 
>> On Jan 12, 2017, at 9:32 AM, Jeff Slapp <Jeff.Slapp at DataCore.com> wrote:
>> 
>> Throwing the issue back out to the team since it may have gotten lost in the end-of-year festivities.
>> 
>> I have seen references to this issue in the past, but nothing definitive as to the cause (or a potential fix).
>> 
>> 
>> ________________________________
>> From: Jeff Slapp
>> Sent: Friday, December 30, 2016 4:45 PM
>> To: lustre-discuss at lists.lustre.org
>> Subject: Lustre Client hanging on mount
>> 
>> Any thoughts on why an error -11 is being generated? See below for logs and configuration details.
>> 
>> MGS/MDT on one machine (172.20.24.227)
>> OST on one machine (172.20.24.228)
>> Lustre client on one machine (172.20.24.217)
>> 
>> MGS/MDT ‘lctl dl’ Output:
>> 0 UP osd-zfs mgsZFS-MDT0000-osd mgsZFS-MDT0000-osd_UUID 8
>> 1 UP mgs MGS MGS 7
>> 2 UP mgc MGC172.20.24.227@tcp e681e692-c8aa-5c2b-b88b-3dacbcb2d20a 5
>> 3 UP mds MDS MDS_uuid 3
>> 4 UP lod mgsZFS-MDT0000-mdtlov mgsZFS-MDT0000-mdtlov_UUID 4
>> 5 UP mdt mgsZFS-MDT0000 mgsZFS-MDT0000_UUID 5
>> 6 UP mdd mgsZFS-MDD0000 mgsZFS-MDD0000_UUID 4
>> 7 UP qmt mgsZFS-QMT0000 mgsZFS-QMT0000_UUID 4
>> 8 UP lwp mgsZFS-MDT0000-lwp-MDT0000 mgsZFS-MDT0000-lwp-MDT0000_UUID 5
>> 
>> OST ‘lctl dl’ Output:
>> 0 UP osd-zfs ossZFS-OST0000-osd ossZFS-OST0000-osd_UUID 5
>> 1 UP mgc MGC172.20.24.227@tcp 441d1cfe-e390-8e3a-d88c-bd9d5d74d0e0 5
>> 2 UP ost OSS OSS_uuid 3
>> 3 UP obdfilter ossZFS-OST0000 ossZFS-OST0000_UUID 3
>> 
>> The client just hangs forever once the following command is executed: mount -t lustre 172.20.24.227@tcp0:/mgsZFS /mnt/lustre. The connection list shows both the OST and the client connected.
>> 
>> MGS/MDT Connection list:
>> 12345-172.20.24.228@tcp I[0]mgs01.localdomain ->oss01.localdomain:1021 87040/369280 nonagle
>> 12345-172.20.24.228@tcp O[0]mgs01.localdomain ->oss01.localdomain:1022 87040/369280 nonagle
>> 12345-172.20.24.228@tcp C[0]mgs01.localdomain ->oss01.localdomain:1023 87040/369280 nonagle
>> 12345-172.20.24.217@tcp I[0]mgs01.Localdomain ->lc01.localdomain:1021 87040/369280 nonagle
>> 12345-172.20.24.217@tcp O[0]mgs01.localdomain ->lc01.localdomain:1022 87040/369280 nonagle
>> 12345-172.20.24.217@tcp C[0]mgs01.localdomain ->lc01.localdomain:1023 87040/369280 nonagle
>> 
>> MGS/MDT /var/log/messages entries upon mount from client:
>> Dec 29 09:44:51 mgs01 kernel: Lustre: MGS: Connection restored to 1557f55a-2cae-e908-6533-58d54eb4a274 (at 172.20.24.217@tcp)
>> 
>> Lustre Client /var/log/messages entries upon mount to MGS/MDT:
>> Dec 29 09:44:55 lc01 kernel: LustreError: 6996:0:(lmv_obd.c:1402:lmv_statfs()) can't stat MDS #0 (mgsZFS-MDT0000-mdc-ffff8800ede07800), error -11
>> Dec 29 09:44:55 lc01 kernel: Lustre: Unmounted mgsZFS-client
>> Dec 29 09:44:55 lc01 kernel: LustreError: 6996:0:(obd_mount.c:1449:lustre_fill_super()) Unable to mount  (-11)
>> 
>> Lustre Client response to lctl ping 172.20.24.227:
>> 12345-0@lo
>> 12345-172.20.24.227@tcp
>> 
>> Below are the steps I followed to get the Lustre systems up and running:
>> Using CentOS 7.3.1611 with the following roles enabled:
>>              File and Storage Server
>>              Guest Agents (if in a VM)
>>              Large System Performance
>>              Network File System Client
>>              Performance Tools
>>              Compatibility Libraries
>>              Development Tools
>> 
>> hostname {YOUR SERVER NAME}
>> systemctl stop firewalld
>> systemctl disable firewalld
>> vi /etc/selinux/config
>>              SELINUX=disabled
>> 
>> yum -y install http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm
>> yum clean all
>> vi /etc/yum.repos.d/lustre_server.repo
>>              [lustre-server]
>>              name=CentOS-$releasever - Lustre server
>>              baseurl=https://downloads.hpdd.intel.com/public/lustre/lustre-2.9.0/el7.3.1611/server/
>>              gpgcheck=0
>> kernel_version=`yum list --showduplicates kernel | grep lustre-server | awk '{print $2}'`
>> kernel_firmware_version=`yum list --showduplicates kernel-firmware | grep lustre-server | awk '{print $2}'`
>> yum -y install --nogpgcheck --setopt=protected_multilib=false kernel-${kernel_version} kernel-firmware-${kernel_firmware_version} kernel-devel-${kernel_version} kernel-headers-${kernel_version}
>> yum clean all
>> yum -y install yum-plugin-versionlock
>> yum versionlock add kernel
>> yum versionlock add kernel-firmware
>> yum versionlock add kernel-devel
>> yum versionlock add kernel-headers
>> yum clean all
>> yum-config-manager --disable lustre-server
>> yum -y install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
>> yum clean all
>> yum-config-manager --disable zfs
>> yum-config-manager --enable zfs-kmod
>> reboot
>> yum -y install wget
>> yum -y install rpm-build
>> yum -y install kmod-zfs-devel libzfs2-devel
>> yum -y install libselinux-devel libtool
>> rm -f lustre-2.9.0-1.src.rpm && wget -q https://downloads.hpdd.intel.com/public/lustre/lustre-2.9.0/el7/server/SRPMS/lustre-2.9.0-1.src.rpm
>> rm -rf ~/rpmbuild && rpmbuild --rebuild --with zfs lustre-2.9.0-1.src.rpm
>> cd ~/rpmbuild/RPMS/`uname -m`/ && yum -y install kmod-lustre-osd-zfs-2.9.0* kmod-lustre-2.9.0* lustre-osd-zfs-mount-2.9.0* lustre-2.9.0* lustre-iokit*
>> modprobe zfs
>> echo "options lnet networks=tcp0(eth0)" > /etc/modprobe.d/lustre.conf
>> zpool create -f mgs01-pool mirror /dev/sd[b-c]
>> mkfs.lustre --mdt --mgs --backfstype=zfs --fsname=mgsZFS --index=0 --mgsnode=[MGS IP ADDRESS]@tcp0 mgs01-pool/mgsZFS
>> mkdir /mnt/mgsZFS
>> mount -t lustre mgs01-pool/mgsZFS /mnt/mgsZFS
>> 
>> 
>> 
>> 
>> Jeff Slapp | Director, Systems Engineering and Solution Architecture
>> 
>> DataCore Software Corporation
>> Corporate Park
>> 6300 NW 5th Way
>> Ft. Lauderdale, FL 33309
>> http://www.datacore.com
>> THE DATA INFRASTRUCTURE SOFTWARE COMPANY
>> 
> 
> 
> 
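
For anyone who lands on this thread with the same "Unable to mount (-11)" symptom, here is a minimal sketch of a check for the file system name mismatch described above. It assumes that target devices show up in `lctl dl` output with type "mdt" or "obdfilter" and a name of the form <fsname>-MDTxxxx or <fsname>-OSTxxxx, as in the two listings quoted in this message. The function name fsnames_from_dl is made up for illustration and is not part of Lustre; the two sample lines are copied from those listings.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): read `lctl dl` output on stdin and
# print the file system name(s) of the MDT/OST targets it lists.
fsnames_from_dl() {
    # Column 3 is the device type, column 4 the device name.
    # Strip the trailing -MDTxxxx / -OSTxxxx to leave only the fsname.
    awk '$3 == "mdt" || $3 == "obdfilter" {
             sub(/-(MDT|OST)[0-9a-fA-F]+$/, "", $4)
             print $4
         }' | sort -u
}

# Sample lines copied from the MGS/MDT and OST listings in this thread.
mdt_name=$(printf '%s\n' '5 UP mdt mgsZFS-MDT0000 mgsZFS-MDT0000_UUID 5' | fsnames_from_dl)
ost_name=$(printf '%s\n' '3 UP obdfilter ossZFS-OST0000 ossZFS-OST0000_UUID 3' | fsnames_from_dl)

if [ "$mdt_name" = "$ost_name" ]; then
    echo "fsname matches: $mdt_name"
else
    echo "fsname mismatch: MDT=$mdt_name OST=$ost_name"
fi
```

With the sample lines this reports a mismatch between mgsZFS and ossZFS. On live servers, running `lctl dl | fsnames_from_dl` on the MDS and on each OSS should print the same single name everywhere.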
