[Lustre-discuss] Client hangs on 'simple' lustre setup

Jon Yeargers yeargers at ohsu.edu
Mon Sep 17 08:33:36 PDT 2012


What's the preferred method for posting large files to this list?

Does Lustre have specific log files that I should look for or just anything in /var/log/messages that seems appropriate?

-----Original Message-----
From: Colin Faber [mailto:cfaber at gmail.com] 
Sent: Monday, September 17, 2012 8:05 AM
To: Jon Yeargers
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Client hangs on 'simple' lustre setup

Hi Jon,

If you could provide the logging from your clients and mds (as well as possibly oss's) that'll help in determining the problem.
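
A minimal sketch of one way to gather that, assuming the Lustre messages go to /var/log/messages as on a stock CentOS 6 install. The function name `lustre_log_lines` and the output paths are hypothetical; `lctl dk` is the standard command for dumping the Lustre kernel debug buffer.

```shell
#!/bin/sh
# Hypothetical helper: print only the Lustre/LNet-related lines from a syslog file.
# 'Lustre' also matches 'LustreError'; 'LNet' also matches 'LNetError'.
lustre_log_lines() {
    logfile="${1:-/var/log/messages}"   # assumed default syslog location
    grep -E 'Lustre|LNet' "$logfile"
}

# Usage (as root, on each client, the MDS and the OSSs):
#   lustre_log_lines > "/tmp/$(hostname)-lustre-messages.txt"
#   lctl dk "/tmp/$(hostname)-lustre-debug.txt"   # dump the kernel debug buffer
```

Compressing the resulting files before attaching them usually keeps them small enough to post.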

-cf

On 09/17/2012 08:57 AM, Jon Yeargers wrote:
>
> Issue: I'm trying to assess the (possible) use of Lustre for our 
> group. To this end I've been trying to create a simple system to 
> explore the nuances. I can't seem to get past the 'llmount.sh' test 
> with any degree of success.
>
> What I've done: Each system (throwaway PCs with 70 GB HD, 2 GB RAM) is 
> formatted with CentOS 6.2. I then update everything and install the 
> Lustre kernel from downloads.whamcloud.com and add on the various
> (appropriate) lustre and e2fs RPM files. Systems are rebooted and 
> tested with 'llmount.sh' (and then cleared with 'llmountcleanup.sh').
> All is well to this point.
>
> First I create an MDS/MDT system via:
>
> /usr/sbin/mkfs.lustre --mgs --mdt --fsname=lustre
> --device-size=2000000 --param sys.timeout=20 
> --mountfsoptions=errors=remount-ro,user_xattr,acl --param
> lov.stripesize=1048576 --param lov.stripecount=0 --param 
> mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype ldiskfs 
> --reformat /tmp/lustre-mdt1
>
> and then
>
> mkdir -p /mnt/mds1
>
> mount -t lustre -o loop,user_xattr,acl /tmp/lustre-mdt1 /mnt/mds1
>
> Next I take 3 systems and create a 2 GB loop-mounted OST on each via:
>
> /usr/sbin/mkfs.lustre --ost --fsname=lustre --device-size=2000000 
> --param sys.timeout=20 --mgsnode=lustre_MDS0@tcp --backfstype ldiskfs 
> --reformat /tmp/lustre-ost1
>
> mkdir -p /mnt/ost1
>
> mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
>
> The logs on the MDT box show the OSS boxes connecting up. All appears ok.
>
> Last I create a client and attach to the MDT box:
>
> mkdir -p /mnt/lustre
>
> mount -t lustre -o user_xattr,acl,flock luster_MDS0@tcp:/lustre 
> /mnt/lustre
>
> Again, the log on the MDT box shows the client connection. Appears to 
> be successful.
>
> Here's where the issues (appear to) start. If I do a 'df -h' on the 
> client it hangs after showing the system drives. If I attempt to 
> create files (via 'dd') on the lustre mount the session hangs and the 
> job can't be killed. Rebooting the client is the only solution.
>
> I can create and use a client on the MDS/MGS box. Doing so from any 
> other machine will hang.
>
> From the MDS box:
>
> [root@lustre_mds0 lustre]# lctl dl
>  0 UP mgs MGS MGS 13
>  1 UP mgc MGC10.127.24.42@tcp 7923c008-a0de-1c87-f21a-4a5ab48abb96 5
>  2 UP lov lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
>  3 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 7
>  4 UP mds mdd_obd-lustre-MDT0000 mdd_obd_uuid-lustre-MDT0000 3
>  5 UP osc lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
>  6 UP osc lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
>  7 UP lov lustre-clilov-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 4
>  8 UP lmv lustre-clilmv-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 4
>  9 UP mdc lustre-MDT0000-mdc-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 5
> 10 UP osc lustre-OST0000-osc-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 5
> 11 UP osc lustre-OST0001-osc-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 5
> 12 UP osc lustre-OST0002-osc-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 5
> 13 UP osc lustre-OST0002-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
>
> [root@lustre_mds0 lustre]# lfs df -h
> UUID                  bytes    Used  Available  Use%  Mounted on
> lustre-MDT0000_UUID    1.4G   83.9M       1.3G    6%  /mnt/lustre[MDT:0]
> lustre-OST0000_UUID    1.9G    1.1G     716.5M   61%  /mnt/lustre[OST:0]
> lustre-OST0001_UUID    1.9G    1.1G     728.5M   60%  /mnt/lustre[OST:1]
> lustre-OST0002_UUID    1.9G    1.1G     728.5M   60%  /mnt/lustre[OST:2]
>
> filesystem summary:    5.6G    3.2G       2.1G   60%  /mnt/lustre
>
> All appears normal.
>
> Doing this from another (identical) client:
>
> [root@lfstest0 lustre]# lctl dl
> 0 UP mgc MGC10.127.24.42@tcp 272a8405-8512-e9de-f532-feb5b7d6f9b1 5
> 1 UP lov lustre-clilov-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 4
> 2 UP lmv lustre-clilmv-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 4
> 3 UP mdc lustre-MDT0000-mdc-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 5
> 4 UP osc lustre-OST0000-osc-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 5
> 5 UP osc lustre-OST0001-osc-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 5
> 6 UP osc lustre-OST0002-osc-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 5
>
> [root@lfstest0 lustre]# lfs df
> UUID                 1K-blocks     Used  Available  Use%  Mounted on
> lustre-MDT0000_UUID    1499596    85888    1313708    6%  /mnt/lustre[MDT:0]
> OST0000 : inactive device
> lustre-OST0001_UUID    1968528  1122468     745996   60%  /mnt/lustre[OST:1]
> OST0002 : inactive device
>
> filesystem summary:    1968528  1122468     745996   60%  /mnt/lustre
>
> Doing a ‘dd’ or ‘touch’ or even ‘df’ from this machine will hang it.
>
> EDIT: each system has all other systems defined in /etc/hosts and 
> entries in iptables to provide access.
>
> All systems have identical setup:
>
> [root@lfstest0 lustre]# rpm -qa | grep lustre
> lustre-ldiskfs-3.3.0-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64.x86_64
> lustre-2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64.x86_64
> kernel-2.6.32-279.2.1.el6_lustre.gc46c389.x86_64
> lustre-modules-2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64.x86_64
> lustre-tests-2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64.x86_64
>
> [root@lfstest0 lustre]# uname -a
> Linux lfstest0 2.6.32-279.2.1.el6_lustre.gc46c389.x86_64 #1 SMP Mon Aug 13 11:00:10 PDT 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> [root@lfstest0 lustre]# rpm -qa | grep e2fs
> e2fsprogs-libs-1.41.90.wc2-7.el6.x86_64
> e2fsprogs-1.41.90.wc2-7.el6.x86_64
>
> SO: I'm clearly making several mistakes. Any pointers as to where to 
> start correcting them?
>
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
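
The client-side 'lfs df' output quoted above lists OST0000 and OST0002 as inactive while OST0001 responds. A small sketch for picking the inactive targets out of that output, assuming the "inactive device" wording shown in the quoted listing; the function name `inactive_osts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read 'lfs df' output on stdin and print the name of
# every target reported as an inactive device (first field of those lines).
inactive_osts() {
    awk '/inactive device/ { print $1 }'
}

# Usage (on the hanging client):
#   lfs df | inactive_osts
# Each OSS behind an inactive OST can then be checked from the client with
#   lctl ping <oss-nid>     # e.g. an address of the form a.b.c.d@tcp
```

If the ping to an OSS fails from the client but works from the MDS, one possible cause is a firewall rule blocking LNet traffic (TCP port 988 by default) between the client and that OSS.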



