[Lustre-discuss] Help with problem mounting Lustre from another network

Yujun Wu yujun at phys.ufl.edu
Thu Jun 19 12:36:36 PDT 2008


Hello Brian,

Thanks for your info. Yes, after disabling SELinux and adding the
accept option for LNET (a tip from a local colleague), everything
works fine. Thanks again for your help.
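
For anyone who hits the same thing, the two changes were roughly as
follows (accept=all is my best recollection of the exact value; check
the lnet module parameters on your version):

    # on the servers: put SELinux out of the way for the test
    # (permanently: set SELINUX=disabled in /etc/selinux/config)
    setenforce 0

    # /etc/modprobe.conf: let LNET accept connections from clients
    # on other networks
    options lnet networks=tcp accept=all

The lnet module has to be reloaded (or the node rebooted) for the
modprobe.conf change to take effect.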


Regards,
Yujun 
>On Thu, 19 Jun 2008
lustre-discuss-request at lists.lustre.org wrote:

> Send Lustre-discuss mailing list submissions to
> 	lustre-discuss at lists.lustre.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://lists.lustre.org/mailman/listinfo/lustre-discuss
> or, via email, send a message with subject or body 'help' to
> 	lustre-discuss-request at lists.lustre.org
> 
> You can reach the person managing the list at
> 	lustre-discuss-owner at lists.lustre.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Lustre-discuss digest..."
> 
> 
> Today's Topics:
> 
>    1. Help with problem mounting Lustre from another network (Yujun Wu)
>    2. Re: Help with problem mounting Lustre from another network
>       (Brian J. Murrell)
>    3. Re: How do I recover files from partial lustre disk?
>       (Andreas Dilger)
>    4. lustre 1.4 with ibhost stack issue (Changer Van)
>    5. Re: Lustre and memory-mapped I/O (Nikita Danilov)
>    6. Re: Lustre and memory-mapped I/O (Andreas Dilger)
>    7. Re: Lustre 1.6.5 install problem (Charles Taylor)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 18 Jun 2008 20:04:01 -0400 (EDT)
> From: Yujun Wu <yujun at phys.ufl.edu>
> Subject: [Lustre-discuss] Help with problem mounting Lustre from
> 	another network
> To: lustre-discuss at lists.lustre.org
> Message-ID:
> 	<Pine.GSO.4.21.0806181946550.18213-100000 at neptune.phys.ufl.edu>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
> 
> Hello,
> 
> Could somebody please give me some hint on this?
> 
> This is my first try with Lustre. I installed everything on
> a single node following the Lustre quick start:
> 
> http://wiki.lustre.org/index.php?title=Lustre_Quick_Start
> 
> with the new version 1.6.5. When I mounted the client
> on the same node, everything worked fine. 
> 
> Later, I tried to mount the client from a separate network. I
> got the following error:
> 
> >mount -t lustre 128.227.89.181@tcp:/testfs /mnt/testfs
> 
> mount.lustre: mount 128.227.89.181@tcp:/testfs at /mnt/testfs
> failed: Cannot send after transport endpoint shutdown
> 
> The error message from the server side is:
> 
> Jun 18 19:52:04 olivine
> kernel: LustreError: 5682:0:(socklnd_cb.c:2166:ksocknal_recv_hello()) Error
> -11 reading HELLO from 128.227.221.35
> Jun 18 19:52:04 olivine kernel: audit(1213833124.489:54): avc:  denied  {
> rawip_send } for  pid=5682 comm="socknal_cd02" saddr=128.227.89.181
> src=988 daddr=128.227.221.35 dest=1023 netif=eth0
> scontext=system_u:object_r:unlabeled_t
> tcontext=system_u:object_r:netif_eth0_t tclass=netif
> 
> This is the result using lctl:
> 
> >lctl ping 128.227.89.181@tcp
> failed to ping 128.227.89.181@tcp: Input/output error
> 
> This is the related configuration of /etc/modprobe.conf from both client
> node and server node (MDS+OSTs):
> 
> # Networking options, see /sys/module/lnet/parameters
> options lnet networks=tcp
> #    (the llite module has been renamed to lustre)
> # end Lustre modules
> 
> The server's ip address is: 128.227.89.181
> The client's ip address is: 128.227.221.35
> 
> They are on two different networks.
> 
> Thanks in advance for any help you give.
> 
> 
> Regards,
> Yujun
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 18 Jun 2008 23:12:43 -0400
> From: "Brian J. Murrell" <Brian.Murrell at Sun.COM>
> Subject: Re: [Lustre-discuss] Help with problem mounting Lustre from
> 	another network
> To: lustre-discuss at lists.lustre.org
> Message-ID: <1213845163.18266.115.camel at pc.ilinx>
> Content-Type: text/plain; charset="us-ascii"
> 
> On Wed, 2008-06-18 at 20:04 -0400, Yujun Wu wrote:
> > Hello,
> 
> Hi,
> 
> > The error message from the server side is:
> > 
> > Jun 18 19:52:04 olivine
> > kernel: LustreError: 5682:0:(socklnd_cb.c:2166:ksocknal_recv_hello()) Error
> > -11 reading HELLO from 128.227.221.35
> > Jun 18 19:52:04 olivine kernel: audit(1213833124.489:54): avc:  denied  {
> > rawip_send } for  pid=5682 comm="socknal_cd02" saddr=128.227.89.181
> > src=988 daddr=128.227.221.35 dest=1023 netif=eth0
> > scontext=system_u:object_r:unlabeled_t
> > tcontext=system_u:object_r:netif_eth0_t tclass=netif
> 
> I am sooooo glad you included this kernel "audit" message.  You need to
> disable selinux or apparmor or whatever MAC/RBAC tools you are running
> on your Lustre machines.
> 
> b.
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 18 Jun 2008 23:31:53 -0600
> From: Andreas Dilger <adilger at sun.com>
> Subject: Re: [Lustre-discuss] How do I recover files from partial
> 	lustre disk?
> To: megan <dobsonunit at gmail.com>
> Cc: Lustre User Discussion Mailing List
> 	<lustre-discuss at lists.lustre.org>
> Message-ID: <20080619053153.GM3726 at webber.adilger.int>
> Content-Type: text/plain; charset=iso-8859-1
> 
> On Jun 18, 2008  14:33 -0700, megan wrote:
> > shell-prompt> mount -t lustre /dev/md1 /srv/lustre/mds/crew4-MDT0000
> > 
> > No errors so far.
> > 
> > shell-prompt> lctl
> >    dl                                (Found my nids of failed JBODs)
> >    device 14
> >    deactivate
> > 
> >    device 16
> >    deactivate
> > 
> >    quit
> > 
> > On one of our servers, I mounted the lustre disk /crew4.
> > The disk will hang a UNIX df or ls command.
> 
> You actually need to do the "deactivate" step on the client.  Then
> "ls" will get EIO on the file, and "df" will return data only from
> the available OSTs.
> 
> > However....
> > lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID --ost crew4-
> > OST0004_UUID -print /crew4
> > 
> > Did indeed provide a list of files.   I saved the list to a text
> > file.   I will next see if I am able to copy a single file to a new
> > location.
> > 
> > Thank you again Andreas for this incredibly useful information.   Do
> > you/Sun do paid Lustre consulting by any chance?
> 
> Yes, in fact we do...
> 
> > On Jun 18, 12:48 am, Andreas Dilger <adil... at sun.com> wrote:
> > > On Jun 16, 2008  15:37 -0700, megan wrote:
> > >
> > > > I am using Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp kernel on a
> > > > CentOS 5 linux x86_64 linux box.
> > > > We had a hardware problem that caused the underlying ext3 partition
> > > > table to completely blow up.  This is resulting in only three of five
> > > > OSTs being mountable.  The main lustre disk of this unit cannot be
> > > > mounted because the MDS knows that two of its parts are missing.
> > >
> > > It should be possible to mount a Lustre filesystem with OSTs that
> > > are not available.  However, access to files on the unavailable
> > > OSTs will cause the process to wait on OST recovery.
> > >
> > >
> > >
> > > > The underlying set-up is JBOD hw that is passed to the linux OS, via
> > > > an LSI 8888ELP card in this case, as a simple device, ie. sde,
> > > > sdf,...  The simple devices were partitioned using parted and
> > > > formatted ext3 then lustre was built on top of the five ext3 units.
> > > > There was no striping done across units/JBODS.  Three of the five
> > > > units passed an e2fsck and an lfsck.  Those remaining units are
> > > > mounted as such:
> > > > /dev/sdc               13T  6.3T  5.7T  53% /srv/lustre/OST/crew4-OST0003
> > > > /dev/sdd               13T  6.3T  5.7T  53% /srv/lustre/OST/crew4-OST0004
> > > > /dev/sdf               13T  6.2T  5.8T  52% /srv/lustre/OST/crew4-OST0001
> > >
> > > > Being that it is unlikely that we shall be able to recover the
> > > > underlying ext3 on the other two units, is there some method by which
> > > > I might try to rescue the data from these last three units mounted
> > > > currently on the OSS?
> > >
> > > > Any and all suggestion genuinely appreciated.
> > >
> > > The recoverability of your data depends heavily on the striping of
> > > the individual files (i.e. the default striping).  If your files have
> > > a default stripe_count = 1, then you can probably recover 3/5 of the
> > > files in the filesystem.  If your default stripe_count = 2, then you
> > > can probably only recover 1/5 of the files, and if you have a higher
> > > stripe_count you probably can't recover any files.
> > >
> > > What you need to do is to mount one of the clients and mark the
> > > corresponding OSTs inactive with:
> > >
> > >         lctl dl    # get device numbers for OSC 0000 and OSC 0002
> > >         lctl --device N deactivate
> > >
> > > Then, instead of the clients waiting for the OSTs to recover the
> > > client will get an IO error when it accesses files on the failed OSTs.
> > >
> > > To get a list of the files that are on the good OSTs run:
> > >
> > >         lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID
> > >                  --ost crew4-OST0004_UUID {mountpoint}
> > >
> > > Cheers, Andreas
> > > --
> > > Andreas Dilger
> > > Sr. Staff Engineer, Lustre Group
> > > Sun Microsystems of Canada, Inc.
> > >
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-disc... at lists.lustre.org
> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Thu, 19 Jun 2008 15:26:49 +0800
> From: "Changer Van" <changerv at gmail.com>
> Subject: [Lustre-discuss] lustre 1.4 with ibhost stack issue
> To: lustre-discuss at clusterfs.com
> Message-ID:
> 	<9fa3c2e50806190026p6351959eu31e6b9011cb50f5b at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Hi all,
> 
> I installed lustre 1.4 with voltaire ibhost stack on RHEL4.
> The kernel version is 2.6.9-55.0.9.el_lustre.1.4.11.1custom.
> 
> During shutdown, the system hung while stopping the
> voltaireibhost process.
> 
> Stopping voltaireibhost... [ press enter twice ]
> 
> I had to press enter twice to bring it down
> and got the following messages on screen:
> 
> IPOIB_UD: The del command
> IPOIB_UD: Thread going out ...
> IPOIB_UD: leave del command
> rmmod ...
> ...
> IPOIB_UD: unregister units
> IPOIB_UD: destroys pool
> 
> I also had to press enter twice to bring the system back up.
> Then I turned off the init.d service of voltaireibhost
> and started it manually after the system rebooted. It was fine.
> 
> What is wrong with this lustre machine?
> 
> Any suggestion would be greatly appreciated.
> 
> -- 
> Regards,
> Changer
> 
> ------------------------------
> 
> Message: 5
> Date: Thu, 19 Jun 2008 12:44:56 +0400
> From: Nikita Danilov <Nikita.Danilov at Sun.COM>
> Subject: Re: [Lustre-discuss] Lustre and memory-mapped I/O
> To: "Huang, Eric" <eric.huang at intel.com>
> Cc: lustre-discuss at clusterfs.com
> Message-ID: <18522.7304.410714.44837 at gargle.gargle.HOWL>
> Content-Type: text/plain; charset=us-ascii
> 
> Huang, Eric writes:
> 
> Hello,
> 
>  > 
>  > Does Lustre support memory mapped I/O and direct I/O? 
> 
> yes, it supports both. Can you run your application under strace to see
> how exactly it fails to create a directory?
> 
>  > 
>  > I am trying to run Nastran using Lustre but it always reported failure
>  > to create a directory. Since Nastran does a lot of memory mapped I/O, I
>  > was wondering if it was the cause. 
>  > 
>  > I guess a good question to ask is that does Lustre support all POSIX
>  > file system operations?
>  > 
>  > Thanks a lot.
>  > 
>  > Eric
> 
> Nikita.
> 
> 
> ------------------------------
> 
> Message: 6
> Date: Thu, 19 Jun 2008 02:56:44 -0600
> From: Andreas Dilger <adilger at sun.com>
> Subject: Re: [Lustre-discuss] Lustre and memory-mapped I/O
> To: "Huang, Eric" <eric.huang at intel.com>
> Cc: lustre-discuss at clusterfs.com
> Message-ID: <20080619085644.GR3726 at webber.adilger.int>
> Content-Type: text/plain; charset=us-ascii
> 
> On Jun 17, 2008  22:39 -0700, Huang, Eric wrote:
> > Does Lustre support memory mapped I/O and direct I/O? 
> 
> Yes, it does support both of these.
> 
> > I am trying to run Nastran using Lustre but it always reported failure
> > to create a directory. Since Nastran does a lot of memory mapped I/O, I
> > was wondering if it was the cause. 
> 
> ???  I'm not sure how mmap and direct I/O relate to creating a directory?
> 
> > I guess a good question to ask is that does Lustre support all POSIX
> > file system operations?
> 
> Yes.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 
> 
> 
> ------------------------------
> 
> Message: 7
> Date: Thu, 19 Jun 2008 06:19:46 -0400
> From: Charles Taylor <taylor at hpc.ufl.edu>
> Subject: Re: [Lustre-discuss] Lustre 1.6.5 install problem
> To: Johnlya <johnlya at gmail.com>
> Cc: lustre-discuss at clusterfs.com
> Message-ID: <CE9459CB-6FEA-4AD5-A4B4-319B4F09D9B9 at hpc.ufl.edu>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> 
> Lustre doesn't know where your ib module symbols are.  When you
> configured lustre (in the build sense) you pointed it to a patched
> kernel tree.  In that directory is a Module.symvers file devoid of ib
> module symbols.  You should also have a Module.symvers in your
> /usr/src/ofa_kernel directory (assuming you built OFED as well).  So...
> 
> cat /usr/src/ofa_kernel/Module.symvers >> <patched_kernel_dir>/Module.symvers
> 
> and run "make install" again and it should be happy.   For a 2.6.9  
> kernel, you probably need OFED 1.2.
> 
> Charlie Taylor
> UF HPC Center
> 
> On Jun 18, 2008, at 5:55 AM, Johnlya wrote:
> 
> > Install step is:
> > rpm -Uvh --nodeps e2fsprogs-devel-1.40.7.sun3-0redhat.x86_64.rpm
> > rpm -Uvh e2fsprogs-1.40.7.sun3-0redhat.x86_64.rpm
> > cd ../PyXML/
> > tar -zxvf  PyXML-0.8.4.tar.gz
> > cd PyXML-0.8.4
> > python setup.py build
> > python setup.py install
> > cd ../../Expect
> > rpm -ivh expect-5.42.1-1.src.rpm
> > cd ../1.6.5/
> > rpm -ivh kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.x86_64.rpm
> > rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5smp.x86_64.rpm
> > rpm -ivh lustre-modules-1.6.5-2.6.9_67.0.7.EL_lustre.1.6.5smp.x86_64.rpm
> >
> > When installing lustre-modules, it displays warnings:
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol ib_create_cq
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_resolve_addr
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol ib_dereg_mr
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_reject
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_disconnect
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_resolve_route
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_bind_addr
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_create_qp
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol ib_destroy_cq
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_create_id
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_listen
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_destroy_qp
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol ib_get_dma_mr
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol ib_alloc_pd
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_connect
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol ib_modify_qp
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_destroy_id
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol rdma_accept
> > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/lustre/ko2iblnd.ko needs unknown symbol ib_dealloc_pd
> >
> > Please tell me why this happens.
> > Thank you
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> 
> End of Lustre-discuss Digest, Vol 29, Issue 34
> **********************************************
> 



