[Lustre-discuss] help needed.

Aaron Knister aaron at iges.org
Thu Jan 3 08:38:03 PST 2008


SELinux is killing your Lustre setup. See this article on how to
disable it: http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/selinux-guide/rhlcommon-section-0068.html#RHLCOMMON-SECTION-0094

A reboot will be required. That should do the trick.
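The short version of what that page covers, as a sketch (assuming a
stock RHEL4-style layout, as on the SLC 4.6 nodes in this thread):

    # temporarily switch SELinux to permissive mode (lost on reboot)
    setenforce 0

    # to disable it permanently, set SELINUX=disabled in
    # /etc/selinux/config, e.g. with sed, then reboot
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

After the reboot, "getenforce" should report "Disabled".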

-Aaron

On Jan 3, 2008, at 4:22 AM, Avi Gershon wrote:

> Hi,
> dmesg:
> ***********************************************************************************************
> Lustre: 2092:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
> Lustre: OBD class driver, info at clusterfs.com
> Lustre Version: 1.6.3
> Build Version: 1.6.3-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.9.lustre.linux$
> LustreError: 2092:0:(socklnd.c:2466:ksocknal_enumerate_interfaces()) Can't find any usable interfaces
> LustreError: 105-4: Error -100 starting up LNI tcp
> LustreError: 2092:0:(events.c:654:ptlrpc_init_portals()) network initialisation failed
> LustreError: 2711:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 132.66.17$
> LustreError: 2711:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 132.66.17$
> audit(1197995576.670:57): avc: denied { rawip_send } for pid=2711 comm="acceptor_988" saddr=132.66.17$
> audit(1197995672.933:58): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
> audit(1197995673.143:59): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
> audit(1197995673.563:60): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
> audit(1197995674.403:61): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
> ******************************************************************************************************
> getenforce:
>
> [root at x-math20 ~]# getenforce
> Enforcing
>
> thanks Avi
>
>
>
> On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
> Can you run dmesg and send me any Lustre-related errors? Also, what's
> the output of "getenforce"?
>
> -Aaron
>
> On Jan 2, 2008, at 1:47 PM, Avi Gershon wrote:
>
>> No, that doesn't work either :-(
>> Thanks for answering so fast.
>> Avi
>>
>> On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
>> That all looks OK. From x-math20, could you run "lctl ping
>> 132.66.176.212 at tcp0"?
>>
>> On Jan 2, 2008, at 8:36 AM, Avi Gershon wrote:
>>
>>> Hi, I get this:
>>> ***************************************************************************
>>> [root at x-math20 ~]# lctl list_nids
>>> 132.66.176.211 at tcp
>>> [root at x-math20 ~]# ifconfig -a
>>> eth0 Link encap:Ethernet HWaddr 00:02:B3:2D:A6:BF
>>> inet addr:132.66.176.211 Bcast:132.66.255.255 Mask:255.255.0.0
>>> inet6 addr: fe80::202:b3ff:fe2d:a6bf/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:9448397 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:194259 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:1171910501 (1.0 GiB) TX bytes:40500450 (38.6 MiB)
>>>
>>> lo Link encap:Local Loopback
>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>> inet6 addr: ::1/128 Scope:Host
>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>>> RX packets:8180 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:8180 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:0
>>> RX bytes:3335243 (3.1 MiB) TX bytes:3335243 (3.1 MiB)
>>>
>>> sit0 Link encap:IPv6-in-IPv4
>>> NOARP MTU:1480 Metric:1
>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:0
>>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>>
>>> [root at x-math20 ~]# cat /etc/modprobe.conf
>>> alias eth0 e100
>>> alias usb-controller uhci-hcd
>>> alias scsi_hostadapter ata_piix
>>> alias lustre llite
>>> options lnet networks=tcp0
>>> [root at x-math20 ~]#
>>>
>>> ***********************************************************************************************************
>>>
>>> On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
>>> On the host x-math20, could you run "lctl list_nids" and also
>>> "ifconfig -a"? I want to see if LNET is listening on the correct
>>> interface. Could you also post the contents of your
>>> /etc/modprobe.conf?
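>>>
>>> (If LNET did bind to the wrong interface, you can pin it explicitly
>>> in modprobe.conf -- a sketch, with eth0 only as an example:
>>>
>>>     options lnet networks=tcp0(eth0)
>>>
>>> then unload and reload the Lustre/LNET modules for it to take effect.)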
>>>
>>> Thanks!
>>>
>>> -Aaron
>>>
>>> On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:
>>>
>>>> Hello to everyone, and happy new year.
>>>> I think I have reduced my problem to this: "lctl ping
>>>> 132.66.176.211 at tcp0" doesn't work for me, for some strange reason,
>>>> as you can see:
>>>> ***********************************************************************************
>>>> [root at x-math20 ~]# lctl ping 132.66.176.211 at tcp0
>>>> failed to ping 132.66.176.211 at tcp: Input/output error
>>>> [root at x-math20 ~]# ping 132.66.176.211
>>>> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
>>>> 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
>>>> 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
>>>> 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
>>>> --- 132.66.176.211 ping statistics ---
>>>> 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
>>>> rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
>>>> [root at x-math20 ~]#
>>>> *****************************************************************************************
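>>>>
>>>> (Worth noting: "lctl ping" is an LNET-level ping over TCP port 988
>>>> -- the acceptor port, visible as "acceptor_988" elsewhere in this
>>>> thread -- not ICMP, so a working ICMP ping proves nothing about it.
>>>> A quick sanity check that nothing is blocking the port, as a sketch:
>>>>
>>>>     telnet 132.66.176.211 988
>>>>
>>>> should at least get a TCP connection if the path is clear.)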
>>>>
>>>>
>>>> On 12/24/07, Avi Gershon <gershonavi at gmail.com > wrote:
>>>> Hi,
>>>> here are the "iptables -L" results:
>>>>
>>>>  NODE 1 132.66.176.212
>>>> Scientific Linux CERN SLC release 4.6 (Beryllium)
>>>> root at 132.66.176.212's password:
>>>> Last login: Sun Dec 23 22:01:18 2007 from x-fishelov.tau.ac.il
>>>> [root at localhost ~]#
>>>> [root at localhost ~]#
>>>> [root at localhost ~]# iptables -L
>>>> Chain INPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> Chain FORWARD (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain OUTPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> ************************************************************************************************
>>>>  MDT 132.66.176.211
>>>>
>>>> Last login: Mon Dec 24 11:51:57 2007 from dynamic136-91.tau.ac.il
>>>> [root at x-math20 ~]# iptables -L
>>>> Chain INPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> Chain FORWARD (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain OUTPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> *************************************************************************
>>>>
>>>> NODE 2 132.66.176.215
>>>> Last login: Mon Dec 24 11:01:22 2007 from erezlab.tau.ac.il
>>>> [root at x-mathr11 ~]# iptables -L
>>>>
>>>> Chain INPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>>>> Chain FORWARD (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>>>>
>>>> Chain OUTPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain RH-Firewall-1-INPUT (2 references)
>>>> target     prot opt source               destination
>>>> ACCEPT     all  --  anywhere             anywhere
>>>> ACCEPT     icmp --  anywhere             anywhere            icmp any
>>>> ACCEPT     ipv6-crypt --  anywhere         anywhere
>>>> ACCEPT     ipv6-auth --  anywhere          anywhere
>>>> ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
>>>> ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
>>>> ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
>>>> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
>>>> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
>>>> ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
>>>> REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
>>>> [root at x-mathr11 ~]#
>>>>
>>>> ************************************************************
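>>>> (Of the three nodes, only 132.66.176.215 has a restrictive
>>>> RH-Firewall-1-INPUT chain, and nothing visible in it accepts new
>>>> connections on Lustre's acceptor port, TCP 988, so they would hit
>>>> the final REJECT rule. A sketch of a rule that would open it:
>>>>
>>>>     iptables -I RH-Firewall-1-INPUT -p tcp --dport 988 -j ACCEPT
>>>>     service iptables save
>>>>
>>>> the second command making it persistent across reboots.)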
>>>> One more thing: do you use the TCP protocol, or UDP?
>>>>
>>>> Regards, Avi
>>>> P.S. I think this is the beginning of a beautiful friendship :-)
>>>>
>>>>
>>>>
>>>> On Dec 24, 2007 5:29 PM, Aaron Knister <aaron at iges.org> wrote:
>>>> That sounds like quite a task! Could you show me your firewall
>>>> rules on the systems mentioned below ("iptables -L" on each)?
>>>> That would help diagnose the problem further.
>>>>
>>>> -Aaron
>>>>
>>>> On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
>>>>
>>>> > Hi Aaron, and thank you for your fast answers.
>>>> > We (Avi, Meny and I) are working on the Israeli GRID, and we need
>>>> > to create a single huge file system for it.
>>>> >     cheers
>>>> >          Yan
>>>> >
>>>> > ________________________________
>>>> >
>>>> > From: Aaron Knister [mailto:aaron at iges.org]
>>>> > Sent: Sun 12/23/2007 8:27 PM
>>>> > To: Avi Gershon
>>>> > Cc: lustre-discuss at clusterfs.com ; Yan Benhammou; Meny Ben moshe
>>>> > Subject: Re: [Lustre-discuss] help needed.
>>>> >
>>>> >
>>>> > Can you check the firewall on each of those machines (iptables -L)
>>>> > and paste that here. Also, is this network dedicated to Lustre?
>>>> > Lustre can easily saturate a network interface under load, to the
>>>> > point that it becomes difficult to log in to a node if it only has
>>>> > one interface. I'd recommend using a different interface if you can.
>>>> >
>>>> > On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
>>>> >
>>>> >
>>>> >       node 1 132.66.176.212
>>>> >       node 2 132.66.176.215
>>>> >
>>>> >       [root at x-math20 ~]# ssh 132.66.176.215
>>>> >       root at 132.66.176.215's password:
>>>> >       ssh(21957) Permission denied, please try again.
>>>> >       root at 132.66.176.215's password:
>>>> >       Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
>>>> >       [root at x-mathr11 ~]# lctl ping 132.66.176.211 at tcp0
>>>> >       failed to ping 132.66.176.211 at tcp: Input/output error
>>>> >       [root at x-mathr11 ~]# lctl list_nids
>>>> >       132.66.176.215 at tcp
>>>> >       [root at x-mathr11 ~]# ssh 132.66.176.212
>>>> >       The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
>>>> >       RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
>>>> >       Are you sure you want to continue connecting (yes/no)? yes
>>>> >       ssh(11526) Warning: Permanently added '132.66.176.212' (RSA1) to the list of known hosts.
>>>> >       root at 132.66.176.212's password:
>>>> >       Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
>>>> >       [root at localhost ~]# lctl ping 132.66.176.211 at tcp0
>>>> >       failed to ping 132.66.176.211 at tcp: Input/output error
>>>> >       [root at localhost ~]# lctl list_nids
>>>> >       132.66.176.212 at tcp
>>>> >       [root at localhost ~]#
>>>> >
>>>> >
>>>> >       thanks for helping!!
>>>> >       Avi
>>>> >
>>>> >
>>>> >       On Dec 23, 2007 5:32 PM, Aaron Knister <aaron at iges.org> wrote:
>>>> >
>>>> >
>>>> >               On the OSS, can you ping the MDS/MGS using this command:
>>>> >
>>>> >               lctl ping 132.66.176.211 at tcp0
>>>> >
>>>> >               If it doesn't ping, list the NIDs on each node by running
>>>> >
>>>> >               lctl list_nids
>>>> >
>>>> >               and tell me what comes back.
>>>> >
>>>> >               -Aaron
>>>> >
>>>> >
>>>> >               On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
>>>> >
>>>> >
>>>> >                       Hi, I could use some help.
>>>> >                       I installed Lustre on 3 computers.
>>>> >                       MDT/MGS:
>>>> >
>>>> >
>>>> > ************************************************************************************
>>>> >                       [root at x-math20 ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb
>>>> >
>>>> >                          Permanent disk data:
>>>> >                       Target:     spfs-MDTffff
>>>> >                       Index:      unassigned
>>>> >                       Lustre FS:  spfs
>>>> >                       Mount type: ldiskfs
>>>> >                       Flags:      0x75
>>>> >                                     (MDT MGS needs_index first_time update )
>>>> >                       Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>> >                       Parameters:
>>>> >
>>>> >                       device size = 19092MB
>>>> >                       formatting backing filesystem ldiskfs on /dev/hdb
>>>> >                               target name  spfs-MDTffff
>>>> >                               4k blocks     0
>>>> >                               options        -J size=400 -i 4096 -I 512 -q -O dir_index -F
>>>> >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
>>>> >                       Writing CONFIGS/mountdata
>>>> >                       [root at x-math20 ~]# df
>>>> >                       Filesystem           1K-blocks      Used Available Use% Mounted on
>>>> >                       /dev/hda1             19228276   4855244  13396284  27% /
>>>> >                       none                    127432         0    127432   0% /dev/shm
>>>> >                       /dev/hdb              17105436    455152  15672728   3% /mnt/test/mdt
>>>> >                       [root at x-math20 ~]# cat /proc/fs/lustre/devices
>>>> >                         0 UP mgs MGS MGS 5
>>>> >                         1 UP mgc MGC132.66.176.211 at tcp 5f5ba729-6412-3843-2229-1310a0b48f71 5
>>>> >                         2 UP mdt MDS MDS_uuid 3
>>>> >                         3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
>>>> >                         4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
>>>> >                       [root at x-math20 ~]#
>>>> > *************************************************************end mdt******************************
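>>>> > (The mount step is not shown between mkfs.lustre and df above,
>>>> > but given that df lists /dev/hdb on /mnt/test/mdt, it was
>>>> > presumably something like:
>>>> >
>>>> >     mount -t lustre /dev/hdb /mnt/test/mdt
>>>> > )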
>>>> >                       so you can see that the MGS is up,
>>>> >                       and on the OSTs I get an error!! Please help...
>>>> >
>>>> >                       ost:
>>>> >
>>>> > **********************************************************************
>>>> >                       [root at x-mathr11 ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.176.211 at tcp0 /dev/hdb1
>>>> >
>>>> >                          Permanent disk data:
>>>> >                       Target:     spfs-OSTffff
>>>> >                       Index:      unassigned
>>>> >                       Lustre FS:  spfs
>>>> >                       Mount type: ldiskfs
>>>> >                       Flags:      0x72
>>>> >                                     (OST needs_index first_time update )
>>>> >                       Persistent mount opts: errors=remount-ro,extents,mballoc
>>>> >                       Parameters: mgsnode=132.66.176.211 at tcp
>>>> >
>>>> >                       device size = 19594MB
>>>> >                       formatting backing filesystem ldiskfs on /dev/hdb1
>>>> >                               target name  spfs-OSTffff
>>>> >                               4k blocks     0
>>>> >                               options        -J size=400 -i 16384 -I 256 -q -O dir_index -F
>>>> >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
>>>> >                       Writing CONFIGS/mountdata
>>>> >                       [root at x-mathr11 ~]# /CONFIGS/mountdata
>>>> >                       -bash: /CONFIGS/mountdata: No such file or directory
>>>> >                       [root at x-mathr11 ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
>>>> >                       mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
>>>> >                       Is the MGS running?
>>>> > ***********************************************end ost********************************
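>>>> > ("Is the MGS running?" from mount.lustre generally means the OSS
>>>> > could not reach the MGS over LNET at all. The first thing to
>>>> > check from the OSS is:
>>>> >
>>>> >     lctl ping 132.66.176.211 at tcp0
>>>> >
>>>> > which is exactly what the rest of this thread goes on to debug.)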
>>>> >
>>>> >                       Can anyone point out the problem?
>>>> >                       Thanks, Avi.
>>>> >
>>>> >
>>>> >
>>>> >                        
>>>> _______________________________________________
>>>> >                       Lustre-discuss mailing list
>>>> >                       Lustre-discuss at clusterfs.com
>>>> >                       https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >               Aaron Knister
>>>> >               Associate Systems Administrator/Web Designer
>>>> >               Center for Research on Environment and Water
>>>> >
>>>> >               (301) 595-7001
>>>> >               aaron at iges.org
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > Aaron Knister
>>>> > Associate Systems Administrator/Web Designer
>>>> > Center for Research on Environment and Water
>>>> >
>>>> > (301) 595-7001
>>>> > aaron at iges.org
>>>> >
>>>> >
>>>> >
>>>>
>>>> Aaron Knister
>>>> Associate Systems Administrator/Web Designer
>>>> Center for Research on Environment and Water
>>>>
>>>> (301) 595-7001
>>>> aaron at iges.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at clusterfs.com
>>>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>>
>>> Aaron Knister
>>> Associate Systems Analyst
>>> Center for Ocean-Land-Atmosphere Studies
>>>
>>> (301) 595-7000
>>> aaron at iges.org
>>>
>>>
>>>
>>>
>>>
>>
>> Aaron Knister
>> Associate Systems Analyst
>> Center for Ocean-Land-Atmosphere Studies
>>
>> (301) 595-7000
>> aaron at iges.org
>>
>>
>>
>>
>>
>
> Aaron Knister
> Associate Systems Analyst
> Center for Ocean-Land-Atmosphere Studies
>
> (301) 595-7000
> aaron at iges.org
>
>
>
>
>

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org



