[Lustre-discuss] help needed.

Avi Gershon gershonavi at gmail.com
Thu Jan 3 01:22:57 PST 2008


Hi,
dmesg:
***********************************************************************************************
Lustre: 2092:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Lustre: OBD class driver, info at clusterfs.com
Lustre Version: 1.6.3
Build Version:
1.6.3-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.9.lustre.linux$
LustreError: 2092:0:(socklnd.c:2466:ksocknal_enumerate_interfaces()) Can't
find any usable interfaces
LustreError: 105-4: Error -100 starting up LNI tcp
LustreError: 2092:0:(events.c:654:ptlrpc_init_portals()) network
initialisation failed
LustreError: 2711:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading
connection request from 132.66.17$
LustreError: 2711:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading
connection request from 132.66.17$
audit(1197995576.670:57): avc: denied { rawip_send } for pid=2711
comm="acceptor_988" saddr=132.66.17$
audit(1197995672.933:58): avc: denied { rawip_recv } for saddr=
132.66.176.215 src=1023 daddr=132.66.1$
audit(1197995673.143:59): avc: denied { rawip_recv } for saddr=
132.66.176.215 src=1023 daddr=132.66.1$
audit(1197995673.563:60): avc: denied { rawip_recv } for saddr=
132.66.176.215 src=1023 daddr=132.66.1$
audit(1197995674.403:61): avc: denied { rawip_recv } for saddr=
132.66.176.215 src=1023 daddr=132.66.1$
******************************************************************************************************
getenforce:

[root@x-math20 ~]# getenforce
Enforcing
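
The avc: denied { rawip_send } / { rawip_recv } messages in dmesg, together with SELinux in Enforcing mode, suggest SELinux is blocking LNET's socket traffic. A minimal sketch of how to rule SELinux out, assuming a RHEL4-style system like this SLC 4.6 box:

[root@x-math20 ~]# setenforce 0   # permissive for this boot only
[root@x-math20 ~]# getenforce     # should now report "Permissive"
# To persist across reboots, set SELINUX=permissive (or disabled)
# in /etc/selinux/config and reboot.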

Thanks, Avi



On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
>
> Can you run dmesg and send me any lustre related errors? Also what's the
> output of "getenforce"?
>
> -Aaron
>
> On Jan 2, 2008, at 1:47 PM, Avi Gershon wrote:
>
> No, that doesn't work either :-( ..
> Thanks for answering so fast.
> Avi
>
> On 1/2/08, Aaron Knister <aaron at iges.org > wrote:
> >
> > That all looks ok. From x-math20 could you run "lctl ping
> > 132.66.176.212@tcp0"?
> >
> > On Jan 2, 2008, at 8:36 AM, Avi Gershon wrote:
> >
> > Hi, I get this:
> >
> > ***************************************************************************
> > [root@x-math20 ~]# lctl list_nids
> > 132.66.176.211@tcp
> > [root@x-math20 ~]# ifconfig -a
> > eth0 Link encap:Ethernet HWaddr 00:02:B3:2D:A6:BF
> > inet addr:132.66.176.211 Bcast:132.66.255.255 Mask:255.255.0.0
> > inet6 addr: fe80::202:b3ff:fe2d:a6bf/64 Scope:Link
> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> > RX packets:9448397 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:194259 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:1171910501 (1.0 GiB) TX bytes:40500450 (38.6 MiB)
> >
> > lo Link encap:Local Loopback
> > inet addr:127.0.0.1 Mask:255.0.0.0
> > inet6 addr: ::1/128 Scope:Host
> > UP LOOPBACK RUNNING MTU:16436 Metric:1
> > RX packets:8180 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:8180 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:0
> > RX bytes:3335243 (3.1 MiB) TX bytes:3335243 (3.1 MiB)
> >
> > sit0 Link encap:IPv6-in-IPv4
> > NOARP MTU:1480 Metric:1
> > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:0
> > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> >
> > [root@x-math20 ~]# cat /etc/modprobe.conf
> > alias eth0 e100
> > alias usb-controller uhci-hcd
> > alias scsi_hostadapter ata_piix
> > alias lustre llite
> > options lnet networks=tcp0
> > [root@x-math20 ~]#
> >
> > ***********************************************************************************************************
> >
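> > If lnet ever binds to the wrong interface, the interface can be named explicitly in the networks option; a sketch, assuming eth0 from the ifconfig output above (syntax per the Lustre 1.6 manual):
> >
> > # /etc/modprobe.conf
> > options lnet networks=tcp0(eth0)
> >
> > The lnet module must be reloaded (or the node rebooted) for the change to take effect.
> >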
> >
> > On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
> > >
> > > On the host x-math20 could you run an "lctl list_nids" and also an
> > > "ifconfig -a". I want to see if lnet is listening on the correct interface.
> > > Oh could you also post the contents of your /etc/modprobe.conf.
> > >
> > > Thanks!
> > >
> > > -Aaron
> > >
> > > On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:
> > >
> > > Hello to everyone, and happy new year.
> > > I think I have reduced my problem to this: lctl ping
> > > 132.66.176.211@tcp0 doesn't work for me, for some strange reason,
> > > as you can see:
> > > ***********************************************************************************
> > >
> > > [root@x-math20 ~]# lctl ping 132.66.176.211@tcp0
> > > failed to ping 132.66.176.211@tcp: Input/output error
> > > [root@x-math20 ~]# ping 132.66.176.211
> > > PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
> > > 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
> > > 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
> > > 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
> > > --- 132.66.176.211 ping statistics ---
> > > 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
> > > rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
> > > [root@x-math20 ~]#
> > > *****************************************************************************************
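> > >
> > > The LNET acceptor listens on TCP port 988 by default (note the "acceptor_988" thread in the dmesg output above), so its reachability can be probed with any TCP client, independent of lctl; a sketch, assuming telnet is available:
> > >
> > > telnet 132.66.176.211 988   # a refused or timed-out connection means the acceptor is unreachable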
> > >
> > >
> > >
> > > On 12/24/07, Avi Gershon <gershonavi at gmail.com > wrote:
> > > >
> > > > Hi,
> > > > here are the "iptables -L" results:
> > > >
> > > >  NODE 1 132.66.176.212
> > > > Scientific Linux CERN SLC release 4.6 (Beryllium)
> > > > root@132.66.176.212's password:
> > > > Last login: Sun Dec 23 22:01:18 2007 from x-fishelov.tau.ac.il
> > > > [root@localhost ~]#
> > > > [root@localhost ~]#
> > > > [root@localhost ~]# iptables -L
> > > > Chain INPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > Chain FORWARD (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > Chain OUTPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > ************************************************************************************************
> > > >
> > > >  MDT 132.66.176.211
> > > >
> > > > Last login: Mon Dec 24 11:51:57 2007 from dynamic136-91.tau.ac.il
> > > > [root@x-math20 ~]# iptables -L
> > > > Chain INPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > Chain FORWARD (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > Chain OUTPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > *************************************************************************
> > > >
> > > > NODE 2 132.66.176.215
> > > > Last login: Mon Dec 24 11:01:22 2007 from erezlab.tau.ac.il
> > > > [root@x-mathr11 ~]# iptables -L
> > > > Chain INPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > RH-Firewall-1-INPUT  all  --  anywhere             anywhere
> > > >
> > > > Chain FORWARD (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > RH-Firewall-1-INPUT  all  --  anywhere             anywhere
> > > >
> > > > Chain OUTPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > Chain RH-Firewall-1-INPUT (2 references)
> > > > target     prot opt source               destination
> > > > ACCEPT     all  --  anywhere             anywhere
> > > > ACCEPT     icmp --  anywhere             anywhere            icmp any
> > > > ACCEPT     ipv6-crypt--  anywhere             anywhere
> > > > ACCEPT     ipv6-auth--  anywhere             anywhere
> > > > ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
> > > > ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
> > > > ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
> > > > ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
> > > > ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
> > > > ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
> > > > REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
> > > > [root@x-mathr11 ~]#
> > > >
> > > > ************************************************************
> > > > One more thing: do you use the TCP protocol, or UDP?
> > > >
> > > > Regards, Avi
> > > > P.S. I think this is the beginning of a beautiful friendship. :-)
> > > >
> > > >
> > > >
> > > > On Dec 24, 2007 5:29 PM, Aaron Knister < aaron at iges.org> wrote:
> > > >
> > > > > That sounds like quite a task! Could you show me the contents of
> > > > > your
> > > > > firewall rules on the systems mentioned below? (iptables -L) on
> > > > > each.
> > > > > That would help to diagnose the problem further.
> > > > >
> > > > > -Aaron
> > > > >
> > > > > On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
> > > > >
> > > > > > Hi Aaron, and thank you for your fast answers.
> > > > > > We (Avi, Meny and I) are working on the Israeli GRID, and we need
> > > > > > to create a single huge file system for it.
> > > > > >     cheers
> > > > > >          Yan
> > > > > >
> > > > > > ________________________________
> > > > > >
> > > > > > From: Aaron Knister [mailto: aaron at iges.org]
> > > > > > Sent: Sun 12/23/2007 8:27 PM
> > > > > > To: Avi Gershon
> > > > > > Cc: lustre-discuss at clusterfs.com ; Yan Benhammou; Meny Ben moshe
> > > > > > Subject: Re: [Lustre-discuss] help needed.
> > > > > >
> > > > > >
> > > > > > Can you check the firewall on each of those machines (iptables -L)
> > > > > > and paste that here. Also, is this network dedicated to Lustre?
> > > > > > Lustre can easily saturate a network interface under load, to the
> > > > > > point that it becomes difficult to log in to a node if it only has one
> > > > > > interface. I'd recommend using a different interface if you can.
> > > > > >
> > > > > > On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
> > > > > >
> > > > > >
> > > > > >       node 1 132.66.176.212
> > > > > >       node 2 132.66.176.215
> > > > > >
> > > > > >       [root@x-math20 ~]# ssh 132.66.176.215
> > > > > >       root@132.66.176.215's password:
> > > > > >       ssh(21957) Permission denied, please try again.
> > > > > >       root@132.66.176.215's password:
> > > > > >       Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
> > > > > >       [root@x-mathr11 ~]# lctl ping 132.66.176.211@tcp0
> > > > > >       failed to ping 132.66.176.211@tcp: Input/output error
> > > > > >       [root@x-mathr11 ~]# lctl list_nids
> > > > > >       132.66.176.215@tcp
> > > > > >       [root@x-mathr11 ~]# ssh 132.66.176.212
> > > > > >       The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
> > > > > >       RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
> > > > > >       Are you sure you want to continue connecting (yes/no)? yes
> > > > > >       ssh(11526) Warning: Permanently added '132.66.176.212' (RSA1) to the list of known hosts.
> > > > > >       root@132.66.176.212's password:
> > > > > >       Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
> > > > > >       [root@localhost ~]# lctl ping 132.66.176.211@tcp0
> > > > > >       failed to ping 132.66.176.211@tcp: Input/output error
> > > > > >       [root@localhost ~]# lctl list_nids
> > > > > >       132.66.176.212@tcp
> > > > > >       [root@localhost ~]#
> > > > > >
> > > > > >
> > > > > >       thanks for helping!!
> > > > > >       Avi
> > > > > >
> > > > > >
> > > > > >       On Dec 23, 2007 5:32 PM, Aaron Knister < aaron at iges.org>
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > >               On the OSS, can you ping the MDS/MGS using this command:
> > > > > >
> > > > > >               lctl ping 132.66.176.211@tcp0
> > > > > >
> > > > > >               If it doesn't ping, list the nids on each node by running
> > > > > >
> > > > > >               lctl list_nids
> > > > > >
> > > > > >               and tell me what comes back.
> > > > > >
> > > > > >               -Aaron
> > > > > >
> > > > > >
> > > > > >               On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
> > > > > >
> > > > > >
> > > > > >                       Hi, I could use some help.
> > > > > >                       I installed Lustre on 3 computers.
> > > > > >                       mdt/mgs:
> > > > > >
> > > > > > ************************************************************************************
> > > > >
> > > > > >                       [root@x-math20 ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb
> > > > > >
> > > > > >                          Permanent disk data:
> > > > > >                       Target:     spfs-MDTffff
> > > > > >                       Index:      unassigned
> > > > > >                       Lustre FS:  spfs
> > > > > >                       Mount type: ldiskfs
> > > > > >                       Flags:      0x75
> > > > > >                                     (MDT MGS needs_index first_time update )
> > > > > >                       Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> > > > > >                       Parameters:
> > > > > >
> > > > > >                       device size = 19092MB
> > > > > >                       formatting backing filesystem ldiskfs on /dev/hdb
> > > > > >                               target name  spfs-MDTffff
> > > > > >                               4k blocks     0
> > > > > >                               options        -J size=400 -i 4096 -I 512 -q -O dir_index -F
> > > > > >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff  -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
> > > > > >                       Writing CONFIGS/mountdata
> > > > > >                       [root@x-math20 ~]# df
> > > > > >                       Filesystem           1K-blocks      Used Available Use% Mounted on
> > > > > >                       /dev/hda1             19228276   4855244  13396284  27% /
> > > > > >                       none                    127432         0    127432   0% /dev/shm
> > > > > >                       /dev/hdb              17105436    455152  15672728   3% /mnt/test/mdt
> > > > > >                       [root@x-math20 ~]# cat /proc/fs/lustre/devices
> > > > > >                         0 UP mgs MGS MGS 5
> > > > > >                         1 UP mgc MGC132.66.176.211@tcp 5f5ba729-6412-3843-2229-1310a0b48f71 5
> > > > > >                         2 UP mdt MDS MDS_uuid 3
> > > > > >                         3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
> > > > > >                         4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
> > > > > >                       [root@x-math20 ~]#
> > > > > >
> > > > > > *************************************************************end mdt******************************
> > > > > >                       So you can see that the MGS is up,
> > > > > >                       and on the OSTs I get an error!! Please help...
> > > > > >
> > > > > >                       ost:
> > > > > >
> > > > > >
> > > > > > **********************************************************************
> > > > > >                       [root@x-mathr11 ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.176.211@tcp0 /dev/hdb1
> > > > > >
> > > > > >                          Permanent disk data:
> > > > > >                       Target:     spfs-OSTffff
> > > > > >                       Index:      unassigned
> > > > > >                       Lustre FS:  spfs
> > > > > >                       Mount type: ldiskfs
> > > > > >                       Flags:      0x72
> > > > > >                                     (OST needs_index first_time update )
> > > > > >                       Persistent mount opts: errors=remount-ro,extents,mballoc
> > > > > >                       Parameters: mgsnode=132.66.176.211@tcp
> > > > > >
> > > > > >                       device size = 19594MB
> > > > > >                       formatting backing filesystem ldiskfs on /dev/hdb1
> > > > > >                               target name  spfs-OSTffff
> > > > > >                               4k blocks     0
> > > > > >                               options        -J size=400 -i 16384 -I 256 -q -O dir_index -F
> > > > > >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff  -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
> > > > > >                       Writing CONFIGS/mountdata
> > > > > >                       [root@x-mathr11 ~]# /CONFIGS/mountdata
> > > > > >                       -bash: /CONFIGS/mountdata: No such file or directory
> > > > > >                       [root@x-mathr11 ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
> > > > > >                       mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
> > > > > >                       Is the MGS running?
> > > > > >
> > > > > > ***********************************************end ost********************************
> > > > > >
> > > > > >                       Can anyone point out the problem?
> > > > > >                       Thanks, Avi.
> > > > > >
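> > > > > > Given the symptoms above, a plausible recovery sequence on the OSS is to clear the likely blockers and then retry the mount; a sketch, reusing the paths from the session above:
> > > > > >
> > > > > > setenforce 0                               # rule out SELinux
> > > > > > service iptables stop                      # rule out the firewall
> > > > > > lctl ping 132.66.176.211@tcp0              # must succeed before mounting
> > > > > > mount -t lustre /dev/hdb1 /mnt/test/ost1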
> > > > > >
> > > > > >
> > > > > >
> > > > > _______________________________________________
> > > > > >                       Lustre-discuss mailing list
> > > > > >                       Lustre-discuss at clusterfs.com
> > > > > >                       https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >               Aaron Knister
> > > > > >               Associate Systems Administrator/Web Designer
> > > > > >               Center for Research on Environment and Water
> > > > > >
> > > > > >               (301) 595-7001
> > > > > >               aaron at iges.org
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Aaron Knister
> > > > > > Associate Systems Administrator/Web Designer
> > > > > > Center for Research on Environment and Water
> > > > > >
> > > > > > (301) 595-7001
> > > > > > aaron at iges.org
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > Aaron Knister
> > > > > Associate Systems Administrator/Web Designer
> > > > > Center for Research on Environment and Water
> > > > >
> > > > > (301) 595-7001
> > > > > aaron at iges.org
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at clusterfs.com
> > > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
> > >
> > >
> > > Aaron Knister
> > > Associate Systems Analyst
> > > Center for Ocean-Land-Atmosphere Studies
> > >
> > > (301) 595-7000
> > > aaron at iges.org
> > >
> > >
> > >
> > >
> > >
> >
> > Aaron Knister
> > Associate Systems Analyst
> > Center for Ocean-Land-Atmosphere Studies
> >
> > (301) 595-7000
> > aaron at iges.org
> >
> >
> >
> >
> >
>
> Aaron Knister
> Associate Systems Analyst
> Center for Ocean-Land-Atmosphere Studies
>
> (301) 595-7000
> aaron at iges.org
>
>
>
>
>

