[Lustre-discuss] help needed.

Aaron Knister aaron@iges.org
Wed Jan 2 05:22:38 PST 2008


On the host x-math20, could you run "lctl list_nids" and also
"ifconfig -a"? I want to see if LNET is listening on the correct
interface. Could you also post the contents of your
/etc/modprobe.conf?
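
For reference, on a single-NIC node the LNET line in /etc/modprobe.conf
typically looks something like this (eth0 is just an assumption here; it
should be whichever interface carries the 132.66.176.x addresses):

    options lnet networks=tcp0(eth0)

With that in place, "lctl list_nids" should report that interface's NID,
e.g. 132.66.176.211@tcp.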

Thanks!

-Aaron

On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:

> Hello everyone, and happy new year.
> I think I have reduced my problem to this: "lctl ping
> 132.66.176.211@tcp0" doesn't work for me, for some strange reason,
> as you can see:
> ***********************************************************************************
> [root@x-math20 ~]# lctl ping 132.66.176.211@tcp0
> failed to ping 132.66.176.211@tcp: Input/output error
> [root@x-math20 ~]# ping 132.66.176.211
> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
> 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
> 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
> 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
> --- 132.66.176.211 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
> rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
> [root@x-math20 ~]#
> *****************************************************************************************
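>
> Could it be a firewall issue? As I understand it, unlike ICMP ping,
> "lctl ping" opens a TCP connection to the peer's Lustre acceptor port
> (988 by default for the tcp LND), so a firewall could pass ICMP while
> still blocking LNET. Would this be a reasonable test from this node,
> assuming the default port?
>
> [root@x-math20 ~]# telnet 132.66.176.211 988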
>
>
> On 12/24/07, Avi Gershon <gershonavi@gmail.com> wrote:
> Hi,
> here are the "iptables -L" results:
>
>  NODE 1 132.66.176.212
> Scientific Linux CERN SLC release 4.6 (Beryllium)
> root@132.66.176.212's password:
> Last login: Sun Dec 23 22:01:18 2007 from x-fishelov.tau.ac.il
> [root@localhost ~]#
> [root@localhost ~]#
> [root@localhost ~]# iptables -L
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> ************************************************************************************************
>  MDT 132.66.176.211
>
> Last login: Mon Dec 24 11:51:57 2007 from dynamic136-91.tau.ac.il
> [root@x-math20 ~]# iptables -L
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> *************************************************************************
>
> NODE 2 132.66.176.215
> Last login: Mon Dec 24 11:01:22 2007 from erezlab.tau.ac.il
> [root@x-mathr11 ~]# iptables -L
>
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain RH-Firewall-1-INPUT (2 references)
> target     prot opt source               destination
> ACCEPT     all  --  anywhere             anywhere
> ACCEPT     icmp --  anywhere             anywhere            icmp any
> ACCEPT     ipv6-crypt--  anywhere             anywhere
> ACCEPT     ipv6-auth--  anywhere             anywhere
> ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
> ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
> ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
> ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
> REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
> [root@x-mathr11 ~]#
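>
> I notice that on this node everything goes through the
> RH-Firewall-1-INPUT chain, which has no ACCEPT rule for the Lustre
> port, so incoming LNET connections would fall through to the final
> REJECT. If that turns out to be the culprit, would a rule along these
> lines (assuming the default acceptor port, 988/tcp) open it?
>
> iptables -I RH-Firewall-1-INPUT -p tcp --dport 988 -j ACCEPT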
>
> ************************************************************
> One more thing: do you use the TCP protocol, or UDP?
>
> Regards, Avi
> P.S. I think this is the beginning of a beautiful friendship. :-)
>
>
>
> On Dec 24, 2007 5:29 PM, Aaron Knister <aaron@iges.org> wrote:
> That sounds like quite a task! Could you show me the firewall rules on
> each of the systems mentioned below ("iptables -L" on each)? That
> would help diagnose the problem further.
>
> -Aaron
>
> On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
>
> > Hi Aaron, and thank you for your fast answers.
> > We (Avi, Meny and I) are working on the Israeli GRID, and we need to
> > create a single huge file system for it.
> >     cheers
> >          Yan
> >
> > ________________________________
> >
> > From: Aaron Knister [mailto:aaron@iges.org]
> > Sent: Sun 12/23/2007 8:27 PM
> > To: Avi Gershon
> > Cc: lustre-discuss@clusterfs.com; Yan Benhammou; Meny Ben moshe
> > Subject: Re: [Lustre-discuss] help needed.
> >
> >
> > Can you check the firewall on each of those machines (iptables -L)
> > and paste that here? Also, is this network dedicated to Lustre?
> > Lustre can easily saturate a network interface under load, to the
> > point that it becomes difficult to log in to a node if it only has
> > one interface. I'd recommend using a different interface if you can.
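> >
> > If you do end up putting Lustre on its own interface, you can pin
> > LNET to it in /etc/modprobe.conf, for example (eth1 here is just a
> > placeholder for whatever the dedicated NIC would be):
> >
> > options lnet networks=tcp0(eth1)
> >
> > You'd need to reload the lnet module for the change to take effect.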
> >
> > On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
> >
> >
> >       node 1 132.66.176.212
> >       node 2 132.66.176.215
> >
> >       [root@x-math20 ~]# ssh 132.66.176.215
> >       root@132.66.176.215's password:
> >       ssh(21957) Permission denied, please try again.
> >       root@132.66.176.215's password:
> >       Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
> >       [root@x-mathr11 ~]# lctl ping 132.66.176.211@tcp0
> >       failed to ping 132.66.176.211@tcp: Input/output error
> >       [root@x-mathr11 ~]# lctl list_nids
> >       132.66.176.215@tcp
> >       [root@x-mathr11 ~]# ssh 132.66.176.212
> >       The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
> >       RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
> >       Are you sure you want to continue connecting (yes/no)? yes
> >       ssh(11526) Warning: Permanently added '132.66.176.212' (RSA1) to the list of known hosts.
> >       root@132.66.176.212's password:
> >       Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
> >       [root@localhost ~]# lctl ping 132.66.176.211@tcp0
> >       failed to ping 132.66.176.211@tcp: Input/output error
> >       [root@localhost ~]# lctl list_nids
> >       132.66.176.212@tcp
> >       [root@localhost ~]#
> >
> >
> >       Thanks for helping!!
> >       Avi
> >
> >
> >       On Dec 23, 2007 5:32 PM, Aaron Knister <aaron@iges.org> wrote:
> >
> >
> >               On the OSS, can you ping the MDS/MGS using this command:
> >
> >               lctl ping 132.66.176.211@tcp0
> >
> >               If it doesn't ping, list the NIDs on each node by running
> >
> >               lctl list_nids
> >
> >               and tell me what comes back.
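> >
> >               If the ping succeeds, it echoes back the peer's NIDs,
> >               something like this (the values here are illustrative):
> >
> >               12345-132.66.176.211@tcp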
> >
> >               -Aaron
> >
> >
> >               On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
> >
> >
> >                       Hi, I could use some help.
> >                       I installed Lustre on 3 computers.
> >                       MDT/MGS:
> >
> >                       *************************************************************
> >                       [root@x-math20 ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb
> >
> >                          Permanent disk data:
> >                       Target:     spfs-MDTffff
> >                       Index:      unassigned
> >                       Lustre FS:  spfs
> >                       Mount type: ldiskfs
> >                       Flags:      0x75
> >                                     (MDT MGS needs_index first_time update )
> >                       Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> >                       Parameters:
> >
> >                       device size = 19092MB
> >                       formatting backing filesystem ldiskfs on /dev/hdb
> >                               target name  spfs-MDTffff
> >                               4k blocks     0
> >                               options        -J size=400 -i 4096 -I 512 -q -O dir_index -F
> >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
> >                       Writing CONFIGS/mountdata
> >                       [root@x-math20 ~]# df
> >                       Filesystem           1K-blocks      Used Available Use% Mounted on
> >                       /dev/hda1             19228276   4855244  13396284  27% /
> >                       none                    127432         0    127432   0% /dev/shm
> >                       /dev/hdb              17105436    455152  15672728   3% /mnt/test/mdt
> >                       [root@x-math20 ~]# cat /proc/fs/lustre/devices
> >                         0 UP mgs MGS MGS 5
> >                         1 UP mgc MGC132.66.176.211@tcp 5f5ba729-6412-3843-2229-1310a0b48f71 5
> >                         2 UP mdt MDS MDS_uuid 3
> >                         3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
> >                         4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
> >                       [root@x-math20 ~]#
> >                       ************************************************* end mdt *************************************************
> >                       So you can see that the MGS is up,
> >                       and on the OSTs I get an error!! Please help...
> >
> >                       OST:
> >
> >                       **********************************************************************
> >                       [root@x-mathr11 ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.176.211@tcp0 /dev/hdb1
> >
> >                          Permanent disk data:
> >                       Target:     spfs-OSTffff
> >                       Index:      unassigned
> >                       Lustre FS:  spfs
> >                       Mount type: ldiskfs
> >                       Flags:      0x72
> >                                     (OST needs_index first_time update )
> >                       Persistent mount opts: errors=remount-ro,extents,mballoc
> >                       Parameters: mgsnode=132.66.176.211@tcp
> >
> >                       device size = 19594MB
> >                       formatting backing filesystem ldiskfs on /dev/hdb1
> >                               target name  spfs-OSTffff
> >                               4k blocks     0
> >                               options        -J size=400 -i 16384 -I 256 -q -O dir_index -F
> >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
> >                       Writing CONFIGS/mountdata
> >                       [root@x-mathr11 ~]# /CONFIGS/mountdata
> >                       -bash: /CONFIGS/mountdata: No such file or directory
> >                       [root@x-mathr11 ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
> >                       mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
> >                       Is the MGS running?
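> >                       (Should I also check whether the MGS acceptor
> >                       is actually listening? On the MGS node, and
> >                       assuming the default acceptor port 988,
> >                       "netstat -tln | grep :988" ought to show it
> >                       once the MDT is mounted.)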
> >                       *********************************************** end ost ***********************************************
> >
> >                       Can anyone point out the problem?
> >                       Thanks, Avi.
> >
> >
> >
> >
> >
> >
> >
> >
> >               Aaron Knister
> >               Associate Systems Administrator/Web Designer
> >               Center for Research on Environment and Water
> >
> >               (301) 595-7001
> >               aaron@iges.org
> >
> >
> >
> >
> >
> >
> > Aaron Knister
> > Associate Systems Administrator/Web Designer
> > Center for Research on Environment and Water
> >
> > (301) 595-7001
> > aaron@iges.org
> >
> >
> >
>
> Aaron Knister
> Associate Systems Administrator/Web Designer
> Center for Research on Environment and Water
>
> (301) 595-7001
> aaron@iges.org
>
>
>
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron@iges.org





