[Lustre-discuss] help needed.

Aaron Knister aaron@iges.org
Wed Jan 2 08:19:09 PST 2008


That all looks OK. From x-math20, could you run "lctl ping
132.66.176.212@tcp0"?
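
If plain ping works but lctl ping fails in both directions, it is also
worth checking that the LNET acceptor port is reachable at the raw TCP
level. A quick sketch, assuming the default acceptor port 988:

    telnet 132.66.176.212 988

If the connection is refused or times out while ICMP ping succeeds, a
firewall rule between the nodes is the likely culprit.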

On Jan 2, 2008, at 8:36 AM, Avi Gershon wrote:

> Hi, I get this:
> ***************************************************************************
> [root@x-math20 ~]# lctl list_nids
> 132.66.176.211@tcp
> [root@x-math20 ~]# ifconfig -a
> eth0 Link encap:Ethernet HWaddr 00:02:B3:2D:A6:BF
> inet addr:132.66.176.211 Bcast:132.66.255.255 Mask:255.255.0.0
> inet6 addr: fe80::202:b3ff:fe2d:a6bf/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:9448397 errors:0 dropped:0 overruns:0 frame:0
> TX packets:194259 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:1171910501 (1.0 GiB) TX bytes:40500450 (38.6 MiB)
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:8180 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8180 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:3335243 (3.1 MiB) TX bytes:3335243 (3.1 MiB)
>
> sit0 Link encap:IPv6-in-IPv4
> NOARP MTU:1480 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>
> [root@x-math20 ~]# cat /etc/modprobe.conf
> alias eth0 e100
> alias usb-controller uhci-hcd
> alias scsi_hostadapter ata_piix
> alias lustre llite
> options lnet networks=tcp0
> [root@x-math20 ~]#
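>
> (To confirm that lnet is actually listening on eth0, one quick check,
> again assuming the default acceptor port 988, is:
>
>     netstat -tln | grep :988
>
> which should show a LISTEN socket once the Lustre modules are loaded
> and the network is up.)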
>
> ***********************************************************************
>
> On 1/2/08, Aaron Knister <aaron@iges.org> wrote:
> On the host x-math20, could you run an "lctl list_nids" and also an
> "ifconfig -a"? I want to see if lnet is listening on the correct
> interface. Oh, could you also post the contents of your
> /etc/modprobe.conf?
>
> Thanks!
>
> -Aaron
>
> On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:
>
>> Hello to everyone, and happy new year.
>> I think I have reduced my problem to this: "lctl ping
>> 132.66.176.211@tcp0" doesn't work for me for some strange reason,
>> as you can see:
>> ***********************************************************************************
>> [root@x-math20 ~]# lctl ping 132.66.176.211@tcp0
>> failed to ping 132.66.176.211@tcp: Input/output error
>> [root@x-math20 ~]# ping 132.66.176.211
>> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
>> 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
>> 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
>> 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
>>
>> --- 132.66.176.211 ping statistics ---
>> 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
>> rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
>> [root@x-math20 ~]#
>> *****************************************************************************************
>>
>>
>> On 12/24/07, Avi Gershon <gershonavi@gmail.com> wrote:
>> Hi,
>> here are the "iptables -L" results:
>>
>>  NODE 1 132.66.176.212
>> Scientific Linux CERN SLC release 4.6 (Beryllium)
>> root@132.66.176.212's password:
>> Last login: Sun Dec 23 22:01:18 2007 from x-fishelov.tau.ac.il
>> [root@localhost ~]#
>> [root@localhost ~]#
>> [root@localhost ~]# iptables -L
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>> ************************************************************************************************
>>  MDT 132.66.176.211
>>
>> Last login: Mon Dec 24 11:51:57 2007 from dynamic136-91.tau.ac.il
>> [root@x-math20 ~]# iptables -L
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>> *************************************************************************
>>
>> NODE 2 132.66.176.215
>> Last login: Mon Dec 24 11:01:22 2007 from erezlab.tau.ac.il
>> [root@x-mathr11 ~]# iptables -L
>>
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain RH-Firewall-1-INPUT (2 references)
>> target     prot opt source               destination
>> ACCEPT     all  --  anywhere             anywhere
>> ACCEPT     icmp --  anywhere             anywhere            icmp any
>> ACCEPT     ipv6-crypt--  anywhere             anywhere
>> ACCEPT     ipv6-auth--  anywhere             anywhere
>> ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
>> ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
>> ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
>> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
>> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
>> ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
>> REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
>> [root@x-mathr11 ~]#
>>
>> ************************************************************
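>> (If the RH-Firewall-1-INPUT chain above is what is blocking LNET, a
>> minimal sketch of a fix, assuming the default LNET acceptor port 988,
>> would be to insert an ACCEPT rule ahead of the final REJECT:
>>
>>     iptables -I RH-Firewall-1-INPUT -p tcp --dport 988 -j ACCEPT
>>     service iptables save
>>
>> Without it, new inbound connections to the acceptor are rejected with
>> icmp-host-prohibited, which is consistent with the Input/output error
>> that lctl ping reports.)
>>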
>> One more thing: do you use the TCP protocol, or do you use UDP?
>>
>> Regards, Avi
>> P.S. I think this is the beginning of a beautiful friendship. :-)
>>
>>
>>
>> On Dec 24, 2007 5:29 PM, Aaron Knister <aaron@iges.org> wrote:
>> That sounds like quite a task! Could you show me your firewall rules
>> on the systems mentioned below ("iptables -L" on each)? That would
>> help to diagnose the problem further.
>>
>> -Aaron
>>
>> On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
>>
>> > Hi Aaron, and thank you for your fast answers.
>> > We are working (Avi, Meny and me) on the Israeli GRID, and we need
>> > to create a single huge file system for this GRID.
>> >     cheers
>> >          Yan
>> >
>> > ________________________________
>> >
>> > From: Aaron Knister [mailto:aaron@iges.org]
>> > Sent: Sun 12/23/2007 8:27 PM
>> > To: Avi Gershon
>> > Cc: lustre-discuss@clusterfs.com; Yan Benhammou; Meny Ben moshe
>> > Subject: Re: [Lustre-discuss] help needed.
>> >
>> >
>> > Can you check the firewall on each of those machines (iptables -L)
>> > and paste that here? Also, is this network dedicated to Lustre?
>> > Lustre can easily saturate a network interface under load, to the
>> > point where it becomes difficult to log in to a node if it only has
>> > one interface. I'd recommend using a different interface if you can.
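>> >
>> > For example, a minimal sketch of pinning LNET to a dedicated NIC in
>> > /etc/modprobe.conf (here "eth1" is a hypothetical second interface;
>> > use whichever one you set aside for Lustre):
>> >
>> >     options lnet networks="tcp0(eth1)"
>> >
>> > The lnet module has to be reloaded after the change for the new NID
>> > to take effect.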
>> >
>> > On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
>> >
>> >
>> >       node 1 132.66.176.212
>> >       node 2 132.66.176.215
>> >
>> >       [root@x-math20 ~]# ssh 132.66.176.215
>> >       root@132.66.176.215's password:
>> >       ssh(21957) Permission denied, please try again.
>> >       root@132.66.176.215's password:
>> >       Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
>> >       [root@x-mathr11 ~]# lctl ping 132.66.176.211@tcp0
>> >       failed to ping 132.66.176.211@tcp: Input/output error
>> >       [root@x-mathr11 ~]# lctl list_nids
>> >       132.66.176.215@tcp
>> >       [root@x-mathr11 ~]# ssh 132.66.176.212
>> >       The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
>> >       RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
>> >       Are you sure you want to continue connecting (yes/no)? yes
>> >       ssh(11526) Warning: Permanently added '132.66.176.212' (RSA1) to the list of known hosts.
>> >       root@132.66.176.212's password:
>> >       Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
>> >       [root@localhost ~]# lctl ping 132.66.176.211@tcp0
>> >       failed to ping 132.66.176.211@tcp: Input/output error
>> >       [root@localhost ~]# lctl list_nids
>> >       132.66.176.212@tcp
>> >       [root@localhost ~]#
>> >
>> >
>> >       thanks for helping!!
>> >       Avi
>> >
>> >
>> >       On Dec 23, 2007 5:32 PM, Aaron Knister <aaron@iges.org> wrote:
>> >
>> >
>> >               On the OSS, can you ping the MDS/MGS using this command:
>> >
>> >               lctl ping 132.66.176.211@tcp0
>> >
>> >               If it doesn't ping, list the NIDs on each node by running
>> >
>> >               lctl list_nids
>> >
>> >               and tell me what comes back.
>> >
>> >               -Aaron
>> >
>> >
>> >               On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
>> >
>> >
>> >                       Hi, I could use some help.
>> >                       I installed Lustre on 3 computers,
>> >                       mdt/mgs:
>> >
>> >
>> > ************************************************************************
>> >                       [root@x-math20 ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb
>> >
>> >                          Permanent disk data:
>> >                       Target:     spfs-MDTffff
>> >                       Index:      unassigned
>> >                       Lustre FS:  spfs
>> >                       Mount type: ldiskfs
>> >                       Flags:      0x75
>> >                                     (MDT MGS needs_index first_time update )
>> >                       Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>> >                       Parameters:
>> >
>> >                       device size = 19092MB
>> >                       formatting backing filesystem ldiskfs on /dev/hdb
>> >                               target name  spfs-MDTffff
>> >                               4k blocks     0
>> >                               options        -J size=400 -i 4096 -I 512 -q -O dir_index -F
>> >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
>> >                       Writing CONFIGS/mountdata
>> >                       [root@x-math20 ~]# df
>> >                       Filesystem           1K-blocks      Used Available Use% Mounted on
>> >                       /dev/hda1             19228276   4855244  13396284  27% /
>> >                       none                    127432         0    127432   0% /dev/shm
>> >                       /dev/hdb              17105436    455152  15672728   3% /mnt/test/mdt
>> >                       [root@x-math20 ~]# cat /proc/fs/lustre/devices
>> >                         0 UP mgs MGS MGS 5
>> >                         1 UP mgc MGC132.66.176.211@tcp 5f5ba729-6412-3843-2229-1310a0b48f71 5
>> >                         2 UP mdt MDS MDS_uuid 3
>> >                         3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
>> >                         4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
>> >                       [root@x-math20 ~]#
>> > ************************************************************* end mdt *************************
>> >                       so you can see that the MGS is up,
>> >                       and on the OSTs I get an error!! Please help...
>> >
>> >                       ost:
>> >
>> > **********************************************************************
>> >                       [root@x-mathr11 ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.176.211@tcp0 /dev/hdb1
>> >
>> >                          Permanent disk data:
>> >                       Target:     spfs-OSTffff
>> >                       Index:      unassigned
>> >                       Lustre FS:  spfs
>> >                       Mount type: ldiskfs
>> >                       Flags:      0x72
>> >                                     (OST needs_index first_time update )
>> >                       Persistent mount opts: errors=remount-ro,extents,mballoc
>> >                       Parameters: mgsnode=132.66.176.211@tcp
>> >
>> >                       device size = 19594MB
>> >                       formatting backing filesystem ldiskfs on /dev/hdb1
>> >                               target name  spfs-OSTffff
>> >                               4k blocks     0
>> >                               options        -J size=400 -i 16384 -I 256 -q -O dir_index -F
>> >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
>> >                       Writing CONFIGS/mountdata
>> >                       [root@x-mathr11 ~]# /CONFIGS/mountdata
>> >                       -bash: /CONFIGS/mountdata: No such file or directory
>> >                       [root@x-mathr11 ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
>> >                       mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
>> >                       Is the MGS running?
>> > *********************************************** end ost ********************************
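>> >
>> > (The "Is the MGS running?" error generally means the OSS cannot
>> > reach the MGS over LNET, so a sketch of the order to retry things
>> > in, once connectivity is fixed:
>> >
>> >     lctl ping 132.66.176.211@tcp0   # must succeed first
>> >     mount -t lustre /dev/hdb1 /mnt/test/ost1
>> >
>> > The mount can only work once the lctl ping does.)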
>> >
>> >                       Can anyone point out the problem?
>> >                       Thanks, Avi.
>> >
>> >
>> >
>>
>
>
>
>
>
>

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron@iges.org



