[Lustre-discuss] help needed.
Aaron Knister
aaron at iges.org
Wed Jan 2 12:20:35 PST 2008
Can you run dmesg and send me any Lustre-related errors? Also, what's
the output of "getenforce"?
-Aaron
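For reference, the checks suggested above can be run as follows. This is only a sketch: the "setenforce 0" step assumes SELinux is what's interfering, and should be reverted with "setenforce 1" after testing.

```shell
# Grab any Lustre/LNET-related messages from the kernel ring buffer
dmesg | grep -iE 'lustre|lnet'

# Report the current SELinux mode; "Enforcing" can silently block
# Lustre's TCP connections
getenforce

# If it prints "Enforcing", switch to permissive mode as a test
# (assumption: SELinux is the culprit; revert with "setenforce 1")
setenforce 0
```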
On Jan 2, 2008, at 1:47 PM, Avi Gershon wrote:
> no, that doesn't work either :-( ..
> thanks for answering so fast
> Avi
>
> On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
> That all looks ok. From x-math20 could you run "lctl ping
> 132.66.176.212 at tcp0"?
>
> On Jan 2, 2008, at 8:36 AM, Avi Gershon wrote:
>
>> Hi, I get this:
>> ***************************************************************************
>> [root at x-math20 ~]# lctl list_nids
>> 132.66.176.211 at tcp
>> [root at x-math20 ~]# ifconfig -a
>> eth0 Link encap:Ethernet HWaddr 00:02:B3:2D:A6:BF
>> inet addr:132.66.176.211 Bcast:132.66.255.255 Mask:255.255.0.0
>> inet6 addr: fe80::202:b3ff:fe2d:a6bf/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:9448397 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:194259 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:1171910501 (1.0 GiB) TX bytes:40500450 (38.6 MiB)
>>
>> lo Link encap:Local Loopback
>> inet addr:127.0.0.1 Mask:255.0.0.0
>> inet6 addr: ::1/128 Scope:Host
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:8180 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:8180 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:3335243 (3.1 MiB) TX bytes:3335243 (3.1 MiB)
>>
>> sit0 Link encap:IPv6-in-IPv4
>> NOARP MTU:1480 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>
>> [root at x-math20 ~]# cat /etc/modprobe.conf
>> alias eth0 e100
>> alias usb-controller uhci-hcd
>> alias scsi_hostadapter ata_piix
>> alias lustre llite
>> options lnet networks=tcp0
>> [root at x-math20 ~]#
>>
>> ***********************************************************************************************************
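As an aside: on a multi-homed node, LNET can be pinned to a specific interface in /etc/modprobe.conf. A sketch, assuming eth0 (taken from the ifconfig output above) is the interface Lustre should use:

```shell
# /etc/modprobe.conf fragment: restrict LNET's tcp0 network to eth0,
# so lnet does not bind the wrong interface on multi-homed nodes
options lnet networks=tcp0(eth0)
```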
>>
>> On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
>> On the host x-math20, could you run an "lctl list_nids" and also an
>> "ifconfig -a"? I want to see if LNET is listening on the correct
>> interface. Oh, could you also post the contents of your
>> /etc/modprobe.conf?
>>
>> Thanks!
>>
>> -Aaron
>>
>> On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:
>>
>>> Hello everyone, and happy new year.
>>> I think I have reduced my problem to this: "lctl ping
>>> 132.66.176.211 at tcp0" doesn't work for me, for some strange reason,
>>> as you can see:
>>> ***********************************************************************************
>>> [root at x-math20 ~]# lctl ping 132.66.176.211 at tcp0
>>> failed to ping 132.66.176.211 at tcp: Input/output error
>>> [root at x-math20 ~]# ping 132.66.176.211
>>> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
>>> 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
>>> 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
>>> 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
>>> --- 132.66.176.211 ping statistics ---
>>> 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
>>> rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
>>> [root at x-math20 ~]#
>>> *****************************************************************************************
>>>
>>>
>>> On 12/24/07, Avi Gershon <gershonavi at gmail.com> wrote:
>>> Hi,
>>> here are the "iptables -L" results:
>>>
>>> NODE 1 132.66.176.212
>>> Scientific Linux CERN SLC release 4.6 (Beryllium)
>>> root at 132.66.176.212's password:
>>> Last login: Sun Dec 23 22:01:18 2007 from x-fishelov.tau.ac.il
>>> [root at localhost ~]#
>>> [root at localhost ~]#
>>> [root at localhost ~]# iptables -L
>>> Chain INPUT (policy ACCEPT)
>>> target prot opt source destination
>>> Chain FORWARD (policy ACCEPT)
>>> target prot opt source destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target prot opt source destination
>>> ************************************************************************************************
>>> MDT 132.66.176.211
>>>
>>> Last login: Mon Dec 24 11:51:57 2007 from dynamic136-91.tau.ac.il
>>> [root at x-math20 ~]# iptables -L
>>> Chain INPUT (policy ACCEPT)
>>> target prot opt source destination
>>> Chain FORWARD (policy ACCEPT)
>>> target prot opt source destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target prot opt source destination
>>> *************************************************************************
>>>
>>> NODE 2 132.66.176.215
>>> Last login: Mon Dec 24 11:01:22 2007 from erezlab.tau.ac.il
>>> [root at x-mathr11 ~]# iptables -L
>>>
>>> Chain INPUT (policy ACCEPT)
>>> target prot opt source destination
>>> RH-Firewall-1-INPUT all -- anywhere anywhere
>>> Chain FORWARD (policy ACCEPT)
>>> target prot opt source destination
>>> RH-Firewall-1-INPUT all -- anywhere anywhere
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target prot opt source destination
>>>
>>> Chain RH-Firewall-1-INPUT (2 references)
>>> target     prot opt source               destination
>>> ACCEPT     all  --  anywhere             anywhere
>>> ACCEPT     icmp --  anywhere             anywhere            icmp any
>>> ACCEPT     ipv6-crypt--  anywhere        anywhere
>>> ACCEPT     ipv6-auth--   anywhere        anywhere
>>> ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
>>> ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
>>> ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
>>> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
>>> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
>>> ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
>>> REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
>>> [root at x-mathr11 ~]#
>>>
>>> ************************************************************
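For what it's worth, the listings above point at a likely cause: on x-mathr11 the RH-Firewall-1-INPUT chain ends in a REJECT, and Lustre's TCP LND accepts connections on port 988 by default, which would explain why ICMP ping succeeds while "lctl ping" fails. A hedged sketch of a fix (the rule position 1 is an assumption; the ACCEPT just needs to precede the final REJECT):

```shell
# Lustre's socklnd uses TCP port 988 by default; allow it ahead of the
# chain's final REJECT rule (inserted at position 1 for simplicity)
iptables -I RH-Firewall-1-INPUT 1 -p tcp --dport 988 -j ACCEPT

# Persist the rule across reboots (Red Hat-style init scripts)
service iptables save
```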
>>> One more thing...
>>> do you use the TCP protocol, or UDP?
>>>
>>> Regards, Avi
>>> P.S. I think this is the beginning of a beautiful friendship... :-)
>>>
>>>
>>>
>>> On Dec 24, 2007 5:29 PM, Aaron Knister <aaron at iges.org> wrote:
>>> That sounds like quite a task! Could you show me the contents of
>>> your
>>> firewall rules on the systems mentioned below? (iptables -L) on
>>> each.
>>> That would help to diagnose the problem further.
>>>
>>> -Aaron
>>>
>>> On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
>>>
>>> > Hi Aaron, and thank you for your fast answers.
>>> > We (Avi, Meny and I) are working on the Israeli GRID, and we
>>> > need to create a single huge file system for it.
>>> > Cheers,
>>> > Yan
>>> >
>>> > ________________________________
>>> >
>>> > From: Aaron Knister [mailto: aaron at iges.org]
>>> > Sent: Sun 12/23/2007 8:27 PM
>>> > To: Avi Gershon
>>> > Cc: lustre-discuss at clusterfs.com ; Yan Benhammou; Meny Ben moshe
>>> > Subject: Re: [Lustre-discuss] help needed.
>>> >
>>> >
>>> > Can you check the firewall on each of those machines (iptables -L)
>>> > and paste that here? Also, is this network dedicated to Lustre?
>>> > Lustre can easily saturate a network interface under load, to the
>>> > point that it becomes difficult to log in to a node if it only has
>>> > one interface. I'd recommend using a different interface if you can.
>>> >
>>> > On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
>>> >
>>> >
>>> > node 1 132.66.176.212
>>> > node 2 132.66.176.215
>>> >
>>> > [root at x-math20 ~]# ssh 132.66.176.215
>>> > root at 132.66.176.215's password:
>>> > ssh(21957) Permission denied, please try again.
>>> > root at 132.66.176.215's password:
>>> > Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
>>> > [root at x-mathr11 ~]# lctl ping 132.66.176.211 at tcp0
>>> > failed to ping 132.66.176.211 at tcp: Input/output error
>>> > [root at x-mathr11 ~]# lctl list_nids
>>> > 132.66.176.215 at tcp
>>> > [root at x-mathr11 ~]# ssh 132.66.176.212
>>> > The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
>>> > RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
>>> > Are you sure you want to continue connecting (yes/no)? yes
>>> > ssh(11526) Warning: Permanently added '132.66.176.212' (RSA1) to the list of known hosts.
>>> > root at 132.66.176.212's password:
>>> > Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
>>> > [root at localhost ~]# lctl ping 132.66.176.211 at tcp0
>>> > failed to ping 132.66.176.211 at tcp: Input/output error
>>> > [root at localhost ~]# lctl list_nids
>>> > 132.66.176.212 at tcp
>>> > [root at localhost ~]#
>>> >
>>> >
>>> > thanks for helping!!
>>> > Avi
>>> >
>>> >
>>> > On Dec 23, 2007 5:32 PM, Aaron Knister <aaron at iges.org> wrote:
>>> >
>>> >
>>> > On the OSS, can you ping the MDS/MGS using this command --
>>> >
>>> > lctl ping 132.66.176.211 at tcp0
>>> >
>>> > If it doesn't ping, list the nids on each node by
>>> running
>>> >
>>> > lctl list_nids
>>> >
>>> > and tell me what comes back.
>>> >
>>> > -Aaron
>>> >
>>> >
>>> > On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
>>> >
>>> >
>>> > Hi, I could use some help.
>>> > I installed Lustre on 3 computers.
>>> > MDT/MGS:
>>> >
>>> >
>>> >
>>> > *************************************************************************************
>>> > [root at x-math20 ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb
>>> >
>>> >    Permanent disk data:
>>> > Target:     spfs-MDTffff
>>> > Index:      unassigned
>>> > Lustre FS:  spfs
>>> > Mount type: ldiskfs
>>> > Flags:      0x75
>>> >             (MDT MGS needs_index first_time update )
>>> > Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>> > Parameters:
>>> >
>>> > device size = 19092MB
>>> > formatting backing filesystem ldiskfs on /dev/hdb
>>> >         target name  spfs-MDTffff
>>> >         4k blocks    0
>>> >         options      -J size=400 -i 4096 -I 512 -q -O dir_index -F
>>> > mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
>>> > Writing CONFIGS/mountdata
>>> > [root at x-math20 ~]# df
>>> > Filesystem           1K-blocks      Used Available Use% Mounted on
>>> > /dev/hda1             19228276   4855244  13396284  27% /
>>> > none                    127432         0    127432   0% /dev/shm
>>> > /dev/hdb              17105436    455152  15672728   3% /mnt/test/mdt
>>> > [root at x-math20 ~]# cat /proc/fs/lustre/devices
>>> >   0 UP mgs MGS MGS 5
>>> >   1 UP mgc MGC132.66.176.211 at tcp 5f5ba729-6412-3843-2229-1310a0b48f71 5
>>> >   2 UP mdt MDS MDS_uuid 3
>>> >   3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
>>> >   4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
>>> > [root at x-math20 ~]#
>>> > ************************************************************* end mdt *****************
>>> > So you can see that the MGS is up,
>>> > and on the OSTs I get an error!! Please help...
>>> >
>>> > ost:
>>> >
>>> > **********************************************************************
>>> > [root at x-mathr11 ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.176.211 at tcp0 /dev/hdb1
>>> >
>>> >    Permanent disk data:
>>> > Target:     spfs-OSTffff
>>> > Index:      unassigned
>>> > Lustre FS:  spfs
>>> > Mount type: ldiskfs
>>> > Flags:      0x72
>>> >             (OST needs_index first_time update )
>>> > Persistent mount opts: errors=remount-ro,extents,mballoc
>>> > Parameters: mgsnode=132.66.176.211 at tcp
>>> >
>>> > device size = 19594MB
>>> > formatting backing filesystem ldiskfs on /dev/hdb1
>>> >         target name  spfs-OSTffff
>>> >         4k blocks    0
>>> >         options      -J size=400 -i 16384 -I 256 -q -O dir_index -F
>>> > mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
>>> > Writing CONFIGS/mountdata
>>> > [root at x-mathr11 ~]# /CONFIGS/mountdata
>>> > -bash: /CONFIGS/mountdata: No such file or directory
>>> > [root at x-mathr11 ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
>>> > mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
>>> > Is the MGS running?
>>> > *********************************************** end ost ********************************
>>> >
>>> > Can anyone point out the problem?
>>> > Thanks, Avi.
>>> >
>>> >
>>> >
>>> >
>>> _______________________________________________
>>> > Lustre-discuss mailing list
>>> > Lustre-discuss at clusterfs.com
>>> > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > Aaron Knister
>>> > Associate Systems Administrator/Web Designer
>>> > Center for Research on Environment and Water
>>> >
>>> > (301) 595-7001
>>> > aaron at iges.org
>>> >
>>> >
>>> >
>>>
>>
>> Aaron Knister
>> Associate Systems Analyst
>> Center for Ocean-Land-Atmosphere Studies
>>
>> (301) 595-7000
>> aaron at iges.org
>>
>>
>>
>>
>>
>
Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
aaron at iges.org