[Lustre-discuss] Help

Nihir Parikh nihirp at supermicro.com
Tue Nov 16 17:17:50 PST 2010


Hello There,
I am trying to get my feet wet with the Lustre file system and have run into a problem that I need help with. I have 3 physical servers; below is what I have installed and configured on each of them. Let's name them S1, S2, S3 for now.

S1
I have a dual-port IB card; here is the network config for each port:
ib0 - 192.168.100.100
ib1 - 172.16.100.100

Kernel-2.6.18-194.3.1.el5_lustre
Lustre-modules
Lustre-ldiskfs
Lustre-1.8.4-
E2fsprogs

Here is the /etc/modprobe.conf file:
options lnet forwarding="enabled"
options lnet accept=all
options lnet networks="o2ib0(ib0),o2ib1(ib1)"

On this server, /dev/sda3 holds the combined MGS/MDT and /dev/sda4 holds the OST.

S2
I have a single-port IB card, connected directly to ib0 of the S1 server. Here is the network config for that port:

ib0 - 192.168.100.101

Here is the /etc/modprobe.conf file

options lnet networks="o2ib0(ib0)"
options lnet routes="o2ib1 192.168.100.100@o2ib0"

When I run cat /proc/sys/lnet/routers, I get the following output:
ref  rtr_ref  alive_cnt  state  last_ping  ping_sent  deadline  down_ni  router
3    1        4          up     4303108    1          NA        -2       192.168.100.100@o2ib

When I run lctl ping 192.168.100.100@o2ib0, I get the following output:
12345-0@lo
12345-192.168.100.100@o2ib
12345-172.16.100.100@o2ib1

S3
I have a single-port IB card, connected directly to ib1 of the S1 server. Here is the network config for that port:

ib0 - 172.16.100.101

Here is the /etc/modprobe.conf file

options lnet networks="o2ib1(ib0)"
options lnet routes="o2ib0 172.16.100.100@o2ib1"

When I run cat /proc/sys/lnet/routers, I get the following output:
ref  rtr_ref  alive_cnt  state  last_ping  ping_sent  deadline  down_ni  router
3    1        2          up     4297593    1          NA        -2       172.16.100.100@o2ib1

When I run lctl ping 172.16.100.100@o2ib1, I get the following output:
12345-0@lo
12345-192.168.100.100@o2ib
12345-172.16.100.100@o2ib1
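
For scripting checks against this kind of output, here is a minimal Python sketch that splits lctl ping lines into PID, address, and network. It assumes NIDs are written in the usual pid-addr@net form, and it treats a network name without a number as network 0 (so o2ib and o2ib0 compare equal). The names Nid, canonical_net, and parse_ping_output are hypothetical helpers made up for this example; they are not part of Lustre.

```python
import re
from typing import List, NamedTuple


class Nid(NamedTuple):
    pid: int    # LNet process ID, e.g. 12345
    addr: str   # address part, e.g. "192.168.100.100"
    net: str    # canonical network name, e.g. "o2ib" or "o2ib1"


# Network name = type (ends in a letter) + optional network number.
_NET = re.compile(r"([a-z][a-z0-9]*[a-z])(\d*)")
# One line of ping output: <pid>-<addr>@<net>
_LINE = re.compile(r"(\d+)-([^@\s]+)@(\S+)")


def canonical_net(net: str) -> str:
    """Drop a zero network number so 'o2ib0' and 'o2ib' compare equal."""
    m = _NET.fullmatch(net)
    if m is None:
        raise ValueError(f"unrecognized network name: {net!r}")
    kind, num = m.groups()
    return kind if int(num or 0) == 0 else f"{kind}{int(num)}"


def parse_ping_output(text: str) -> List[Nid]:
    """Return the NIDs found in the text, skipping lines that do not match."""
    nids = []
    for line in text.splitlines():
        m = _LINE.fullmatch(line.strip())
        if m:
            pid, addr, net = m.groups()
            nids.append(Nid(int(pid), addr, canonical_net(net)))
    return nids


SAMPLE = """12345-0@lo
12345-192.168.100.100@o2ib
12345-172.16.100.100@o2ib1"""

NIDS = parse_ping_output(SAMPLE)
for nid in NIDS:
    print(nid.pid, nid.addr, nid.net)
```

Both S2 and S3 see the same three NIDs on S1, which suggests that each node can reach the router on its own network.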


Now I want to run some network tests from S2 --> S3 and S3 --> S2 to measure the bandwidth, but both S2 and S3 complain that the network is unreachable. What am I doing wrong?
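
As a sanity check on the modprobe.conf settings for S2 and S3, here is a simplified Python model of next-hop selection by network name. This is only a sketch of the general idea (send directly if the destination network is local, otherwise use the gateway listed in routes for that network). It is not LNet's actual code, and it says nothing about plain IP traffic between the two subnets. Node and next_hop are hypothetical names, and network names are compared exactly as written, so o2ib0 has to be spelled the same way everywhere.

```python
from typing import Dict, List, Optional


class Node:
    """A node's LNet view: its local networks and its routes to remote ones."""

    def __init__(self, networks: List[str], routes: Dict[str, str]) -> None:
        self.networks = set(networks)   # e.g. {"o2ib0"}
        self.routes = dict(routes)      # remote network -> gateway NID


def next_hop(node: Node, dest_nid: str) -> Optional[str]:
    """Return the NID to send to first, or None if no route is configured."""
    _addr, net = dest_nid.split("@", 1)
    if net in node.networks:
        return dest_nid                 # destination is on a local network
    return node.routes.get(net)         # gateway for that network, if any


# Settings copied from the modprobe.conf files on S2 and S3.
S2 = Node(["o2ib0"], {"o2ib1": "192.168.100.100@o2ib0"})
S3 = Node(["o2ib1"], {"o2ib0": "172.16.100.100@o2ib1"})

print(next_hop(S2, "172.16.100.101@o2ib1"))   # S2 -> S3
print(next_hop(S3, "192.168.100.101@o2ib0"))  # S3 -> S2
```

Under this model, each node picks S1 as the gateway when the destination is given as a NID on the other network, so the routes entries look consistent with each other.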

Thanks
Nihir
