[Lustre-discuss] Problems adding new OSS to existing Lustre filesystem -- Refusing connection, No matching NI

Michael D. Seymour seymour at cita.utoronto.ca
Fri Apr 24 08:53:35 PDT 2009


Hi,

We are having a problem adding a new OSS (roc06, 10.5.203.6) to an existing 
Lustre file system (raid-cita) on the 10.5 network. SELinux and iptables are 
disabled. roc06 is multi-homed, on the 10.4 and 10.5 networks.

When the file system is mounted, clients try to connect over the 10.4 network, 
even though everything is configured to use the 10.5 network. The clients also 
do not see the new space: df shows 23T rather than the more than 27T it should 
show.

lfs quota hangs as well.

We recently had some problems with the MDS filesystem: it was fsck'd, the MDS 
kernel was downgraded to the Lustre 1.6.6 build, and the filesystem was remounted.

Many messages like this exist in /var/log/messages on the new OSS:

Apr 24 10:01:07 roc06 kernel: LustreError: 120-3: Refusing connection from 10.4.1.52 for 10.4.203.6@tcp: No matching NI
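
To make the mismatch in that message concrete, here is a minimal Python sketch 
(mine, not part of any Lustre tool; the function names are made up for 
illustration). It pulls the source address and the requested NID out of such a 
line, with the NID written in the usual address@network form, and checks the 
requested NID against what lctl list_nids reports on roc06. The regular 
expression assumes the message format of the single line quoted above.

```python
import re

# Assumed format, based only on the one message quoted in this post:
# "... Refusing connection from <source> for <target NID>: No matching NI"
REFUSED = re.compile(
    r"Refusing connection from\s+(?P<source>\S+)\s+for\s+"
    r"(?P<target>\S+?):\s+No matching NI"
)


def parse_refusal(line):
    """Return (source, target_nid) from a 'No matching NI' message, or None."""
    m = REFUSED.search(line)
    if m is None:
        return None
    return m.group("source"), m.group("target")


def target_is_local(target_nid, local_nids):
    """True if the NID the peer asked for is one this node actually has."""
    return target_nid in set(local_nids)


line = ("Apr 24 10:01:07 roc06 kernel: LustreError: 120-3: Refusing connection "
        "from 10.4.1.52 for 10.4.203.6@tcp: No matching NI")
local_nids = ["10.5.203.6@tcp"]  # what `lctl list_nids` prints on roc06

source, target = parse_refusal(line)
print(source, target, target_is_local(target, local_nids))
```

For the line above this reports that the client asked for 10.4.203.6@tcp, which 
is not among roc06's NIDs.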

On the multi-homed client 10.4.1.52:

[root@tpb52-chroot ~]# uname -a; cat /etc/redhat-release
Linux tpb52 2.6.18-92.1.17.el5_lustre.1.6.7smp #1 SMP Mon Feb 9 19:56:55 MST 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5 (Final)

[root@tpb52-chroot ~]# df -h /mnt/raid-cita/
Filesystem            Size  Used Avail Use% Mounted on
10.5.203.250@tcp:/roc
                       23T   11T   12T  47% /mnt/raid-cita

[root@tpb52-chroot ~]# lctl list_nids
10.5.2.12@tcp

[root@tpb52-chroot ~]# grep lnet /etc/modprobe.conf
options lnet networks=tcp0(eth1)

[root@tpb52-chroot ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:15:C5:EC:FA:8C
           inet addr:10.5.2.12  Bcast:10.5.255.255  Mask:255.255.0.0

On the OSS roc06:

[root@roc06 lustre]# uname -a; cat /etc/redhat-release
Linux roc06 2.6.18-92.1.17.el5_lustre.1.6.7.1smp #1 SMP Mon Apr 13 16:13:00 MDT 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.3 (Final)

[root@roc06 lustre]# lctl list_nids
10.5.203.6@tcp

[root@roc06 ~]# grep lnet /etc/modprobe.conf
options lnet networks=tcp0(eth1)

[root@roc06 ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:22:19:05:90:F2
           inet addr:10.5.203.6  Bcast:10.5.255.255  Mask:255.255.0.0

The OST on roc06 was formatted with the following:

mkfs.lustre --verbose --reformat --fsname=roc --ost --mgsnode=10.5.203.250@tcp0 \
    --mkfsoptions="-m 0 -E  stride=32" /dev/md2

I believe this was done before "options lnet networks=tcp0(eth1)" was added to 
modprobe.conf.

[root@roc06 ~]# tunefs.lustre --print /dev/md2

    Permanent disk data:
Target:     roc-OST0005
Index:      5
Lustre FS:  roc
Mount type: ldiskfs
Flags:      0x402
               (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.5.203.250@tcp ost.quota_type=u


For comparison, the OSS roc05:

[root@roc05 ~]# uname -a; cat /etc/redhat-release
Linux roc05 2.6.18-92.1.17.el5_lustre.1.6.7smp #1 SMP Mon Feb 9 19:56:55 MST 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5 (Final)

[root@roc05 ~]# lctl list_nids
10.5.203.5@tcp

[root@roc05 ~]# grep lnet /etc/modprobe.conf
options lnet networks=tcp0(eth1)

[root@roc05 ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:1C:23:D5:F5:4F
           inet addr:10.5.203.5  Bcast:10.5.255.255  Mask:255.255.0.0

[root@roc05 ~]# tunefs.lustre --print /dev/md2

    Permanent disk data:
Target:     roc-OST0004
Index:      4
Lustre FS:  roc
Mount type: ldiskfs
Flags:      0x402
               (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.5.203.250@tcp ost.quota_type=u
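
As far as I can tell, the on-disk parameters of the new OST (roc06) and a 
working one (roc05) are the same. A rough Python sketch of that comparison, 
assuming the "Parameters:" line is a space-separated list of key=value pairs as 
in the two outputs above (parse_parameters is a made-up helper, not a Lustre 
utility):

```python
def parse_parameters(line):
    """Turn 'Parameters: k=v k=v' into a dict (format assumed from the output above)."""
    _, _, rest = line.partition("Parameters:")
    return dict(item.split("=", 1) for item in rest.split())


# The "Parameters:" lines printed by tunefs.lustre --print on each OSS.
roc06 = parse_parameters("Parameters: mgsnode=10.5.203.250@tcp ost.quota_type=u")
roc05 = parse_parameters("Parameters: mgsnode=10.5.203.250@tcp ost.quota_type=u")

# Collect every key whose value differs between the two targets.
differences = {
    key: (roc06.get(key), roc05.get(key))
    for key in sorted(set(roc06) | set(roc05))
    if roc06.get(key) != roc05.get(key)
}
print(differences)  # {} (no differences)
```

So whatever differs between the two servers does not show up in these 
parameters.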


On the MDS (rocpile):

[root@rocpile ~]# uname -a; cat /etc/redhat-release
Linux rocpile 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 26 12:16:17 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5 (Final)

[root@rocpile ~]# lctl list_nids
10.5.203.250@tcp

[root@rocpile ~]# grep lnet /etc/modprobe.conf
options lnet networks=tcp(eth1)

[root@rocpile ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:15:C5:EC:F6:88
           inet addr:10.5.203.250  Bcast:10.5.255.255  Mask:255.255.0.0
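
To put the settings from all four nodes side by side, here is a small Python 
sketch (again mine, with made-up helper names). It checks that every NID 
reported by lctl list_nids is on 10.5.0.0/16 and that the lnet networks option 
resolves to the same network and interface everywhere. It assumes that a bare 
"tcp" in the networks option means the same as "tcp0", which is my 
understanding of LNet; the MDS is the only node written that way.

```python
import ipaddress
import re

# eth1 on every node shows Mask:255.255.0.0 on a 10.5.x.x address.
EXPECTED_NET = ipaddress.ip_network("10.5.0.0/16")

# node: (NID from `lctl list_nids`, value of "options lnet networks=")
NODES = {
    "tpb52": ("10.5.2.12@tcp", "tcp0(eth1)"),
    "roc06": ("10.5.203.6@tcp", "tcp0(eth1)"),
    "roc05": ("10.5.203.5@tcp", "tcp0(eth1)"),
    "rocpile": ("10.5.203.250@tcp", "tcp(eth1)"),
}


def parse_networks(value):
    """Split e.g. 'tcp0(eth1)' into ('tcp0', 'eth1').

    Assumption: a network name without a number ('tcp') is network 0.
    """
    m = re.fullmatch(r"([a-z]+)(\d*)\((\w+)\)", value)
    if m is None:
        raise ValueError(f"unrecognised networks value: {value!r}")
    lnd, num, iface = m.groups()
    return f"{lnd}{num or 0}", iface


def nid_on_expected_net(nid):
    """True if the address part of an address@network NID is in EXPECTED_NET."""
    addr = nid.split("@", 1)[0]
    return ipaddress.ip_address(addr) in EXPECTED_NET


for name, (nid, networks) in NODES.items():
    print(name, nid, nid_on_expected_net(nid), parse_networks(networks))
```

By this check all four nodes look consistent, whereas the 10.4.203.6@tcp NID 
named in the log message on roc06 is outside 10.5.0.0/16.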


Any suggestions?

Thanks,
Mike

-- 
Michael D. Seymour                 Phone: 416-978-1776
Scientific Computing Support       Fax: 416-978-3921
Canadian Institute for Theoretical Astrophysics, University of Toronto


