[Lustre-discuss] Client Mount Error Messages

CHU, STEPHEN H (ATTSI) sc1680 at att.com
Mon Dec 28 09:00:05 PST 2009


Hi all,

 

I have a question about a few error messages logged after a client mounts the file system. The FS mounts OK and is usable, but the LustreError messages do not look normal. The client has no IB connectivity to the MDSs/OSSs and reaches them over "tcpX" networks instead. The MDSs and OSSs are interconnected with IB. The configurations are as follows:

 

MDS1:

·        RHEL 5.3, Lustre 1.8.1.1, MGS and MDS on the same external storage drive; managed by heartbeat v1 for failover

·        10Ge tcp2(eth4) 10.103.30.201

·        10Ge tcp3(eth5) 10.103.30.101

·        Infiniband o2ib0(ib0) 10.103.34.201

·        Infiniband o2ib1(ib1) 10.103.34.101

·        modprobe.conf: options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp2(eth4),tcp3(eth5)

·        MGS/MDS Parameters: lov.stripesize=25M lov.stripecount=1 failover.node=10.103.34.202@o2ib,10.103.34.102@o2ib1 mdt.group_upcall=/usr/sbin/l_getgroups

 

MDS2:

·        RHEL 5.3, Lustre 1.8.1.1, pointing to the same external storage drive as MDS1; managed by heartbeat v1 for failover

·        10Ge tcp2(eth4) 10.103.30.202

·        10Ge tcp3(eth5) 10.103.30.102

·        Infiniband o2ib0(ib0) 10.103.34.202

·        Infiniband o2ib1(ib1) 10.103.34.102

·        options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp2(eth4),tcp3(eth5)

 

OSS1:

·        RHEL 5.3, Lustre 1.8.1.1, pointing to 16 OSTs on SAN storage; mounts the 8 odd-numbered OSTs, managed by heartbeat v1 for failover to OSS2

·        10Ge tcp4(eth4) 10.103.31.203

·        10Ge tcp5(eth5) 10.103.31.103

·        Infiniband o2ib0(ib0) 10.103.34.203

·        Infiniband o2ib1(ib1) 10.103.34.103

·        options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp4(eth4),tcp5(eth5)

·        OSTs definition: Parameters: mgsnode=10.103.34.201@o2ib,10.103.34.101@o2ib1 failover.node=10.103.34.204@o2ib,10.103.34.104@o2ib1

 

OSS2:

·        RHEL 5.3, Lustre 1.8.1.1, pointing to the same 16 OSTs on SAN storage as OSS1; mounts the 8 even-numbered OSTs, managed by heartbeat v1 for failover to OSS1

·        10Ge tcp4(eth4) 10.103.31.204

·        10Ge tcp5(eth5) 10.103.31.104

·        Infiniband o2ib0(ib0) 10.103.34.204

·        Infiniband o2ib1(ib1) 10.103.34.104

·        options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp4(eth4),tcp5(eth5)

·        OSTs definition: Parameters: mgsnode=10.103.34.201@o2ib,10.103.34.101@o2ib1 failover.node=10.103.34.204@o2ib,10.103.34.104@o2ib1
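
As a sanity check on the OST parameters listed for OSS1 and OSS2, the sketch below splits a parameter string into key/value lists and reports any failover.node entry whose address belongs to the node it is listed under. The helper names (parse_params, failover_to_self) and the address sets are illustrative only, built from the values quoted in this message; this is not a Lustre tool. A hit under OSS2 may simply be a copy-paste slip in the listing rather than in the real configuration.

```python
def parse_params(text):
    """Split 'key=v1,v2 key2=v3' into {'key': ['v1', 'v2'], 'key2': ['v3']}."""
    params = {}
    # The list archive may render '@' as ' at '; undo that before splitting.
    for token in text.replace(" at ", "@").split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value.split(",")
    return params


def failover_to_self(params, own_addresses):
    """Return failover NIDs whose address belongs to the node itself."""
    return [nid for nid in params.get("failover.node", [])
            if nid.split("@")[0] in own_addresses]


# Values copied from the OSS1/OSS2 listings in this message.
OST_PARAMS = ("mgsnode=10.103.34.201@o2ib,10.103.34.101@o2ib1 "
              "failover.node=10.103.34.204@o2ib,10.103.34.104@o2ib1")
OSS1_ADDRESSES = {"10.103.34.203", "10.103.34.103"}
OSS2_ADDRESSES = {"10.103.34.204", "10.103.34.104"}

params = parse_params(OST_PARAMS)
oss1_self_refs = failover_to_self(params, OSS1_ADDRESSES)
oss2_self_refs = failover_to_self(params, OSS2_ADDRESSES)
print("OSS1:", oss1_self_refs)
print("OSS2:", oss2_self_refs)
```

With the values as posted, the OSS1 list comes back empty while the OSS2 list contains both of its own IB addresses, which is worth double-checking against the output of tunefs.lustre on the actual OSTs.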

 

Client:

·        RHEL4.5, Lustre 1.6.6 or RHEL5.3, Lustre 1.8.1.1

·        Ge tcp4(eth2) 10.103.31.129 -> OSS Channel

·        Ge tcp2(eth3) 10.103.30.129 -> MDS Channel

·        options lnet networks=tcp2(eth3),tcp4(eth2)
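
To make the overlap between the client and server networks explicit, here is a small sketch that parses the "options lnet networks=" lines quoted above and lists the LNet networks two nodes have in common. The function name parse_lnet_networks is made up for illustration, and it handles only the simple net(interface) form used in this message, not the full LNet module option syntax.

```python
def parse_lnet_networks(option_line):
    """Map each LNet network name to its interface, e.g. {'tcp2': 'eth3'}."""
    # Keep only the value after 'networks='.
    _, _, value = option_line.partition("networks=")
    mapping = {}
    for entry in value.strip().split(","):
        entry = entry.strip()
        if not entry:
            continue
        net, _, iface = entry.partition("(")
        mapping[net] = iface.rstrip(")")
    return mapping


# Lines copied from the configurations in this message.
client = parse_lnet_networks("options lnet networks=tcp2(eth3),tcp4(eth2)")
mds = parse_lnet_networks(
    "options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp2(eth4),tcp3(eth5)")
oss = parse_lnet_networks(
    "options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp4(eth4),tcp5(eth5)")

shared_with_mds = sorted(set(client) & set(mds))
shared_with_oss = sorted(set(client) & set(oss))
print("client/MDS:", shared_with_mds)
print("client/OSS:", shared_with_oss)
```

With these inputs the client shares only tcp2 with the MDSs and only tcp4 with the OSSs, so it has no local o2ib network at all.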

 

The following messages are from RHEL4.5, Lustre 1.6.6:

 

Dec 15 22:09:32 bg8mo29sz kernel: LustreError: 29975:0:(events.c:465:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.202@o2ib

Dec 15 22:09:32 bg8mo29sz kernel: LustreError: 29975:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.202@o2ib!

Dec 15 22:09:32 bg8mo29sz kernel: Lustre: spfs-clilov-000001020463f400.lov: set parameter stripesize=25M

Dec 15 22:09:32 bg8mo29sz kernel: Lustre: Skipped 1 previous similar message

Dec 15 22:09:32 bg8mo29sz kernel: Lustre: Client spfs-client has started

 

The following messages are from RHEL5.3, Lustre 1.8.1.1:

 

Lustre: MGC10.103.30.201@tcp2: Reactivating import

LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.202@o2ib

LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.202@o2ib!

LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.204@o2ib

LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.204@o2ib!

Lustre: Client spfs-client has started
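
One way to relate these messages to the configuration is to pull the NIDs out of the "No NID found" lines and check whether each one sits on a network the client actually has. The sketch below does that; the names NID_RE, canonical_net and nids_off_local_nets are made up for illustration, and it assumes that a network name without a number means network 0 (o2ib being the same as o2ib0). It only shows that the reported NIDs are the o2ib failover.node addresses and that the client has no o2ib network; it does not establish whether the messages are harmless.

```python
import re

# Accepts either '@' or the list archive's ' at ' rendering of it.
NID_RE = re.compile(r"No NID found for ([\d.]+)(?:@| at )(\w+)")

CLIENT_NETS = {"tcp2", "tcp4"}  # from the client's lnet options


def canonical_net(net):
    """Assumption: a name without a trailing number is network 0."""
    return net if net[-1].isdigit() else net + "0"


def nids_off_local_nets(log_lines, local_nets):
    """Return NIDs from 'No NID found' lines that are on no local network."""
    local = {canonical_net(n) for n in local_nets}
    result = []
    for line in log_lines:
        match = NID_RE.search(line)
        if not match:
            continue
        addr, net = match.groups()
        nid = f"{addr}@{net}"
        if canonical_net(net) not in local and nid not in result:
            result.append(nid)
    return result


# Lines copied from the Lustre 1.8.1.1 client log in this message.
LOG = [
    "Lustre: MGC10.103.30.201@tcp2: Reactivating import",
    "LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) "
    "No NID found for 10.103.34.202@o2ib",
    "LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) "
    "No NID found for 10.103.34.204@o2ib",
    "Lustre: Client spfs-client has started",
]

missing = nids_off_local_nets(LOG, CLIENT_NETS)
print(missing)
```

Both NIDs it reports match the first failover.node entries given for the MDT and the OSTs, and neither failover list contains a tcp NID.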

 

I couldn't figure out whether there is a configuration error in the MDS and OSS failover setup or whether this is a harmless warning. I would expect this setup to work with the client outside the IB network. Does anyone have a hint? As I mentioned earlier, the FS works fine after mounting. Should I just ignore these error messages?

 

Thanks in advance.

 

Steve

 

Stephen Chu

AT&T Labs CSO

C5-3C03

200 Laurel Ave

Middletown, NJ

stephenchu at att.com

 
