[Lustre-discuss] Client Mount Error Messages
CHU, STEPHEN H (ATTSI)
sc1680 at att.com
Mon Dec 28 09:00:05 PST 2009
Hi all,
I have a question about a few error messages that appear after a client mounts the file system. The FS mounted OK and is usable, but the LustreErrors do not look normal. The client does not have IB connectivity to the MDS/OSS; it uses "tcpX" to access the MDSs/OSSs. The MDSs and OSSs are interconnected with IB. The configurations are as follows:
MDS1:
· RHEL 5.3, Lustre 1.8.1.1, MGS and MDS on the same external storage drive; managed by heartbeat v1 for failover
· 10Ge tcp2(eth4) 10.103.30.201
· 10Ge tcp3(eth5) 10.103.30.101
· Infiniband o2ib0(ib0) 10.103.34.201
· Infiniband o2ib1(ib1) 10.103.34.101
· modprobe.conf: options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp2(eth4),tcp3(eth5)
· MGS/MDS Parameters: lov.stripesize=25M lov.stripecount=1 failover.node=10.103.34.202@o2ib,10.103.34.102@o2ib1 mdt.group_upcall=/usr/sbin/l_getgroups
MDS2:
· RHEL 5.3, Lustre 1.8.1.1, pointing to the same external storage drive as MDS1; managed by heartbeat v1 for failover
· 10Ge tcp2(eth4) 10.103.30.202
· 10Ge tcp3(eth5) 10.103.30.102
· Infiniband o2ib0(ib0) 10.103.34.202
· Infiniband o2ib1(ib1) 10.103.34.102
· options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp2(eth4),tcp3(eth5)
OSS1:
· RHEL 5.3, Lustre 1.8.1.1, pointing to 16 OSTs on SAN storage; mounts the 8 odd-numbered OSTs, managed by heartbeat v1 for failover to OSS2
· 10Ge tcp4(eth4) 10.103.31.203
· 10Ge tcp5(eth5) 10.103.31.103
· Infiniband o2ib0(ib0) 10.103.34.203
· Infiniband o2ib1(ib1) 10.103.34.103
· options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp4(eth4),tcp5(eth5)
· OSTs definition: Parameters: mgsnode=10.103.34.201@o2ib,10.103.34.101@o2ib1 failover.node=10.103.34.204@o2ib,10.103.34.104@o2ib1
OSS2:
· RHEL 5.3, Lustre 1.8.1.1, pointing to the same 16 OSTs on SAN storage as OSS1; mounts the 8 even-numbered OSTs, managed by heartbeat v1 for failover to OSS1
· 10Ge tcp4(eth4) 10.103.31.204
· 10Ge tcp5(eth5) 10.103.31.104
· Infiniband o2ib0(ib0) 10.103.34.204
· Infiniband o2ib1(ib1) 10.103.34.104
· options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp4(eth4),tcp5(eth5)
· OSTs definition: Parameters: mgsnode=10.103.34.201@o2ib,10.103.34.101@o2ib1 failover.node=10.103.34.204@o2ib,10.103.34.104@o2ib1
Client:
· RHEL4.5, Lustre 1.6.6 or RHEL5.3, Lustre 1.8.1.1
· Ge tcp4(eth2) 10.103.31.129 → OSS Channel
· Ge tcp2(eth3) 10.103.30.129 → MDS Channel
· options lnet networks=tcp2(eth3),tcp4(eth2)
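The lnet `networks=` lines above determine which NIDs (written address@network) each node can address directly. As a rough check, assuming no LNet routes are configured (none appear in the options listed) and treating a bare `o2ib` as `o2ib0`, a short Python sketch can compare server-side NIDs against the client's networks. The function names are hypothetical helpers for illustration, not part of any Lustre tool.

```python
from typing import Iterable, List, Set


def normalize_net(net: str) -> str:
    """Give a bare LNet network type an explicit 0, so 'o2ib' == 'o2ib0'."""
    return net if net[-1].isdigit() else net + "0"


def parse_networks(option: str) -> Set[str]:
    """Parse an lnet networks= value such as 'tcp2(eth3),tcp4(eth2)'."""
    nets = set()
    for entry in option.split(","):
        entry = entry.strip()
        if entry:
            # Drop the optional '(interface)' suffix and keep the network name.
            nets.add(normalize_net(entry.split("(", 1)[0]))
    return nets


def nid_net(nid: str) -> str:
    """Return the normalized network of a NID such as '10.103.34.202@o2ib'."""
    return normalize_net(nid.split("@", 1)[1])


def unreachable(nids: Iterable[str], client_nets: Set[str]) -> List[str]:
    """NIDs whose network the client is not configured on (routes are ignored)."""
    return [nid for nid in nids if nid_net(nid) not in client_nets]


# Client networks and MGS/MDS failover.node values taken from the configuration above.
client_nets = parse_networks("tcp2(eth3),tcp4(eth2)")
failover = "10.103.34.202@o2ib,10.103.34.102@o2ib1".split(",")
print(unreachable(failover, client_nets))
# → ['10.103.34.202@o2ib', '10.103.34.102@o2ib1']
```

With these inputs, both MDS failover NIDs fall outside the client's TCP-only networks, which would be consistent with the "No NID found" messages that follow.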
The following messages are from RHEL4.5, Lustre 1.6.6:
Dec 15 22:09:32 bg8mo29sz kernel: LustreError: 29975:0:(events.c:465:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.202@o2ib
Dec 15 22:09:32 bg8mo29sz kernel: LustreError: 29975:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.202@o2ib!
Dec 15 22:09:32 bg8mo29sz kernel: Lustre: spfs-clilov-000001020463f400.lov: set parameter stripesize=25M
Dec 15 22:09:32 bg8mo29sz kernel: Lustre: Skipped 1 previous similar message
Dec 15 22:09:32 bg8mo29sz kernel: Lustre: Client spfs-client has started
The following messages are from RHEL5.3, Lustre 1.8.1.1:
Lustre: MGC10.103.30.201@tcp2: Reactivating import
LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.202@o2ib
LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.202@o2ib!
LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.204@o2ib
LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.204@o2ib!
Lustre: Client spfs-client has started
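The unresolved NIDs are easier to compare with the configuration when pulled out of the log. A small Python sketch (the `missing_nids` helper is hypothetical) extracts them from the LustreError lines, written here in the usual address@network NID form. Per the configuration above, 10.103.34.202 is MDS2's o2ib0 address, listed in the MGS/MDS failover.node, and 10.103.34.204 is OSS2's o2ib0 address, listed in the OST failover.node.

```python
import re
from typing import Dict, List

# Matches e.g. "No NID found for 10.103.34.202@o2ib" and captures address and network.
NO_NID = re.compile(r"No NID found for (\S+)@(\S+)")


def missing_nids(log_lines: List[str]) -> Dict[str, str]:
    """Map each NID the client could not resolve to its LNet network name."""
    found: Dict[str, str] = {}
    for line in log_lines:
        m = NO_NID.search(line)
        if m:
            found[f"{m.group(1)}@{m.group(2)}"] = m.group(2)
    return found


# Lines copied from the Lustre 1.8.1.1 client log above.
log = [
    "LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.202@o2ib",
    "LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.202@o2ib!",
    "LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.204@o2ib",
    "LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.204@o2ib!",
]
print(missing_nids(log))
# → {'10.103.34.202@o2ib': 'o2ib', '10.103.34.204@o2ib': 'o2ib'}
```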
I couldn't figure out whether there is a configuration error in the MDS and OSS failover setup or whether these are harmless warnings. I would think this setup should work and that the client can sit outside the IB network. Does anyone have a hint on this? As I mentioned earlier, the FS works fine after mounting. Should I just ignore these error messages?
Thanks in advance.
Steve
Stephen Chu
AT&T Labs CSO
C5-3C03
200 Laurel Ave
Middletown, NJ
stephenchu at att.com