<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
That is an interesting mix. Nothing shows up at all on the clients,
even on the 3 that route to a second NIC. On the OSSes, there is quite
a mix of up/down across the 3 routers, with no obvious pattern.<br>
<br>
Most of our traffic is on the 10.10 network; the 3 gateways shown
below route to a small number of clients on a more public
network.<br>
<br>
FYI, as far as I can tell, all machines are currently happy.<br>
<br>
bob<br>
<br>
Running lctl show_route on all machines in lustre_fss.txt<br>
On umdist05.local<br>
net tcp2 hops 1 gw 10.10.1.52@tcp down<br>
net tcp2 hops 1 gw 10.10.1.51@tcp down<br>
net tcp2 hops 1 gw 10.10.1.50@tcp up<br>
Succeeded<br>
On umfs06.local<br>
net tcp2 hops 1 gw 10.10.1.51@tcp down<br>
net tcp2 hops 1 gw 10.10.1.50@tcp up<br>
net tcp2 hops 1 gw 10.10.1.52@tcp up<br>
Succeeded<br>
On umdist01.local<br>
net tcp2 hops 1 gw 10.10.1.52@tcp down<br>
net tcp2 hops 1 gw 10.10.1.51@tcp down<br>
net tcp2 hops 1 gw 10.10.1.50@tcp up<br>
Succeeded<br>
On umdist02.local<br>
net tcp2 hops 1 gw 10.10.1.52@tcp down<br>
net tcp2 hops 1 gw 10.10.1.51@tcp down<br>
net tcp2 hops 1 gw 10.10.1.50@tcp up<br>
Succeeded<br>
On umdist03.local<br>
net tcp2 hops 1 gw 10.10.1.51@tcp down<br>
net tcp2 hops 1 gw 10.10.1.52@tcp up<br>
net tcp2 hops 1 gw 10.10.1.50@tcp up<br>
Succeeded<br>
On umdist04.local<br>
net tcp2 hops 1 gw 10.10.1.52@tcp down<br>
net tcp2 hops 1 gw 10.10.1.51@tcp down<br>
net tcp2 hops 1 gw 10.10.1.50@tcp up<br>
Succeeded<br>
On umdist07.local<br>
net tcp2 hops 1 gw 10.10.1.50@tcp down<br>
net tcp2 hops 1 gw 10.10.1.52@tcp down<br>
net tcp2 hops 1 gw 10.10.1.51@tcp down<br>
Succeeded<br>
On umdist08.local<br>
net tcp2 hops 1 gw 10.10.1.50@tcp down<br>
net tcp2 hops 1 gw 10.10.1.52@tcp down<br>
net tcp2 hops 1 gw 10.10.1.51@tcp down<br>
Succeeded<br>
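<br>
When hunting for a pattern in a listing like the one above, a per-gateway tally can help. What follows is a minimal Python sketch, not a Lustre tool: the helper names route_summary and report are hypothetical, and the parser assumes only the line format seen in this listing (the "On umdist05.local" style host headers, and "net ... hops ... gw ... up/down" route lines, possibly wrapped), fed in as plain text with the HTML tags removed.<br>
<pre>
```python
import re
from collections import defaultdict

# One pattern for both record types in the listing (format assumed from the mail):
#   "On umdist05.local"                        (per-host header)
#   "net tcp2 hops 1 gw 10.10.1.52@tcp down"   (one route and its state)
# Groups: 1 = host, 2 = net, 3 = gateway NID, 4 = state.
# \s+ also matches newlines, so a route whose "up"/"down" wrapped onto
# the next line still parses as one record.
RECORD_RE = re.compile(
    r"\bOn\s+(\S+)"
    r"|\bnet\s+(\S+)\s+hops\s+\d+\s+gw\s+(\S+)\s+(up|down)\b"
)


def route_summary(text):
    """Map each gateway NID to the hosts that report it 'up' and 'down'.

    Hypothetical helper; assumes plain text (no HTML tags) in the format above.
    """
    summary = defaultdict(lambda: {"up": [], "down": []})
    host = None
    for m in RECORD_RE.finditer(text):
        host_name, _net, gw, state = m.groups()
        if host_name:
            host = host_name  # start of a new host's section
        elif host is not None:
            summary[gw][state].append(host)
    return dict(summary)


def report(summary):
    """Print one line per gateway with its down count and the affected hosts."""
    for gw in sorted(summary):
        up, down = summary[gw]["up"], summary[gw]["down"]
        print(f"{gw}: down on {len(down)}/{len(up) + len(down)} hosts {down}")


if __name__ == "__main__":
    # Two host sections copied from the listing, including wrapped lines.
    sample = """On umdist05.local
net tcp2 hops 1 gw 10.10.1.52@tcp
down
net tcp2 hops 1 gw 10.10.1.51@tcp
down
net tcp2 hops 1 gw 10.10.1.50@tcp up
Succeeded
On umfs06.local
net tcp2 hops 1 gw 10.10.1.51@tcp
down
net tcp2 hops 1 gw 10.10.1.50@tcp up
net tcp2 hops 1 gw 10.10.1.52@tcp up
Succeeded
"""
    report(route_summary(sample))
```
</pre>
Counting by hand over the full listing gives the same kind of summary: 10.10.1.51@tcp is down on all 8 hosts, 10.10.1.52@tcp on 6 (up only on umfs06 and umdist03), and 10.10.1.50@tcp on 2 (umdist07 and umdist08).<br>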
<br>
<div class="moz-cite-prefix">On 9/5/2013 4:01 PM, Kris Howard wrote:<br>
</div>
<blockquote
cite="mid:CAFrN90EOihnvp3kGdx9FTf0iRv3tmNHpQHHr=+TrwyksJ_jMPg@mail.gmail.com"
type="cite">
<div dir="ltr">Might check lctl show_route and look for downed
routes.</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Thu, Sep 5, 2013 at 12:56 PM, Bob
Ball <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:ball@umich.edu" target="_blank">ball@umich.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">We are
running Lustre 2.1.6 on Scientific Linux 6.4, kernel
2.6.32-358.11.1.el6.x86_64. This was an upgrade from Lustre
1.8.4 on SL5.<br>
<br>
We have had a few situations lately where a client stops
talking to some subset of the OSTs (about 58 OSTs in total
on 8 OSSes, nearly 500 TB altogether). I have a couple of
questions.<br>
<br>
1. "lctl dl" on the OSS shows a smaller count on the
affected servers; on the client, all OSS showed UP in "lctl
dl". Today, I first tried rebooting this OSS, but that did
not change the situation. I ended up rebooting the client
before I could get full connectivity. Is there any way from
the client to get the reconnect, short of rebooting that
client?<br>
<br>
2. Under Lustre 1.8.4, I could run "lfs df -h" on the client
and see all OSTs, even those whose connection was not working
for whatever reason. That is no longer the case: the lfs command
now stops at the first non-communicating OST. This seems more
like a bug than a feature. Is there some other way to list the
non-communicating OSTs on a client?<br>
<br>
Thanks in advance for any help offered.<br>
<br>
bob<br>
<br>
<br>
<br>
_______________________________________________<br>
HPDD-discuss mailing list<br>
<a moz-do-not-send="true"
href="mailto:HPDD-discuss@lists.01.org" target="_blank">HPDD-discuss@lists.01.org</a><br>
<a moz-do-not-send="true"
href="https://lists.01.org/mailman/listinfo/hpdd-discuss"
target="_blank">https://lists.01.org/mailman/listinfo/hpdd-discuss</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>