<div>
Sorry, I have to correct this: "the nodes CANNOT mount the storage, and I can't access the Lustre server machine either".</div>
<div><div><br></div></div>
<p style="color: #A0A0A8;">On Wednesday 17 July 1392 at 11:21, Arya Mazaheri wrote:</p>
<blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px;">
<span><div><div>
<div>
Hi everyone,
</div><div>Lately I have had a problem with our Lustre 1.8 deployment. It crashes periodically in such a way that the nodes can mount the storage and I can't access the Lustre server machine either. I have to restart the machine manually every time to get everything back to normal. I checked the logs, memory usage, and lock counts to see whether any of them might be the cause, but I don't think they account for this issue.</div><div>An interesting symptom I see every time this happens is that the network activity lights on the InfiniBand switch blink very fast. I think heavy traffic on the InfiniBand network to the Lustre server may be causing the server to crash. Does this connection seem plausible?</div><div><br></div><div>Anyway, I hope some of you have experienced this problem before and can help me understand what is happening and how to keep the server from crashing again!</div><div><br></div><div>Thanks,</div>
</div></div></span>
</blockquote>
<div>
<br>
</div>