<div dir="ltr">Hi Michael,<div><br></div><div>Are you observing errors on the clients and/or servers? If so, what errors? Can you post them to the list so we can take a look?</div><div><br></div><div>-cf</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 13, 2015 at 9:13 AM, Michael Di Domenico <span dir="ltr"><<a href="mailto:mdidomenico4@gmail.com" target="_blank">mdidomenico4@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I have a small isolated cluster (RHEL 6.6) and Lustre filesystem<br>

(v2.4.3), all running over IPoIB.  Currently I have flock turned<br>

on across all nodes.  I'm seeing an issue where the workload I have<br>

running sometimes outputs zero-length files instead of data.<br>

Re-running the job corrects the data, so I'm pretty sure it's not code<br>

related.<br>

<br>

My question is: is there some kind of timeout and error from flock<br>

that Lustre will kick back to my code that I could detect?  And if so,<br>

is there a way of changing the timeout delay?  Are there any other<br>

counters somewhere in Lustre that would show me if I'm having a large<br>

number of flock timeouts?<br>


</blockquote></div><br></div>