There is a similar thread on this mailing list:<br><a href="http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/afe24159554cd3ff/8b37bababf848123?lnk=gst&q=I%2FO+error+on+clients#">http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/afe24159554cd3ff/8b37bababf848123?lnk=gst&q=I%2FO+error+on+clients#</a><br>
There is also an open bug that reports a similar problem:<br><a href="https://bugzilla.lustre.org/show_bug.cgi?id=23190">https://bugzilla.lustre.org/show_bug.cgi?id=23190</a><br><br><div class="gmail_quote">On 23 July 2010 10:02, Larry <span dir="ltr"><<a href="mailto:tsrjzq@gmail.com">tsrjzq@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">We sometimes have the same problem when running NAMD on Lustre. The<br>
console log suggests that a file lock expired, but I don't know why.<br>
<div><div></div><div class="h5"><br>
On Fri, Jul 23, 2010 at 8:12 AM, Wojciech Turek <<a href="mailto:wjt27@cam.ac.uk">wjt27@cam.ac.uk</a>> wrote:<br>
> Hi Richard,<br>
><br>
> If Lustre is the cause of the I/O errors, there will be some message in the<br>
> logs. I am seeing a similar problem with some applications that run on our<br>
> cluster. The symptoms are always the same: just before the application crashes<br>
> with an I/O error, the node gets evicted with a message like this:<br>
> LustreError: 167-0: This client was evicted by ddn_data-OST000f; in<br>
> progress operations using this service will fail.<br>
><br>
> The OSS that mounts the OST named in the above message has the following line in its<br>
> log:<br>
> LustreError: 0:0:(ldlm_lockd.c:305:waiting_locks_callback()) ### lock<br>
> callback timer expired after 101s: evicting client at 10.143.5.9@tcp ns:<br>
> filter-ddn_data-OST000f_UUID lock: ffff81021a84ba00/0x744b1dd4481e38b2<br>
> lrc: 3/0,0 mode: PR/PR res: 34959884/0 rrc: 2 type: EXT<br>
> [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote:<br>
> 0x1d34b900a905375d expref: 9 pid: 1506 timeout 8374258376<br>
><br>
> Can you please check your logs for similar messages?<br>
><br>
> Best regards<br>
><br>
> Wojciech<br>
><br>
> On 22 July 2010 23:43, Andreas Dilger <<a href="mailto:andreas.dilger@oracle.com">andreas.dilger@oracle.com</a>> wrote:<br>
>><br>
>> On 2010-07-22, at 14:59, Richard Lefebvre wrote:<br>
>> > I have a problem with the scalable molecular dynamics software NAMD. It<br>
>> > writes restart files once in a while, but sometimes the binary write<br>
>> > crashes. When it crashes is not consistent. The only constant is that<br>
>> > it happens when writing to our Lustre file system. When it writes to<br>
>> > something else, it is fine. I can't seem to find any errors in any of the<br>
>> > /var/log/messages files. Has anyone had any problems with NAMD?<br>
>><br>
>> Rarely has anyone complained about Lustre not providing error messages<br>
>> when there is a problem, so if there is nothing in /var/log/messages on<br>
>> either the client or the server then it is hard to know whether it is a<br>
>> Lustre problem or not...<br>
>><br>
>> If possible, you could try running the application under strace (limited<br>
>> to the IO calls, or it would be much too much data) to see which system call<br>
>> the error is coming from.<br>
>><br>
>> Cheers, Andreas<br>
>> --<br>
>> Andreas Dilger<br>
>> Lustre Technical Lead<br>
>> Oracle Corporation Canada Inc.<br>
>><br>
>> _______________________________________________<br>
>> Lustre-discuss mailing list<br>
>> <a href="mailto:Lustre-discuss@lists.lustre.org">Lustre-discuss@lists.lustre.org</a><br>
>> <a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss" target="_blank">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a><br>
><br>
</div></div></blockquote></div><br><br clear="all"><br>
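To make the log check suggested in the quoted thread easier to repeat across many nodes, here is a minimal Python sketch that scans log lines for the two messages quoted above.<br>
The regular expressions are modelled only on those two sample lines, so they are an assumption: other Lustre versions may word the messages differently. The function name find_evictions is hypothetical and not part of any Lustre tool.<br>

```python
import re

# Patterns modelled on the two messages quoted in this thread.
# Assumption: other Lustre releases may phrase these lines differently.
EVICTED_RE = re.compile(r"This client was evicted by (?P<target>[^;\s]+)")
LOCK_TIMEOUT_RE = re.compile(
    r"lock callback timer expired after (?P<seconds>\d+)s: "
    r"evicting client at (?P<nid>\S+)"
)


def find_evictions(lines):
    """Return a list of (kind, detail) tuples for eviction-related lines.

    kind is "client_evicted" (client-side message, detail is the target
    name) or "lock_timeout" (server-side message, detail is a tuple of
    the client NID and the timeout in seconds).
    """
    hits = []
    for line in lines:
        match = EVICTED_RE.search(line)
        if match:
            hits.append(("client_evicted", match.group("target")))
            continue
        match = LOCK_TIMEOUT_RE.search(line)
        if match:
            detail = (match.group("nid"), int(match.group("seconds")))
            hits.append(("lock_timeout", detail))
    return hits
```

It can be run on a client or on an OSS by passing an open log file, for example find_evictions(open("/var/log/messages", errors="replace")). An empty result on both sides would point away from a Lustre eviction as the cause of the I/O error.<br>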