[Lustre-discuss] clients gets EINTR from time to time

Ken Hornstein kenh at cmf.nrl.navy.mil
Thu Feb 24 08:21:18 PST 2011


>As for your questions :
>- I have to mention that I have always had this issue, and this is why
>I upgraded from 1.8.4 to 1.8.5, hoping this would solve it.

Ah, okay, I misunderstood that; my apologies.

>- I will try to have that SA_RESTART flag set in the app... if I can
>find where the signal handler is set.

Searching for sigaction or signal should help there.

>- How can I see whether Lustre is returning EINTR for some other reason?
>As I said, the logs show nothing on either the MDS or the OSSs, but I haven't
>gone through examining "lctl debug_kernel" yet... which I'm going to do
>right away...

Weeelll ... that was just a guess on my part.  I did a quick grep
through the Lustre sources and saw a few places where EINTR was
returned, but most of those seemed to deal with the case where I/O was
interrupted (those places happened fairly far down in the stack; it
wasn't clear to me that those errors would ever bubble back up to a
return code to a system call).  If _that_ is the issue, then tracking
that down will be a challenge.
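
If the EINTR turns out to be unavoidable, one application-side workaround
is to retry the call whenever it fails with EINTR. The following is only
a sketch of that idea; pwrite_retry is a hypothetical wrapper name, not
something from the app or from Lustre, and it assumes that repeating the
same pwrite at the same offset is safe for the application.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Call pwrite() and repeat it for as long as it fails with EINTR.
 * Returns whatever the last pwrite() returned: the number of bytes
 * written (possibly fewer than count), or -1 with errno set to an
 * error other than EINTR.
 */
ssize_t pwrite_retry(int fd, const void *buf, size_t count, off_t offset)
{
    ssize_t n;

    do {
        n = pwrite(fd, buf, count, offset);
    } while (n == -1 && errno == EINTR);

    return n;
}
```

This hides the symptom rather than explaining it, so it is probably best
used alongside the tracing, not instead of it.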

>My last question is: how can I tell which signal I am receiving?
>My app doesn't say; it just dumps out the write/pwrite error
>code.

I think your easiest way is to use strace; something like "strace -e signal"
should do the right thing (that will only trace signals, not all system calls).

>And if there is no signal handler, then it should follow the "standard"
>actions (as per man 7 signal). On the other hand, my app does not stop
>or dump core, and the signal is not ignored, so it has to be handled in
>the code. Correct me if I'm wrong...

That is my understanding as well; if you don't have a signal handler
installed, the default action should be taking place, and if the
default action is to ignore the signal then you shouldn't be getting
EINTR.  But hey, I've been wrong before :-)

--Ken


