[Lustre-discuss] clients gets EINTR from time to time

DEGREMONT Aurelien aurelien.degremont at cea.fr
Thu Feb 24 08:57:07 PST 2011


Hello

From my understanding, Lustre can return EINTR for some I/O error cases.
I think that when a client gets evicted in the middle of one of its RPCs,
it can return EINTR to the caller.
Could this explain your issue?

Can you verify that your clients were not evicted at the same time?

Aurélien

Francois Chassaing wrote:
> OK, thanks, that makes it clearer.
> I had indeed mixed up signals and error return codes, both in my mind and in my words.
> I did understand that the write()/pwrite() system call was returning the EINTR error code because it received a signal, but I supposed that the signal was sent because of an error condition somewhere in the FS.
> This is where I now think I was wrong.
>  
> As for your questions:
> - I should mention that I have always had this issue, which is why I upgraded from 1.8.4 to 1.8.5, hoping this would solve it.
> - I will try to have the SA_RESTART flag set in the app... if I can find where the signal handler is set.
> - How can I see whether Lustre is returning EINTR for some other reason? As I said, the logs show nothing on either the MDS or the OSSs, but I haven't gone through "lctl debug_kernel" yet... which I'm going to do right away...
>
> My last question is: how can I tell which signal I am receiving? My app doesn't say; it just prints the write/pwrite error code.
> And if there is no signal handler, then the signal should trigger the "standard" action (as per man 7 signal). On the other hand, my app does not stop or dump core, and the signal is not ignored, so it has to be handled in the code. Correct me if I'm wrong...
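
If the application source can be changed, one way to find out which signal is arriving is to have the handler record it. The following is only a sketch under that assumption: `record_signal`, `watch_signal`, `last_signo` and `last_sender` are hypothetical names, and installing this handler replaces whatever handler the application had for that signal. For a binary that cannot be modified, a tracer such as strace also reports delivered signals.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() and siginfo_t */
#include <signal.h>
#include <string.h>

/* Hypothetical globals: filled in by the handler, inspected later. */
static volatile sig_atomic_t last_signo;   /* 0 until a signal arrives */
static volatile sig_atomic_t last_sender;  /* si_pid of the last signal */

/* SA_SIGINFO-style handler: only stores values, which is
 * async-signal-safe. si_pid is meaningful for signals sent by
 * another process (kill(2) and friends), not for every signal. */
static void record_signal(int signo, siginfo_t *info, void *ctx)
{
    (void)ctx;
    last_signo = signo;
    last_sender = (sig_atomic_t)info->si_pid;
}

/* Install the recording handler for `signo`.
 * Returns 0 on success, -1 on failure (errno set by sigaction()). */
int watch_signal(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(signo, &sa, NULL);
}
```

After a write() fails with EINTR, the application could log `last_signo` next to the error code it already prints.
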
>
> By now you will have realized that I didn't write the app, and that I'm no Linux guru ;-)
>
> Thanks a lot.
>
> François Chassaing, Technical Director (CTO), Weborama
>
> ----- Original Message -----
> From: "Ken Hornstein" <kenh at cmf.nrl.navy.mil>
> To: "Francois Chassaing" <fch at weborama.com>
> Cc: lustre-discuss at lists.lustre.org
> Sent: Thursday, 24 February 2011 15:54:24 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: [Lustre-discuss] clients gets EINTR from time to time
>
>   
>> OK, the app is used to dealing with standard disks, that is why it is not
>> handling the EINTR signal properly.
>>     
>
> I think you're misunderstanding what a "signal" is in the Unix sense.
>
> EINTR isn't a signal; it's a return code from the write() system call
> that says, "Hey, you got a signal in the middle of this write() call
> and it didn't complete".  It doesn't mean that there was an error
> writing the file; if that was happening, you'd get a (presumably
> different) error code.  Signals can be sent by the operating system,
> but those signals are things like SIGSEGV, which basically means, "your
> program screwed up".  Programs can also send signals to each other,
> with kill(2) and the like.
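
Since EINTR only says that the call was interrupted, a common way for an application to cope is to retry the write. The following is a minimal sketch of that idea, not what the application in this thread does; `write_fully` is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes to `fd`, retrying when
 * write() fails with EINTR. Returns 0 on success, -1 on any other
 * error (errno is left as set by write()). */
int write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written: retry */
            return -1;      /* a different error: report it to the caller */
        }
        p += n;             /* short write: skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would replace a bare `write(fd, buf, len)` with `write_fully(fd, buf, len)` and treat a -1 return as a genuine I/O error.
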
>
> Now, NORMALLY system calls like write() are interrupted by signals
> when you're writing to "slow" devices, like network sockets.  According
> to the signal(7) man page, disks are not normally considered slow
> devices, so I can understand the application not being used to handling
> this.  And you know, now that I think about it I'm not even sure that
> network filesystems SHOULD allow I/O system calls to be interrupted by
> signals ... I'd have to think more about it.
>
> I suspect what happened is that something changed between 1.8.5 and the
> previous version of Lustre that you were using that allowed some operations
> to be interruptible by signals.  Some things to try:
>
> - Check to see if you are, in fact, receiving a signal in your application
>   and Lustre isn't returning EINTR for some other reason.
> - If you are receiving a signal, when you set the signal handler for it
>   you could use the SA_RESTART flag to restart the interrupted I/O; I think
>   that would make everything work like it did before.
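
The SA_RESTART suggestion can be sketched as follows, assuming the application installs its handler with sigaction(). `on_signal` and `install_restarting_handler` are hypothetical names, and whether an interrupted Lustre write is actually restarted this way is not confirmed in this thread.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() and SA_RESTART */
#include <signal.h>
#include <string.h>

/* Hypothetical handler: it does nothing, but because it is installed
 * the signal is caught instead of taking its default action. */
static void on_signal(int signo)
{
    (void)signo;
}

/* Install on_signal() for `signo` with SA_RESTART, which asks the
 * kernel to restart interrupted system calls where that is supported
 * instead of failing them with EINTR.
 * Returns 0 on success, -1 on failure (errno set by sigaction()). */
int install_restarting_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}
```

In a real application the flag would be added where the existing handler is set up, rather than by installing a new empty handler.
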
>
> --Ken



