[Lustre-discuss] clients get EINTR from time to time

Andreas Dilger adilger at whamcloud.com
Fri Feb 25 09:39:16 PST 2011


On 2011-02-25, at 6:28, "Brian J. Murrell" <brian at whamcloud.com> wrote:
> On 11-02-25 06:18 AM, Francois wrote:
>> 
>> I continue to parse debug logs and keep them posted.
> 
> I don't understand why you don't just fix your application to handle a
> perfectly valid and expected condition (that it's currently not
> handling) instead of wasting time trying to find the cause of the
> expected condition.  Even if you find it, it's likely not a bug and not
> something that can/will be fixed.  It's your application that needs to
> be fixed.

In all fairness, Brian, it isn't always possible to fix an application the way you suggest. It might be commercial (binary only), or it might be complex code that uses third-party libraries for its IO and would lose support if modified.

I think the first step in debugging this is to run "lctl set_param debug=+trace" (or "=~0") on the client, which enables function entry/exit tracing in Lustre. Then, when the problem is hit, run "lctl dk /tmp/debug" to dump the Lustre debug log, and search it for -4 (which is -EINTR) to see where the error first appears.
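The search step can be scripted. The following is only a minimal sketch, not a Lustre tool: it assumes the dump written by "lctl dk /tmp/debug" is plain text and that an errno value appears in it as a standalone token such as "-4" (for example in a "Process leaving (rc=... : -4 : ...)" trace line; that exact line format is an assumption). The function name first_errno_line is hypothetical. It reports the first matching line, since the earliest -4 is the closest to where the error originates.

```python
import re
import sys
from typing import Iterable, Optional, Tuple

# Assumption: the errno shows up as a standalone token like "-4".
# The lookarounds keep "-4" from matching inside "-40" or "2011-04".
ERRNO_TOKEN = r"(?<![\w-]){}(?!\d)"


def first_errno_line(lines: Iterable[str], errno_value: int = -4) -> Optional[Tuple[int, str]]:
    """Hypothetical helper: return (line_number, line) of the first line
    containing errno_value as a standalone token, or None if absent."""
    pattern = re.compile(ERRNO_TOKEN.format(re.escape(str(errno_value))))
    for lineno, line in enumerate(lines, start=1):
        if pattern.search(line):
            return lineno, line.rstrip("\n")
    return None


def main(path: str = "/tmp/debug") -> int:
    # Read the dump produced by "lctl dk"; undecodable bytes are replaced.
    with open(path, errors="replace") as f:
        hit = first_errno_line(f)
    if hit is None:
        print(f"no -4 (-EINTR) found in {path}")
        return 1
    lineno, line = hit
    print(f"{path}:{lineno}: {line}")
    return 0


if __name__ == "__main__":
    # Optional first argument overrides the default dump path.
    sys.exit(main(*sys.argv[1:2]))
```

Because the trace is in call order, the function entries and exits just before the reported line should show which Lustre call first returned -EINTR.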

At that point we can determine where the error originates and whether it is Lustre's fault. I know that at one time there was a related problem in the l_wait_event() macro, which was improperly masking signals, but I thought that was fixed in 1.8.5.

Cheers, Andreas

