[Lustre-discuss] clients gets EINTR from time to time

Fri Mar 4 02:04:31 PST 2011

Dear list,
I am still investigating this issue, and I am now struggling with debugging.
The issue arose once more yesterday, so I started to look at it more deeply and decided that the "trace" debug output should be written to disk using debug_daemon.
Alas, with only the "trace" debug flag active, the clients produce more than 100 MB/s of log (yes, these are busy clients).
I've tried several strategies, like running debug_kernel from a cron job or while watching my product's error log, but even then dk would dump 70 MB of data representing less than one second of debug log.
So my chances of tracing the signal seem low.
Is there a less verbose debug flag that would still include the signal I'm looking for?

Given John's answer, I could perhaps use /proc/sys/lustre/dump_on_timeout to dump the log only when a timeout happens, but this will work only if my problem matches what John can reproduce.

Please also note that I've looked around for abnormal threads_started numbers: everywhere it has the same value as threads_min, except for one mdt entry which is at threads_min+1.

Regards

François Chassaing, Technical Director - CTO, Weborama

----- Original Message -----
From: "John Hammond" <jhammond at tacc.utexas.edu>
To: "Andreas Dilger" <adilger at whamcloud.com>
Cc: lustre-discuss at lists.lustre.org
Sent: Friday, 25 February 2011 21:16:36 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [Lustre-discuss] clients gets EINTR from time to time

On 02/25/2011 11:39 AM, Andreas Dilger wrote:
> On 2011-02-25, at 6:28, "Brian J. Murrell" <brian at whamcloud.com> wrote:
>> On 11-02-25 06:18 AM, Francois  wrote:
>>>
>>> I will continue to parse the debug logs and keep you posted.
>>
>> I don't understand why you don't just fix your application to handle a
>> perfectly valid and expected condition (that it's currently not
>> handling) instead of wasting time trying to find the cause of the
>> expected condition.  Even if you find it, it's likely not a bug and not
>> something that can/will be fixed.  It's your application that needs to
>> be fixed.
> 
> In all fairness Brian, it isn't always possible to fix an application like you suggest. It might be commercial (binary only), it might be complex code using 3rd party libraries to do the IO that would lose support if modified, etc.
> 
> I think the first action to debug this is to run on the client with "lctl set_param debug=+trace" or "=~0" which will enable function entry/exit tracing in Lustre. Then when the problem is hit, run "lctl dk /tmp/debug" to dump the Lustre debug log, and search for -4 (which is -EINTR) to see where this error is first appearing.
> 
> At that point we can make a determination where the source of the error is, and if it is Lustre's fault. I know at one time there was a related problem in the l_wait_event() macro that was improperly masking signals, but I thought it was fixed by 1.8.5. 
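
The search step suggested above can be automated once the log has been dumped with lctl dk. The following is a minimal C sketch, not part of Lustre: first_matching_line() is a hypothetical helper, and the exact text to search for (for example "-4") is an assumption that depends on how the debug log formats return codes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the 1-based number of the first line of fp
 * that contains needle, or 0 if no line matches.
 *
 * A physical line longer than the buffer is read in several chunks; a
 * needle split across two chunks is missed, which is acceptable for a
 * sketch but not for production use.
 */
long first_matching_line(FILE *fp, const char *needle)
{
    char buf[4096];
    long line = 1;

    assert(fp != NULL && needle != NULL);

    while (fgets(buf, sizeof(buf), fp) != NULL) {
        if (strstr(buf, needle) != NULL)
            return line;
        /* Advance the counter only when this chunk ended a physical line. */
        if (strchr(buf, '\n') != NULL)
            line++;
    }
    return 0;
}
```

For example, opening /tmp/debug with fopen() and calling first_matching_line(fp, "-4") gives the line at which to start reading. A bare "-4" can also match unrelated numbers, so a more specific search string may be needed.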

Setting aside the moral question of which calls should be interruptible,
I think that the handling of the LUSTRE_FATAL_SIGS (defined in
lustre_lib.h to be SIGKILL, SIGINT, SIGTERM, SIGQUIT, SIGALRM) is
slightly broken.  Under certain situations, Lustre will return -EINTR
although no signals were delivered.  That's probably not the end of the
world for most applications, but OTOH I don't think anybody assumes that
-EINTR will be delivered spuriously.

Consider the following sequence:

1) Process P has a Lustre file F open.

2) P has SIGALRM pending (but blocked).

3) P starts writing to F and ends up sleeping in (something like):

  sys_write()
   ...
    ll_extent_lock()
     ...
      osc_enqueue()
       ...
        ptlrpc_queue_wait().

4) The OST does not respond to the request before the deadline, so
l_wait_event() replaces the signal mask of P with the LUSTRE_FATAL_SIGS,
notices that SIGALRM is now deliverable, restores the signal mask of P,
and ptlrpc_queue_wait() returns -EINTR.

5) P is exiting from sys_write(), SIGALRM is blocked (but still pending)
so it doesn't get delivered.

6) P spuriously returns -EINTR from sys_write().

I can reproduce this on 1.8.5/RHEL 5.5.  If the goal is to emulate NFS's
interruptibility during congestion then returning -ERESTARTSYS would be
more appropriate.  Also, it might be worthwhile to make this extra
interruptibility a mount flag, as NFS does.
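
Where the application can be modified, a spurious EINTR from write() can be tolerated by retrying. The following is a minimal sketch of such a wrapper. write_all() is a hypothetical name, and whether a blind retry is acceptable depends on the application (a program that uses signals to interrupt I/O on purpose would need more care).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: write len bytes from buf to fd, retrying when
 * write() fails with EINTR and continuing after short writes.
 * Returns the number of bytes written (len on full success), or -1 with
 * errno set if write() fails for any reason other than EINTR.
 */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    assert(buf != NULL || len == 0);

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;          /* real error: report it to the caller */
        }
        if (n == 0)
            break;              /* no progress: stop rather than spin */
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)(len - left);
}
```

A caller would replace write(fd, buf, len) with write_all(fd, buf, len) and treat only a return value of -1 as an error.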

Best,

John

-- 
John L. Hammond, Ph.D.
TACC, The University of Texas at Austin
jhammond at tacc.utexas.edu
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss