[Lustre-discuss] [Discuss] coverage measurement at 2012 09 15

Andreas Dilger adilger at whamcloud.com
Tue Oct 2 15:44:27 PDT 2012


On 2012-10-02, at 2:19 PM, Cory Spitz wrote:
>> Are the percentages of code coverage getting better or worse?
> 
> I don't know exactly, but based on the information that Robert Read
> shared at LUG '09, sanity was netting "60-70% coverage of core Lustre
> modules" (http://wiki.lustre.org/images/4/4f/RobertReadTalk1.pdf).

I was wondering that also, but according to the original URL from Roman, the mechanism for measuring code coverage was changed in the recent runs, so I don't know if it is possible to do head-to-head comparisons.

>> I can definitely imagine that many error handling code paths (e.g. checking for allocation failures) would not be exercised without specific changes (see e.g. my unlanded patch to fix the OBD_ALLOC() failure injection code).
> 
> Cray has started looking at testing w/forced memory allocation failures
> from the Linux fault injection framework
> (http://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt).

I've seen this, but hadn't actually had time to look into it.  I'm happy to see you taking the initiative to try out this new avenue for testing.
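
For anyone else who wants to experiment, here is a minimal sketch of how the failslab knobs described in fault-injection.txt could be set from a script. It assumes debugfs is mounted at /sys/kernel/debug and that the kernel was built with CONFIG_FAILSLAB and CONFIG_FAULT_INJECTION_DEBUG_FS. The function names and the default values are purely illustrative; they are not what Cray is running, and I have not tuned them against Lustre.

```python
import os

# Assumed location of the failslab knobs; requires debugfs to be mounted
# and a kernel built with CONFIG_FAILSLAB + CONFIG_FAULT_INJECTION_DEBUG_FS.
FAILSLAB_DIR = "/sys/kernel/debug/failslab"


def failslab_settings(probability=100, interval=1000, times=-1, verbose=1):
    """Build a {knob: value} mapping for the failslab debugfs directory.

    The defaults are illustrative: probability=100 with interval=1000
    asks for roughly one failure per 1000 eligible slab allocations.
    """
    if not 0 <= probability <= 100:
        raise ValueError("probability is a percentage (0-100)")
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return {
        "probability": str(probability),  # likelihood of injection, in percent
        "interval": str(interval),        # spacing between injected failures
        "times": str(times),              # max number of failures, -1 = no limit
        "verbose": str(verbose),          # 0-2, amount of kernel log output
    }


def apply_settings(settings, root=FAILSLAB_DIR):
    """Write each knob to its file under root (needs root privileges on a real system)."""
    for knob, value in settings.items():
        with open(os.path.join(root, knob), "w") as f:
            f.write(value + "\n")


if __name__ == "__main__":
    # Print the plan instead of applying it, so this is safe to run anywhere.
    for knob, value in failslab_settings().items():
        print(f"{FAILSLAB_DIR}/{knob} <- {value}")
```

Calling apply_settings(failslab_settings()) as root before starting a test run would enable the failures; writing 0 to probability afterwards turns them off again.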

Another related (though different) set of tests would be to run on a client or server booted with a smaller amount of RAM (say 512MB-1GB) and see what problems appear.  I suspect there are a lot of hash tables, constants, and such that do not scale properly with RAM size.
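
To illustrate what I mean by scaling with RAM size, here is a small sketch of sizing a hash table from the amount of memory on the node instead of using a fixed constant. Everything in it (the function name, one bucket per 64KB of RAM, the 2^7 and 2^20 limits) is a made-up example and not taken from the Lustre code.

```python
def hash_table_bits(ram_bytes, ram_per_bucket=1 << 16, min_bits=7, max_bits=20):
    """Return log2 of the bucket count for a hash table on a node with ram_bytes of RAM.

    Hypothetical policy: one bucket per ram_per_bucket bytes of RAM,
    rounded down to a power of two and clamped to [min_bits, max_bits].
    """
    buckets = max(1, ram_bytes // ram_per_bucket)
    bits = buckets.bit_length() - 1  # floor(log2(buckets))
    return max(min_bits, min(max_bits, bits))


if __name__ == "__main__":
    # Compare a small test node with a large server.
    for label, ram in (("512MB", 512 << 20), ("1GB", 1 << 30), ("64GB", 64 << 30)):
        print(label, "->", 1 << hash_table_bits(ram), "buckets")
```

A table sized with a fixed constant chosen for the large server would be badly oversized on the 512MB node, which is the kind of problem a low-memory test run should expose.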

> As we make progress we'll open tickets and push patches.  I expect to
> find problems ;)

Yes, no doubt.  It is probably worthwhile to check the CEA Coverity patches before submitting anything new, in case those failures are already fixed there.

It is probably also worthwhile to submit a patch that removes the equivalent fault-injection code from the Lustre code paths, since it is pure runtime overhead for every memory allocation at this point.

> Andreas, were you talking about http://review.whamcloud.com/#change,3037?  If not, what ticket were you referring to?

Yes, that was it.  This patch has a few minor fixes that I found in my testing, and fixes the error messages, but there is no point in fixing the fault injection code anymore.

Cheers, Andreas

> On 09/29/2012 07:24 AM, Dilger, Andreas wrote:
>> Hi Roman,
>> The coverage data is interesting. It would be even more useful to be able to compare it to the previous code coverage run, if they used the same method for measuring coverage (the new report states that the method has changed and reduced coverage).
>> 
>> Are the percentages of code coverage getting better or worse?  Are there particular areas of the code that have poor coverage that could benefit from some focussed attention with new tests?
>> 
>> I can definitely imagine that many error handling code paths (e.g. checking for allocation failures) would not be exercised without specific changes (see e.g. my unlanded patch to fix the OBD_ALLOC() failure injection code). 
>> 
>> Running a test with periodic random allocation failures enabled and fixing the resulting bugs would improve coverage, though not in a systematic way that could be measured/repeated. Still, this would find a class of hard-to-find bugs.
>> 
>> Similarly, running racer for extended periods is a good form of coverage generation, even if not systematic/repeatable. I think the racer code could be improved/extended by adding racer scripts that are Lustre-specific or exercise new functionality (e.g. "lfs setstripe", setfattr, getfattr, setfacl, getfacl). Running multiple racer instances on multiple clients/mounts and throwing recovery into the mix would definitely find new bugs.
>> 
>> In general, having the code coverage is a good starting point, but it isn't necessarily useful if nothing is done to improve the coverage of the tests as a result. 
>> 
>> Cheers, Andreas
>> 
>> On 2012-09-20, at 7:21, Roman Grigoryev <Roman_Grigoryev at xyratex.com> wrote:
>> 
>>> Hi,
>>> 
>>> The next coverage measurement has been published;
>>> please see
>>> http://www.opensfs.org/foswiki/bin/view/Lustre/CodeCoverage20120915
>>> 
>>> Entrance page http://www.opensfs.org/foswiki/bin/view/Lustre/CodeCoverage
>>> 
>>> 
>>> Thanks,
>>>   Roman
>>> _______________________________________________
>>> discuss mailing list
>>> discuss at lists.opensfs.org
>>> http://lists.opensfs.org/listinfo.cgi/discuss-opensfs.org
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/






