[Lustre-discuss] Metadata storage in test script files

Wed May 2 08:53:28 PDT 2012

Hi Andreas,

On 05/02/2012 08:14 AM, Andreas Dilger wrote:
> On 2012-05-01, at 9:23 PM, Roman Grigoryev wrote:

>>>> On 04/30/2012 08:50 PM, Chris wrote:
>>>>> Prerequisites:    Pre-requisite tests that must be run before this
>>>>> test can be run. This is again an array which presumes a test may
>>>>> have multiple pre-requisites, but the data should not contain a
>>>>> chain of prerequisites, i.e. if A requires B and B requires C, the
>>>>> pre-requisites of A is B not B & C.
>>>> At which step do you want to check chains? And what is the logical
>>>> basis for these prerequisites, apart from the case where current
>>>> tests have hidden dependencies?
>>>> I don't see any difference between one test whose body is made up
>>>> of tests a, b, c and this prerequisites definition.
>>>> Could you please explain more why we need this field?
>>> As I said we can mine this data any time and any way that we want, and
>>> the purpose of this discussion is the data, not how we use it. But as
>>> an example, something that dynamically built test sets would need to
>>> know prerequisites.
>>>
>>> The suffix of a,b,c could be used to generate prerequisite information
>>> but it is firstly inflexible, for example I bet 'b','c' and 'd' are
>>> often dependent on 'a' but not each other, secondly and more
>>> importantly we want a standard form for storing metadata because we
>>> want to introduce order and knowledge into the test
>>> scripts that we have today.
>>
>> Why I asked about the way it will be used: if we want to use this
>> information in scripts and in other automated ways, we must strictly
>> specify the logic of the items and provide a tool for checking it.
> 
> I think it is sufficient to have a well-structured repository of test
> metadata, and then multiple uses can be found for this data.  Even for
> human use, a good description of what the test is supposed to check,
> and why this test exists would be a good start.

I absolutely agree that a good description, summary and other fields are
very important.
> 
> The test metadata format is extensible, so should we need more fields
> in the future it will be possible to add them.  I think the hardest
> work will be to get good text descriptions of the tests, not mechanical
> issues like dependencies and such.

I think this work will take pretty long, and I suggest requiring it only
for new and changed tests. In this case, the possibility of some kind of
description inheritance is a good solution.
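
To make "description inheritance" concrete, here is a rough sketch in
Python. It is only an illustration: the names TESTS, DEFAULTS and the
"inherits" key are made up for this example and are not part of any
agreed format. A changed test states only what differs, and the rest is
taken from the test it inherits from, then from suite-wide defaults.

```python
# All names here (TESTS, DEFAULTS, "inherits") are hypothetical.
DEFAULTS = {"components": ["mdt"]}

TESTS = {
    "1a": {"summary": "create a file",
           "description": "Checks basic creation on a fresh filesystem."},
    # 1b is a changed test: it states only what differs from 1a.
    "1b": {"inherits": "1a", "summary": "create a directory"},
}


def resolve(test_id, tests, defaults):
    """Return the effective metadata for test_id, following 'inherits' links."""
    chain = []
    current = test_id
    while current is not None:
        if any(current == seen_id for seen_id, _ in chain):
            raise ValueError("inheritance cycle at " + current)
        entry = tests[current]
        chain.append((current, entry))
        current = entry.get("inherits")
    effective = dict(defaults)
    # Apply the most distant ancestor first so that nearer entries win.
    for _, entry in reversed(chain):
        for key, value in entry.items():
            if key != "inherits":
                effective[key] = value
    return effective


print(resolve("1b", TESTS, DEFAULTS))
```

With this, 1b gets its own summary, the description of 1a and the
default components, so only new and changed tests need new text.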

> 
>> E.g. we will use it when building the test execution queue. We have a
>> chain like this: test C has prerequisite B, test B has prerequisite A.
>> Test A doesn't have a prerequisite. One fine day test A becomes
>> excluded. Is it possible to execute test C?
>> But if we do not use it in scripting, there is no big logical problem.
>>
>> (My opinion: I don't like this situation and think that test
>> dependencies should be used only in very specific and rare cases.)
>>
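
To show why the logic has to be specified, here is a small Python sketch
of one possible policy: a test is runnable only if it is not excluded
and all of its prerequisites, followed transitively, are runnable too.
This policy is my assumption, not something agreed in this thread, and
the function and variable names are made up. Only direct prerequisites
are stored, as in the proposal.

```python
def runnable_tests(prereqs, excluded):
    """Return the sorted list of tests that can run.

    prereqs maps each test to its list of DIRECT prerequisites.
    excluded is the set of tests removed from the run.
    """
    status = {}

    def ok(test, stack=()):
        if test in status:
            return status[test]
        if test in stack:
            raise ValueError("prerequisite cycle at " + test)
        # An excluded test blocks everything that depends on it.
        result = test not in excluded and all(
            ok(p, stack + (test,)) for p in prereqs.get(test, [])
        )
        status[test] = result
        return result

    return sorted(t for t in prereqs if ok(t))


# The chain from the example: C needs B, B needs A, A needs nothing.
PREREQS = {"A": [], "B": ["A"], "C": ["B"], "D": []}
print(runnable_tests(PREREQS, excluded={"A"}))
```

Under this policy, excluding A silently removes B and C as well. Another
framework could just as well decide to run C anyway, which is exactly
the ambiguity I mean.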
>>>
>>>>> TicketIDs:             This is an array of ticket numbers that this test
>>>>> explicitly tests. In theory we should aim for the state where
>>>>> every ticket has a test associated with it, and in future we
>>>>> should be able to carry out a gap analysis.
>>>>>
>>>> I suggest adding keywords (Components could be treated as keywords
>>>> too) and a test type (stress, benchmark, load, functional, negative,
>>>> etc.) for quick filtering. For example, SLOW could become a
>>>> keyword.
>>> This seems like a reasonable idea, although we need a name that
>>> describes what it is; we will need to define the set of possible
>>> words, as we need to with the Components elements.
>>
>> I mean that 'keywords' should be separate from components but could
>> logically include them. I think 'Components' is a special type of keyword.
>>
>>> What should this field be called - we should not reduce the value of
>>> this data by genericizing it into 'keywords'.
>>>
>>>> Also,  I would like to mention, we have 3 different logical types of
>>>> data:
>>>> 1) just human-readable descriptions
>>>> 2) filtering and targeting fields (Components, keywords if you agree with
>>>> my suggestion)
>>>> 3) framework directives(Prerequisites)
>>>>
>>>>> As time goes on we may well expand this compulsory list, but this is I
>>>>> believe a sensible starting place.
>>>>>
>>>>> Being part of the source this data will be subject to the same review
>>>>> process as any other change and so we cannot store dynamic data here,
>>>>> such as pass rates etc.
>>>> What do you think, maybe it is a good idea to keep metadata separately?
>>>> This can be useful for simplifying changes to the data via script for
>>>> mass modification, as well as for adding tickets, pass rate and
>>>> execution time on 'gold' configurations.
>>> It would be easier to store the data separately and we could use Maloo
>>> but it's very important that this data becomes part of the Lustre
>>> 'source' so that everybody can benefit from it. Adding tickets is
>>> not a problem as part of the resolution issue is to ensure that at
>>> least one test exercises the problem and proves it has been fixed,
>>> the fact that this assurance process requires active
>>> interaction by an engineer with the scripts is a positive.
>>>
>>> As for pass rate, execution time and gold configurations this
>>> information is just not 1 dimensional enough to store in the source.
>>
>> It was not by accident that I talked about groups of fields in my
>> previous letter. All metadata may be separated into rarely and often
>> changed fields. E.g. Summary will not change very often. But the test
>> timeout in the golden configuration (I mean that this timeout will be
>> set as the default based on the 'gold' configuration and can be
>> overridden in a specific configuration) could be more variable (and
>> possibly more important for testing).
> 
> I think this is something that needs to live outside the test metadata
> being described here.  The definition of "golden configuration" is
> hard to define, and depends heavily on factors that change from one
> environment to the next.

We could separate dynamic and static metadata. But it would be good if
both sets of data used one engine and storage type, just with different
sources.
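
As a sketch of what "one engine, different sources" could mean (Python
with JSON as the storage type; the layer names STATIC, GOLD and SITE and
the field names are invented for this example), the same loader and
merge code handles both kinds of data, and later sources override
earlier ones:

```python
import json

# Static metadata, as it might be committed next to the tests.
STATIC = json.loads('{"1a": {"summary": "create a file"}}')
# Dynamic metadata: default timeout taken from a "gold" configuration.
GOLD = json.loads('{"1a": {"timeout": 300}}')
# Dynamic metadata: override for one specific, slower configuration.
SITE = json.loads('{"1a": {"timeout": 600}}')


def merge_layers(*layers):
    """Merge per-test metadata; fields in later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for test_id, fields in layer.items():
            merged.setdefault(test_id, {}).update(fields)
    return merged


print(merge_layers(STATIC, GOLD, SITE))
```

Only the STATIC source would need to go through review as part of the
Lustre source; the other layers can live wherever is convenient.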

> 
> Ideally, tests will be written so that they can run under a wide range
> of configurations (number of clients, servers, virtual and real nodes).
> A further goal might be to allow many non-destructive functional subtests
> to be run in parallel, which would further skew the time taken, but
> would allow much more efficient use of test resources.

It would be very good to have a big enough set of fully independent tests.

> 
>> Using separate files provides more flexibility, and nobody stops us
>> from committing them to the lustre repo so that they become "Lustre
>> 'source'". In separate files we can use whatever format we want, and all
>> information will be available without parsing the shell script or
>> running it. Moreover, in the great future, it gives us a very simple
>> migration from shell to another language.
> 
> I think the metadata format should be chosen so that it is trivial to
> extract the test metadata without having to execute or parse the shell
> (or other) test language itself.  Simple filtering and regexp should
> be enough.
> 
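
For reference, this is roughly how I understand the 'filtering and
regexp' approach, sketched in Python. The "# Key: value" comment header
and the test_NAME() function naming are assumptions made for the
example; no such header format has been agreed yet.

```python
import re

# Hypothetical convention: "# Key: value" comment lines directly above
# the shell function that implements the test.
FIELD_RE = re.compile(r"^#\s*(\w+):\s*(.*)$")
TEST_RE = re.compile(r"^test_(\w+)\s*\(\)")

SAMPLE = """\
# Summary: create a directory
# Components: mdt
# Prerequisites: 1a
test_1b() {
    mkdir $DIR/d1b
}
"""


def extract_metadata(script_text):
    """Collect the header fields found immediately above each test function."""
    result = {}
    pending = {}
    for line in script_text.splitlines():
        field = FIELD_RE.match(line)
        if field:
            pending[field.group(1)] = field.group(2).strip()
            continue
        test = TEST_RE.match(line)
        if test:
            result[test.group(1)] = pending
        # Any line that is not a header field ends the current header.
        pending = {}
    return result


print(extract_metadata(SAMPLE))
```

Note that an ordinary comment such as "# note: fix this later" placed
directly above a test would also be picked up as a field.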

Why do you want to do 'filtering and regexp', with some probability of
error when selecting data, and also inject special code into the shell
scripts, when we can avoid it? It is a good chance to start moving away
from shell. If the question is developers' comfort, I would rather
suggest tools for checking metadata completeness than have code and
metadata in one file.
Also, I don't see a good way to use 'metadata inheritance' in shell
without adding pretty unclear shell code, so the switch to metadata usage
should happen all at once, or the test framework will just ignore it and
the metadata becomes just static text for external scripts.
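
A tool for checking metadata completeness can be very small when the
metadata lives in its own file. A minimal Python sketch, assuming a
JSON file keyed by test name and a made-up list of compulsory fields
(REQUIRED_FIELDS is my guess, not the agreed list):

```python
import json

# Assumed compulsory fields; the real list is still under discussion.
REQUIRED_FIELDS = ("summary", "description", "components")

# Stand-in for the content of a separate metadata file.
METADATA = json.loads("""
{
  "1a": {"summary": "create a file",
         "description": "Checks basic creation.",
         "components": ["mdt"]},
  "1b": {"summary": "create a directory"}
}
""")


def missing_fields(metadata, test_ids):
    """Map each test to the required fields it lacks (complete tests omitted)."""
    problems = {}
    for test_id in test_ids:
        entry = metadata.get(test_id)
        if entry is None:
            # A test with no metadata entry at all lacks every field.
            problems[test_id] = list(REQUIRED_FIELDS)
            continue
        missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
        if missing:
            problems[test_id] = missing
    return problems


# test_ids would come from the test framework; here they are hard-coded.
print(missing_fields(METADATA, ["1a", "1b", "1c"]))
```

Such a check could run at review time, so developers get the same
comfort without metadata and code sharing one file.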

-- 
Thanks,
	Roman