[Lustre-discuss] Metadata storage in test script files

Chris chris at whamcloud.com
Wed May 2 02:25:59 PDT 2012

On 02/05/2012 04:23, Roman Grigoryev wrote:
> Hi Cris,
> On 05/01/2012 08:17 PM, Chris wrote:
>> The metadata can be used in a multitude of ways, for example we can
>> create dynamic test sets based on
>> the changes made or target area of testing. What we are doing here is
>> creating an understanding of the
>> tests that we have so that we can improve our processes and testing
>> capabilities in the future.
> I think that when we are defining a tool we should state its purpose.
> F.e. a good description and summary are not needed for creating dynamic
> test sets. I think it is very important to say how we will use it.
> For a continuation of this idea please read below.
The purpose is to enable us to develop and store knowledge/information
about the tests. The information should be in a canonical form, objective
and correct. If we do this then the whole community can make use of it
as they see fit. I want to ensure that the initial set of stored
variables describes the tests as completely as reasonably possible. The
canonical description of each test is not affected by the use to which
the data is put.

>> The metadata does not go to the results. The metadata is a database in
>> its own right and should metadata
>> about a test be required it would be accessed from the source (database)
>> itself.
> I think fields like title, summary, and, possible. description should be
> present in results too. It can be very helpful for quickly understanding
> test results.
They can be presented as part of the results, but I would not store them
with the results. If, for example, Maloo presents the description, it will
fetch it from the correct version of the source. We should not be making
copies of data.

I cannot say whether you should store this information with your results
because I have no insight into your private testing practices.
>>> On 04/30/2012 08:50 PM, Chris wrote:
>> ... snip ...
>> As I said we can mine this data any-time and anyway that we want, and
>> the purpose of this
>> discussion is the data not how we use it. But as an example something
>> that dynamically built
>> test sets would need to know prerequisites.
>> The suffix of a,b,c could be used to generate prerequisite information
>> but it is firstly inflexible, for example
>> I bet 'b','c' and 'd' are often dependent on 'a' but not each other,
>> secondly and more importantly we want a
>> standard form for storing metadata because we want to introduce order
>> and knowledge into the test
>> scripts that we have today.
> Why I asked about way of usage: if we want to use this information in
> scripts and in other automated way we must strictly specify logic on
> items and provides tool for check it.
> F.e. we will use it when built test execution queue. We have chain like
> this: test C prerequisite B, test B prerequisite A. Test A doesn't have
> prerequisite. In one good day test A became excluded. Is it possible to
> execute test C?
> But if we will not use it in scripting there is no big logical problem.
> (My opinion: I don't like this situation and think that test
> dependencies should be used only in very specific and rare case.)
I don't think people should introduce dependencies either, but they have
and we have to deal with that fact. In your example, C is dependent on A
(through B), so if A is removed then C cannot be run.
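
A minimal Python sketch of that rule, assuming the prerequisites are held
as a simple mapping from each test name to its direct prerequisites. The
function name and the data layout are hypothetical and only illustrate
the logic; nothing like this is implied to exist in the test framework.

```python
def runnable_tests(prerequisites, excluded):
    """Return the set of tests that can still run.

    prerequisites: dict mapping test name -> list of direct prerequisites
                   (hypothetical layout, for illustration only).
    excluded:      set of test names that have been removed or excluded.
    """
    status = {}  # test name -> True (runnable) or False (blocked)

    def can_run(test, visiting):
        if test in status:
            return status[test]
        if test in excluded or test not in prerequisites or test in visiting:
            # Excluded, unknown, or part of a dependency cycle.
            status[test] = False
            return False
        visiting.add(test)
        ok = all(can_run(dep, visiting) for dep in prerequisites[test])
        visiting.discard(test)
        status[test] = ok
        return ok

    return {test for test in prerequisites if can_run(test, set())}


# The chain from the example: C needs B, B needs A, A needs nothing.
chain = {"A": [], "B": ["A"], "C": ["B"], "D": []}
print(sorted(runnable_tests(chain, excluded={"A"})))
```

With A excluded, both B and C are blocked because the check follows the
chain transitively; only the unrelated test D remains runnable.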
>>> I suggest add keywords(Components could be translated as keywords too)
>>> and test type (stress, benchmark, load, functional, negative, etc) for
>>> quick filtering. For example, SLOW could transform to keyword.
>> This seems like a reasonable idea although we need a name that describes
>> what it is,
>> we will need to define that set of possible words as we need to with the
>> Components elements.
> I mean that 'keywords' should be separated from components but could be
> logically included. I think, 'Components' is special type of keywords.
I don't think of Components as a keyword. I think of it as a factual
piece of data, and if we want to add the test purpose then we should call
it that. The use of keywords in data is generally a typeless catch-all.
All of this metadata should be clear and well defined, which in my
opinion does not allow scope for a keywords element.

I would suggest that we add a variable called Purposes, which is an array
whose elements are drawn from a predefined set such as stress, benchmark,
load and functional.

For example:

   - stress
   - load
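
To show why a predefined set is different from free-form keywords, here
is a small Python sketch that rejects any Purposes entry outside the
agreed list. The allowed values are just the four named above and the
function name is hypothetical; the real set would still need to be
defined.

```python
# Illustrative set only; the actual list of purposes is yet to be agreed.
ALLOWED_PURPOSES = {"stress", "benchmark", "load", "functional"}


def validate_purposes(purposes):
    """Return the purposes unchanged, or raise if any value is not allowed."""
    unknown = [p for p in purposes if p not in ALLOWED_PURPOSES]
    if unknown:
        raise ValueError("unknown purposes: " + ", ".join(unknown))
    return list(purposes)


print(validate_purposes(["stress", "load"]))
```

A value such as "slow" would raise a ValueError here rather than being
silently accepted, which is the point of keeping the set closed.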

>> It would be easier to store the data separately and we could use Maloo
>> but it's very important
>> that this data becomes part of the Lustre 'source' so that everybody can
>> benefit from it. Adding
>> tickets is not a problem as part of the resolution issue is to ensure
>> that at least one test exercises
>> the problem and proves it has been fixed, the fact that this assurance
>> process requires active
>> interaction by an engineer with the scripts is a positive.
>> As for pass rate, execution time and gold configurations this
>> information is just not 1 dimensional
>> enough to store in the source.
> I'm not accidentally in previous letter said about group of fields. All
> meta data may be separated by rare and often changed fields. F.e.
> Summary will change not so often. But test timeout in golden
> configuration (I mean that this timeout will be set as default based on
> 'gold' configuration and can be overloaded in specific configuration)
> could be more variable(and possible more important for testing).
What exactly is a gold configuration? Lustre has such a breadth of
possibilities that gold configurations would be a matrix of
distro/architecture/distro version/interconnect/CPU
speed/memory/storage/OSS count/client count/... Trying to summarise
this into a single useful value does not make any sense to me.
>   Using separated files provides more flexibility and nobody stop us to
> commit it to lustre repo and it became " Lustre 'source'". In separated
> files we can use format which we want and all information will be
> available without parsing shell script or without running it. More over,
> in great future, it give us very simple migration from shell to other
> language.
This data is valuable and needs to be treated with the same respect and
discipline as we treat the source. A 'free for all' where people just
update it at will does not work. The controls on what goes into the
Lustre tree are there for very good reason and we are not going to
circumvent those controls. We have to invest in this as we do with all
the test infrastructure; it cannot be done on the cheap.

Parsing the scripts for the data is easy because computers are really
good at it. I would expect someone to write a library to access and
modify the data as required, and I'd also expect them to publish that
library.
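
As a rough illustration of how little parsing is involved, the Python
sketch below reads metadata from the leading comment block of a shell
script. The "# Key: value" layout, the field names and the sample test
are all assumptions made up for this example; the actual storage format
is still what this thread is discussing.

```python
import re

# Hypothetical layout: one "# Key: value" comment per metadata field.
FIELD_RE = re.compile(r"^#\s*(\w+):\s*(.*)$")


def read_metadata(script_text):
    """Collect "# Key: value" fields from the leading comment block."""
    fields = {}
    for line in script_text.splitlines():
        stripped = line.strip()
        match = FIELD_RE.match(stripped)
        if match:
            fields[match.group(1)] = match.group(2).strip()
        elif not stripped.startswith("#"):
            break  # first non-comment line ends the metadata block
    return fields


# Made-up script fragment, used only to exercise the parser.
SAMPLE = """#!/bin/bash
# Summary: Check striping of a new file
# Prerequisites: test_1a
# Purposes: functional
test_1b() {
    : # test body
}
"""

print(read_metadata(SAMPLE))
```

The metadata is available without running the script, and the same
function could sit behind a small library that also writes fields back.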

If tests were rewritten then this data would probably change, and the
cost of migrating unchanged data would be insignificant compared to the
cost of rewriting the test itself.

