[Lustre-discuss] [ROMIO Req #940] [Fwd: Re: [Lustre-devel] a new Lustre ADIO driver]

emoly.liu Emoly.Liu at Sun.COM
Wed Mar 18 23:27:31 PDT 2009


Hi Rob,

Robert Latham wrote:
> On Mon, Mar 16, 2009 at 03:41:47PM +0800, emoly.liu wrote:
>   
>
> Thanks for the documentation.  These explanations are good, but now
> we've found a few problems.  The naming issues are rather minor, but
> some of your hints aren't compliant with the MPI-IO spec,
> unfortunately.
>
>   
>>> romio_lustre_CO
>>>   
>>>       
>> In the stripe-contiguous I/O pattern, each OST is accessed by a group
>> of I/O clients. CO stands for *C*lient/*O*ST ratio: the maximum number
>> of I/O clients for each OST.
>> CO=1 by default.
>>     
>
> To make it more clear, how about calling it "romio_lustre_co_ratio" ?
>   
OK.
>   
>>> romio_lustre_bigsize
>>>   
>>>       
>> We won't do collective I/O if this hint is set and the I/O request size
>> is bigger than this value. That's because when the request size is big,
>> the collective communication overhead increases and the benefit of
>> collective I/O becomes limited.
>>     
>
> Instead of 'bigsize' how about "romio_lustre_coll_highwater" or
> "romio_lustre_coll_threshold"?
>   
Both of them make sense to me.
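
To make the intended semantics concrete, here is a minimal sketch of the
decision this hint controls. The function name and the convention that a
non-positive value means "hint not set" are only assumptions for
illustration, not the code in the patch:

```c
#include <assert.h>

/*
 * Hypothetical helper: decide whether a request should go through
 * collective I/O. coll_threshold <= 0 is ASSUMED to mean that the
 * hint was not set, in which case collective I/O is always used.
 */
int use_collective_io(long long req_size, long long coll_threshold)
{
    assert(req_size >= 0);
    if (coll_threshold <= 0)
        return 1;                       /* hint unset: no size limit   */
    return req_size <= coll_threshold;  /* bigger requests bypass it   */
}
```

A request exactly equal to the threshold still goes through collective
I/O here, since the hint only excludes requests "bigger than this value".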
>   
>>> romio_lustre_contig_data
>>> romio_lustre_samesize
>>>   
>>>       
>> These two hints tell the driver whether the request data are
>> contiguous and whether every I/O request has the same size.  If both
>> are "yes", we can optimize ADIOI_LUSTRE_Calc_others_req() by
>> removing MPI_Alltoall(), because each process can then calculate
>> the offset/length pairs for every request without collective
>> communication.  BTW, the optimization currently works only when both
>> hints are "yes". In the future we will probably extend it to other
>> conditions.
>>     
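
To illustrate why no communication is needed in that case, here is a
rough sketch. It ASSUMES the simplest layout, where rank r accesses the
r-th block of a rank-ordered, back-to-back region starting at some base
offset; the type and function names are hypothetical and are not taken
from the driver:

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous file access: starting byte offset and length in bytes. */
typedef struct {
    long long offset;
    long long len;
} req_t;

/*
 * Hypothetical helper: fill out[0..nprocs-1] with every rank's request.
 * ASSUMPTION: rank r accesses [base + r*size, base + (r+1)*size), so any
 * process can compute all offset/length pairs from base, size and nprocs
 * alone, with no exchange of request lists.
 */
void calc_requests_locally(long long base, long long size,
                           int nprocs, req_t *out)
{
    assert(size >= 0 && nprocs > 0 && out != NULL);
    for (int r = 0; r < nprocs; r++) {
        out[r].offset = base + (long long)r * size;
        out[r].len = size;
    }
}
```

If the real access pattern differs from this layout, the computed pairs
are wrong, which is exactly the correctness risk discussed below.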
>
> OK, here's the one with the major problem.  RobR reminds me that
> MPI-IO requires hints to be optional and cannot cause incorrect
> behavior.  A user supplying these hints and then giving you data that
> is noncontiguous or not of the same size would cause incorrect
> behavior, so these aren't appropriate.
>
> Is there a way you can check what the caller is doing?  The caller can
> lie to you via hints, but ROMIO still has to give the right answer.  RobR
> thought maybe MPI_Allreduce or something along those lines before the
> MPI_Alltoall would let you check.
>   
Hmm, it is indeed a problem, although we did see a benefit from these
hints in our previous tests.

I will look into it. For now, would it be acceptable to document the
risk, with a warning such as "Don't set these two hints unless you
know exactly what you are doing"?
If that is still inappropriate, I will remove them from this version and
submit another patch once I figure out how to do the check with low overhead.
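
One possible shape for such a check, as a rough sketch only: each rank
contributes three values, and a single element-wise MIN reduction
(MPI_Allreduce with MPI_MIN in the real driver) tells every rank whether
the hints are true. The MPI call itself is left out so that the sketch
compiles without MPI; min_reduce() is just a stand-in for it, and all
names here are hypothetical:

```c
#include <assert.h>

#define CHECK_LEN 3

/* Build this rank's contribution to an element-wise MIN reduction. */
void pack_check(long long size, int is_contig, long long v[CHECK_LEN])
{
    v[0] = size;               /* MIN over ranks: the smallest size     */
    v[1] = -size;              /* MIN of negated sizes: -(largest size) */
    v[2] = is_contig ? 1 : 0;  /* MIN is 0 if any rank is noncontiguous */
}

/* Stand-in for MPI_Allreduce(..., MPI_MIN, ...): fold v into acc. */
void min_reduce(long long acc[CHECK_LEN], const long long v[CHECK_LEN])
{
    for (int i = 0; i < CHECK_LEN; i++)
        if (v[i] < acc[i])
            acc[i] = v[i];
}

/* Both hints hold only if min size == max size and all ranks are contiguous. */
int hints_hold(const long long reduced[CHECK_LEN])
{
    return reduced[0] == -reduced[1] && reduced[2] == 1;
}
```

When hints_hold() returns 0, the driver would ignore the hints and fall
back to the normal MPI_Alltoall path, so a wrong hint costs one small
reduction but never produces a wrong result.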

What do you think?
> Your other hints make a lot of intuitive sense to me.  Is this one a
> big win, though?  If MPI_Alltoall is giving you a big headache, then
> maybe there is a more fundamental problem with the MPI implementation?
>   
Thanks for your and RobR's careful review. Your comments are very 
helpful. Some problems still need more investigation.

B.R.
-LiuYing
> Thanks
> ==rob
>
>   


-- 
Best regards,

LiuYing
System Software Engineer, Lustre Group
Sun Microsystems ( China ) Co. Limited
