[Lustre-discuss] [ROMIO Req #940] [Fwd: Re: [Lustre-devel] a new Lustre ADIO driver]

emoly.liu Emoly.Liu at Sun.COM
Thu Mar 19 03:30:29 PDT 2009


Hi RobR,

Rob Ross wrote:
> Hi LiuYing,
>
> Unfortunately, the group here is committed to our interpretation of 
> the standard as being that the user passing a hint parameter that is 
> misleading to the implementation cannot cause *incorrect* behavior 
> (i.e. change the semantics of the call).
>
> An option for determining contiguity is to pass messages during 
> file_set_view time; if the file view is contiguous, then the access is 
> contiguous. Since file_set_view is a collective call, you have an 
> opportunity to do this message passing.
>
> I'm not sure how you're avoiding any communication, because the 
> application processes can still be performing I/O at arbitrary 
> offsets. Perhaps knowing that the access in file is contiguous, 
> however, can be used to reduce the overall communication at I/O time 
> anyway? Can you further explain how these hints worked? Maybe we can 
> come up with an alternative together.
We don't mean to avoid all communication. We only optimize where the 
communication overhead has a big impact on overall system performance.
When we tested the Lustre ADIO driver, we found that MPI_Alltoall took 
a large share of the time. That is why we try to avoid it.

There are two MPI_Alltoall calls in the original code. One is made by 
ADIOI_W_Exchange_data() and the other by ADIOI_Calc_others_req().

    * ADIOI_W_Exchange_data(): In this function, the original ADIO
      driver uses MPI_Alltoall() to exchange the recv/send sizes among
      the processes. However, since the Lustre ADIO driver reorganizes
      the requests into a stripe-contiguous I/O pattern, the recv/send
      size for each process can be calculated at the beginning of
      ADIOI_LUSTRE_Exch_and_write(). So we no longer need MPI_Alltoall()
      to exchange offset and length.

    * ADIOI_Calc_others_req(): This is what we are discussing. In this
      function, the original ADIO driver uses MPI_Alltoall() to exchange
      the access information among the processes. It's necessary, but
      there is still a little room for improvement.
          o If the request data are contiguous, the access information
            (offset and length) can be calculated with a simpler
            collective (e.g. MPI_Allreduce).
          o Further, if the request sizes are all the same, the access
            information can be calculated directly without any
            communication.
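
To make the second sub-case concrete, here is a minimal sketch of the
direct calculation. It is not the code from the patch. It assumes each
rank issues a single request of `size` bytes and that the requests are
laid out back to back in rank order starting at `base`; request_t,
calc_request() and bytes_in_domain() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type, not part of ROMIO: one rank's access. */
typedef struct {
    int64_t offset;  /* starting file offset in bytes */
    int64_t length;  /* request length in bytes */
} request_t;

/* ASSUMPTION: every rank writes exactly `size` bytes and the blocks
 * follow each other in rank order from `base`. Under that assumption
 * any process can compute any other rank's request locally. */
request_t calc_request(int64_t base, int64_t size, int rank)
{
    assert(size >= 0 && rank >= 0);
    request_t r = { base + (int64_t)rank * size, size };
    return r;
}

/* Number of bytes of request `r` that fall inside the half-open file
 * range [lo, hi), for example the range served by one I/O process. */
int64_t bytes_in_domain(request_t r, int64_t lo, int64_t hi)
{
    int64_t start = r.offset > lo ? r.offset : lo;
    int64_t end = (r.offset + r.length) < hi ? (r.offset + r.length) : hi;
    return end > start ? end - start : 0;
}
```

With these two helpers a process can loop over all ranks and work out
how much data it will receive from each one, which is the information
the MPI_Alltoall() would otherwise deliver.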

As you said, if the hint settings are inconsistent with the data given 
by the user, they will cause incorrect behavior and break the semantics 
of the call. So I agree this problem must be fixed.

I don't remember exactly how much the hints help us, but I think we get 
a real benefit only when the request data are contiguous and have the 
same size.
So, in other words, if checking contiguity and size introduces new 
overhead, I would prefer to remove the hints and simply use MPI_Alltoall.
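
For reference, one possible shape of such a check is sketched below. It
is an assumption for discussion, not tested ROMIO code. Each rank
contributes four integers, and an element-wise maximum over all ranks
(in an MPI program, a single MPI_Allreduce with MPI_MAX) tells every
rank whether all sizes are equal and whether the blocks sit back to
back in rank order. The sketch only covers one request per rank; the
function names are hypothetical, and check_ranks() is a local stand-in
for the collective so the logic can be tried without MPI.

```c
#include <assert.h>
#include <stdint.h>

/* Build the vector one rank would contribute. `base` is where rank 0
 * would have to start if the layout were contiguous in rank order. */
void make_contribution(int rank, int64_t offset, int64_t size,
                       int64_t out[4])
{
    int64_t base = offset - (int64_t)rank * size;
    out[0] = size;
    out[1] = -size;   /* max of -size is -(min of size) */
    out[2] = base;
    out[3] = -base;   /* max of -base is -(min of base) */
}

/* Element-wise maximum, i.e. what MPI_MAX computes in a reduction. */
void reduce_max(int64_t acc[4], const int64_t in[4])
{
    for (int i = 0; i < 4; i++)
        if (in[i] > acc[i])
            acc[i] = in[i];
}

/* The hints hold only if max == min for both the size and the base. */
int hints_hold(const int64_t red[4])
{
    return red[0] == -red[1] && red[2] == -red[3];
}

/* Local stand-in for the collective: folds every rank's contribution. */
int check_ranks(const int64_t *offsets, const int64_t *sizes, int nprocs)
{
    int64_t acc[4], v[4];
    assert(nprocs > 0);
    make_contribution(0, offsets[0], sizes[0], acc);
    for (int r = 1; r < nprocs; r++) {
        make_contribution(r, offsets[r], sizes[r], v);
        reduce_max(acc, v);
    }
    return hints_hold(acc);
}
```

If the check fails, the driver would fall back to the MPI_Alltoall()
path, so a wrong hint costs time but cannot change the result. The open
question is whether this extra reduction is cheap enough to keep the
benefit.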

Any ideas?

Thanks,
-LiuYing
 
>
> Thanks,
>
> Rob
>
> On Mar 19, 2009, at 1:27 AM, emoly.liu wrote:
>
>> Hi rob,
>>
>> Robert Latham wrote:
>>>
>>> On Mon, Mar 16, 2009 at 03:41:47PM +0800, emoly.liu wrote:
>>>>> romio_lustre_contig_data
>>>>> romio_lustre_samesize
>>>> They are two hints to tell the driver whether the request data are
>>>> contiguous and whether each request IO has the same size.  If they
>>>> are both "yes", we can optimize ADIOI_LUSTRE_Calc_others_req()  by
>>>> removing MPI_Alltoall(). Because each process can easily calculate
>>>> the pairs of offset and length for each request without collective
>>>> communication.  BTW, currently only when they are both positive, the
>>>> optimization can  work. In the future, probably some efforts will be
>>>> made to other  conditions.
>>>>
>>> OK, here's the one with the major problem.  RobR reminds me that
>>> MPI-IO requires hints to be optional and cannot cause incorrect
>>> behavior.  A user supplying these hints and then giving you data that
>>> is noncontiguous or not of the same size would cause incorrect
>>> behavior, so these aren't appropriate.
>>>
>>> Is there a way you can check what the caller is doing?  caller can lie
>>> to you via hints, but ROMIO still has to give the right answer.  RobR
>>> thought maybe MPI_Allreduce or something along those lines before the
>>> MPI_Alltoall would let you check.
>>>
>> Hmm, it is indeed a problem, although we did get benefits from them 
>> in our previous tests.
>>
>> I will check it. But currently, is it possible to make mention of the 
>> risk with some words, just like "Don't set these two hints, until you 
>> know exactly what you are doing" ?
>> If it is still inappropriate, I will remove them in this version, 
>> then submit another patch once I figure out how to check it with low 
>> overhead.
>
>


-- 
Best regards,

LiuYing
System Software Engineer, Lustre Group
Sun Microsystems ( China ) Co. Limited
