[Lustre-discuss] [ROMIO Req #940] [Fwd: Re: [Lustre-devel] a new Lustre ADIO driver]

emoly.liu Emoly.Liu at Sun.COM
Sun Mar 1 19:27:29 PST 2009


Hi Robert,

Here is the new lustre adio driver patch. I fixed the following problem 
per our discussion:

    * change the hints name
          o from xxx to romio_lustre_xxx

    *  use the fd->hints structure instead of MPI Info routines
          o define a struct for lustre in ADIOI_Hints_struct in adioi.h
            and replace the old MPI_Info_get calls with the new romio hints

    * check lustre/lustre_user.h header file in configure instead of
      giving the lustre structs/constants
          o define AC_CHECK_HEADERS in romio/configure.in. If the header
            file doesn't exist, report AC_MSG_ERROR

    * restore mis-removed comments

    * new MPE logging
          o add MPE logging for read/write in ad_lustre_rwconfig.c

    * fix the issue in ad_lustre_open.c


I tested the new driver on less than 10 nodes with IOR benchmark.

Please check and if you have any questions, please let me know.


Thanks,
LiuYing

Robert Latham wrote:
> Hi.  Just wanted to see how things are going with this patch.  I think
> the MPICH2 folks are looking to start the release process in the next
> few weeks, and I'd like to get this patchset into that distribution. 
>
> Thanks
> ==rob
>
> On Tue, Jan 13, 2009 at 01:52:59PM -0600, Robert Latham wrote:
>   
>> Hi LiuYing. Thanks for giving my comments such consideration.  I've
>> only responded to a few points below inline:
>>
>> On Wed, Jan 14, 2009 at 12:16:34AM +0800, emoly.liu wrote:
>>     
>>>>  ad_lustre/ad_lustre.h
>>>> - bring in lustre structs/constants if "lustre_user.h" not getting
>>>>   brought in: is this really necessary?  ROMIO won't try to build
>>>>   ad_lustre unless 'lustre' is part of "--with-file-systems" at
>>>>   configure time
>>>>         
>>> --with-file-systems=lustre only tells it will build lustre adio, but
>>> it doesn't show where is the lustre_user.h, which is needed by
>>> ad_open.c (for setting stripe stuff).  So if there is no
>>> lustre_user.h in /user/include (no lustre-devel rpm in your system),
>>> hmm, maybe we should only give error msg at this moment, instead of
>>> adding this if.
>>>       
>> This is sort of what we do for PVFS and PVFS2: if the user explicitly
>> asks for pvfs or pvfs2, and ROMIO configure.in cannot find the
>> necessary header files, we AC_ERROR.   I'd like to see the same
>> behavior for lustre too.  
>>
>>     
>>>>   ad_lustre/ad_lustre_aggregate.c - Lots of changes.  I see you started 
>>>> from adio/common/ad_aggregate.c
>>>>
>>>> - ADIOI_LUSTRE_Docollect: interesting policy to avoid collective I/O
>>>>   if any piece bigger than a big_req hint.
>>>> - You removed a lot of comments. why?
>>>>   
>>>>         
>>> Some comments are not suitable to lustre adio driver and some were  
>>> probably removed during the codes cleanup. I will restore the latter.
>>>       
>> I can certainly appreciate the need to remove misleading comments.
>> All the ROMIO maintainers will tell you, though, that this code is
>> tricky and we look at it infrequently enough that it's really pretty
>> hard to over-comment it.
>>
>>     
>>>> - here again, you should refer to ROMIO hint struct instead of calling
>>>>   MPI_Info functions
>>>>   
>>>>         
>>> Sorry, I am not sure what ROMIO hint structure you mean. I assume it is  
>>> ADIOI_Hints_struct. I can add lustre hints to that struct, but they are  
>>> lustre related, not common. Is that ok?
>>> If I misunderstand, please advise.
>>>       
>> Sure.  I'll be happy to explain further.
>>
>> You're on the right track. In adioi.h we have ADIOI_Hints_struct.
>> This struct contains many fs-independent values (cb_buffer_size,
>> striping_unit), but it also contains a union for fs-specific values.
>> Right now, that union only has values for pvfs and pvfs2, but we
>> should add another member for lustre. 
>>
>> Here's an example of ROMIO using the hints in this structure
>> (adio/ad_pvfs2/ad_pvfs2_read.c)
>>
>> if (fd->hints->fs_hints.pvfs2.listio_read == ADIOI_HINT_ENABLE) {
>> 	ret = ADIOI_PVFS2_ReadStridedListIO(fd, buf, count, datatype,
>> 		file_ptr_type, offset, status, error_code);
>> 	return;
>> }
>>
>> No need to extract the "romio_pvfs2_listio" keyval from the MPI_INFO
>> member, no need to strcasecmp the info value.  Those steps were all
>> done at MPI_FILE_OPEN or MPI_FILE_SET_VIEW time.  It's a small
>> optimization, and what you do now is correct, but making use of
>> ADIOI_Hints_struct will make your code more like the other drivers.
>>
>>     
>>>> - maybe we need, as wei-keng suggests, a better separation so you
>>>>   don't need to duplicate Calc_my_req just so you can call the
>>>>   lustre-sepecific Calc_aggregator.   Could a hint do this for you?
>>>>   
>>>>         
>>> yes, that would be better if there is a separation.
>>>       
>> OK, glad we're on the same page.  I don't think you need to do this
>> bit of software engineering for your patch.  It's more a note to
>> ourselves: if we ever redesign ROMIO, we'll have to be sure to make
>> this part of it.
>>
>>     
>>>> - missing the "new MPE" logging anthony put into rest of ROMIO
>>>>   
>>>>         
>>> I had a look at MPE events defined in adioi.h. Could you tell me what  
>>> else MPE logs you hope to see?
>>>       
>> Take a look at some of the other files that have 
>> #ifdef ADIOI_MPE_LOGGING
>> lines. 
>> (Unfortunately, there are still some legacy MPE lines in the code,
>> too.  I'll have to delete them)
>>
>> The events in adioi.h are exactly what I'd like to see. I see some
>> lustre files already have some of them.
>>
>> Here's an example of a modern call (do it like this):
>> MPE_Log_event( ADIOI_MPE_open_a, 0, NULL );
>>
>> and here's an ancient call (don't do this):
>> MPE_Log_event(9, 0, "start close");
>>
>> In the end, though, this is for developers to see what's going on
>> inside ROMIO with JUMPSHOT.  it's not critical, though you might find
>> it helpful when you are looking for answers to performance questions.
>>
>>     
>>>> - missing wei-keng's reworked type processing   
>>>>         
>>> Sorry, I checked the codes but I didn't find anything missed
>>> commented  by wei-keng.  Could you tell me what type processing is
>>> missed in this file?
>>>       
>> This one is a bit hard to describe, because the initial work went into
>> revision 958, but then we did some testing and refining for several
>> more revisions before settling on a final version.   It's ok if you
>> don't incorporate this change, but if you're motivated, svn revision
>> 1001 makes some changes to ad_read_coll which are demonstrative of the
>> overall changes also needed in ad_write_coll, ad_write_str, and
>> ad_read_str.  
>>
>> In ad_lustre, you have one file for "io", so you have already reduced
>> your work by half.  Good.
>>
>>     
>>> If I misunderstand or miss anything, please let me know.
>>> Thank you so much!
>>>       
>> Thank you for the good discussion.  I'm looking forward to seeing the
>> next revision.
>>
>> ==rob
>>
>>     
>
>   


-- 
Best regards,

LiuYing
System Software Engineer, Lustre Group
Sun Microsystems ( China ) Co. Limited

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090302/40e83d0d/attachment.htm>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: adio_driver_mpich2-1.0.7_v6.patch
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090302/40e83d0d/attachment.txt>


More information about the lustre-discuss mailing list