[Lustre-discuss] [ROMIO Req #940] [Fwd: Re: [Lustre-devel] a new Lustre ADIO driver]

emoly.liu Emoly.Liu at Sun.COM
Mon Mar 16 00:41:47 PDT 2009


Robert Latham wrote:
> Hi LiuYing and everyone else. 
>
> Thanks for the improvements to this patch set.  I looked over them
> quickly and it looks OK to me.  I have committed this patch as
> revision 4055 in our trunk, and it will be part of the next MPICH2
> beta release (which is likely to happen in the next week or so).
>   
Thank you!
> I have two favors to ask of any other MPI-IO on Lustre users.  
>
> First favor: please try this out on as many parallel workloads and as
> many Lustre deployments as possible.   ROMIO has other file systems we
> can't test at ANL: we rely on contributors to maintain those drivers.
>
> You might be able to just replace the romio/ directory in your
> MPICH2-1.0.8 tarball with an SVN checkout 
>
> svn co https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpi/romio
>
> I know this patch was against MPICH2-1.0.7, but it applied cleanly
> against 1.0.8, due largely to how self-contained everything is -- most
> changed files were in the ad_lustre directory.
>
> These lustre patches (first from Weikuan, second from Sun) are
> sizeable hunks of code, and hard to review properly.  From here out,
> I hope that you will not be shy about sending further patches back
> "upstream" to us.   
>
> Second favor: documentation... Can you send me a brief summary of the
> new hints?  
>   
Sure
> romio_lustre_CO
>   
In the stripe-contiguous I/O pattern, each OST is accessed by a group of 
I/O clients. CO stands for *C*lient/*O*ST ratio: the maximum number of 
I/O clients per OST.
The default is CO=1.
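
To make the ratio concrete, here is a rough sketch of how CO could bound the number of I/O clients. It is not the driver's code: the function name and the min() formula are assumptions for illustration, based only on "at most CO clients per OST".

```python
def max_io_clients(nprocs: int, stripe_count: int, co: int = 1) -> int:
    """Upper bound on I/O clients if each OST serves at most `co` clients.

    Hypothetical helper: the cap is assumed to be co * stripe_count,
    and it can never exceed the number of processes available.
    """
    if nprocs < 1 or stripe_count < 1 or co < 1:
        raise ValueError("nprocs, stripe_count and co must all be >= 1")
    return min(nprocs, co * stripe_count)
```

With 64 processes and a file striped over 8 OSTs, the default CO=1 would give at most 8 I/O clients under this assumption, and CO=4 would give at most 32.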
> romio_lustre_bigsize
>   
If this hint is set, we skip collective I/O whenever the I/O request 
size is bigger than this value. With large requests, the collective 
communication overhead grows and the benefit of collective I/O becomes 
limited.
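
The decision described above can be sketched as a simple threshold check. This is only an illustration: the function name is hypothetical, and the sketch takes a single request size without modelling how the driver measures it.

```python
from typing import Optional


def use_collective_io(request_size: int, bigsize: Optional[int] = None) -> bool:
    """Return True if collective I/O should be used for this request.

    Hypothetical helper: `bigsize` stands for the romio_lustre_bigsize
    hint, with None meaning the hint is not set.
    """
    if bigsize is None:
        # Hint not set: never bypass collective I/O on size grounds.
        return True
    # Bypass collective I/O only when the request is bigger than the hint.
    return request_size <= bigsize
```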
> romio_lustre_ds_in_coll
>   
By default, collective I/O applies read-modify-write (data sieving) to 
deal with non-contiguous data. However, this introduces some overhead 
(extra I/O operations and locking). In our tests, data sieving showed 
bad collective write performance for some kinds of workloads.
So, to avoid this, we define the ds_in_coll hint to disable RMW in 
collective I/O, separately from the setting for independent I/O.
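
To illustrate where the read-modify-write overhead comes from, the sketch below compares the extent that a sieving-style write has to cover with the bytes the application actually wants to write. It is a simplification under stated assumptions: segments do not overlap, the whole extent is handled in one pass, and the function name is made up for this example.

```python
from typing import List, Tuple


def sieving_extent(segments: List[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Return (start, span, useful) for non-overlapping (offset, length) segments.

    `span` is the size of the single contiguous extent that covers all
    segments (what a read-modify-write pass would read and write back);
    `useful` is the number of bytes actually requested.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    start = min(off for off, _ in segments)
    end = max(off + length for off, length in segments)
    useful = sum(length for _, length in segments)
    return start, end - start, useful
```

For three 4-byte pieces at offsets 0, 16 and 32, the covering extent is 36 bytes for 12 useful bytes, and that extent also has to be locked while it is modified.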
> romio_lustre_contig_data
> romio_lustre_samesize
>   
These two hints tell the driver whether the requested data are 
contiguous and whether every I/O request has the same size.
If both are "yes", we can optimize ADIOI_LUSTRE_Calc_others_req() 
by removing the MPI_Alltoall() call, because each process can easily 
calculate the offset/length pairs for every request without collective 
communication.
BTW, the optimization currently works only when both hints are "yes". 
We may extend it to other conditions in the future.
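
As a sketch of why no exchange is needed in that case: if every request is contiguous and has the same size, any process can derive every other process's offset/length pair locally. The layout below (rank i starts at base_offset + i * request_size) is an assumption chosen for illustration, and the function name is hypothetical; the driver's real bookkeeping is not shown here.

```python
from typing import List, Tuple


def others_requests(nprocs: int, base_offset: int,
                    request_size: int) -> List[Tuple[int, int]]:
    """Compute (offset, length) for every rank without any communication.

    Assumes rank i accesses one contiguous block of `request_size` bytes
    starting at base_offset + i * request_size.
    """
    if nprocs < 1 or request_size < 0:
        raise ValueError("nprocs must be >= 1 and request_size >= 0")
    return [(base_offset + rank * request_size, request_size)
            for rank in range(nprocs)]
```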
> romio_lustre_start_iodevice
>   
In Lustre, we use 3 values to describe striping information. They are
-stripe_size: Number of bytes on each OST
-stripe_count: Number of OSTs to stripe over (0 default, -1 all)
-start_ost: OST index of first stripe (-1 filesystem default)

In ROMIO, stripe_size (striping_unit) and stripe_count (striping_factor) 
are already defined in ADIOI_Hints_struct, but Lustre still needs another 
one for start_ost, so I use romio_lustre_start_iodevice for it. A 
similar name was used earlier in Weikuan's patch.
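
For readers less familiar with striping, the sketch below shows how stripe_size and stripe_count map a file offset to a stripe, assuming plain round-robin (RAID-0 style) striping. The function name is hypothetical. start_ost only says which OST holds stripe index 0; which OSTs hold the remaining stripes is decided by the file system and is not modelled here.

```python
from typing import Tuple


def locate(offset: int, stripe_size: int, stripe_count: int) -> Tuple[int, int]:
    """Map a file offset to (stripe_index, offset_within_that_stripe_object).

    Assumes round-robin striping: consecutive stripe_size-byte chunks of
    the file go to stripe indices 0, 1, ..., stripe_count - 1, then wrap.
    """
    if offset < 0 or stripe_size < 1 or stripe_count < 1:
        raise ValueError("offset must be >= 0; sizes and counts must be >= 1")
    stripe_number = offset // stripe_size        # which chunk of the file
    stripe_index = stripe_number % stripe_count  # which stripe object holds it
    # Full rounds already stored in this object, plus the position in the chunk.
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return stripe_index, object_offset
```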
> I know I for one am looking forward to improved MPI-IO performance
> on Lustre.  Thank you for the contribution.
>   
Thank you too.

-LiuYing
> ==rob
>
> On Mon, Mar 02, 2009 at 11:27:29AM +0800, emoly.liu wrote:
>   
>> Here is the new lustre adio driver patch. I fixed the following problem  
>> per our discussion:
>>
>>    * change the hints name
>>          o from xxx to romio_lustre_xxx
>>
>>    *  use the fd->hints structure instead of MPI Info routines
>>          o define a struct for lustre in ADIOI_Hints_struct in adioi.h
>>            and replace the old MPI_Info_get calls with the new romio hints
>>
>>    * check lustre/lustre_user.h header file in configure instead of
>>      giving the lustre structs/constants
>>          o define AC_CHECK_HEADERS in romio/configure.in. If the header
>>            file doesn't exist, report AC_MSG_ERROR
>>
>>    * restore mis-removed comments
>>
>>    * new MPE logging
>>          o add MPE logging for read/write in ad_lustre_rwconfig.c
>>
>>    * fix the issue in ad_lustre_open.c
>>
>>
>> I tested the new driver on less than 10 nodes with IOR benchmark.
>>
>> Please check and if you have any questions, please let me know.
>>
>>     
>
>   


-- 
Best regards,

LiuYing
System Software Engineer, Lustre Group
Sun Microsystems ( China ) Co. Limited
