[Lustre-discuss] mpi-io support

Marty Barnaby mlbarna at sandia.gov
Fri May 9 08:31:10 PDT 2008


Phil,

If you are having the same problems I've had, I would suggest trying the 
advice that some have given below. I am working with several layers of 
software that I do not own, but I have the source and can make edits. For 
me, it is reasonable to make my own explicit MPI_Info_set calls during 
initialization for the hints romio_ds_write and romio_ds_read, changing 
both of their values to 'disable'. How these defaults are 
initialized in the ROMIO code in adio/common/ad_hints.c (for these two, 
specifically, 'enable') is the best documentation I have found on the 
matter. I've never seen anything describing all the hints available, or 
the syntax and semantics of their acceptable values.

I don't fully understand data sieving, but I believe it is an older 
paradigm, and not applicable to our current, high-performance, 
large-distribution, parallel FS. My suggestion was that, at least here, 
with Lustre and its new abstract device routines, the _ds_ hints be set to 
disable, so I don't have to find a place in every new library I deal 
with to set them explicitly myself.
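
For anyone else unsure what data sieving does on a write, here is a toy 
picture in plain C, based on the read-modify-write behaviour described in 
the quoted discussion below. It is not ROMIO code: the file is just an 
in-memory buffer, and the function and parameter names are invented for 
this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write `count` pieces of `piece` bytes each into `file`, one piece every
 * `stride` bytes starting at `offset`, the way a data-sieving write would:
 * one large read, a patch in memory, one large write back.
 * `file` stands in for the real file and `scratch` for the sieving buffer
 * (both are assumptions of this sketch; `scratch` must hold the extent).
 * Returns the number of bytes written back, gaps included. */
size_t sieved_strided_write(char *file, size_t offset,
                            const char *src, size_t piece,
                            size_t stride, size_t count,
                            char *scratch)
{
    assert(count > 0 && piece > 0 && stride >= piece);

    /* Span from the first byte of the first piece to the last byte
     * of the last piece. */
    size_t extent = (count - 1) * stride + piece;

    /* 1. Read the whole extent, including the gaps we do not own. */
    memcpy(scratch, file + offset, extent);

    /* 2. Patch our pieces into the buffer. */
    for (size_t i = 0; i < count; i++)
        memcpy(scratch + i * stride, src + i * piece, piece);

    /* 3. Write the whole extent back. Anything another process wrote
     *    into the gaps between steps 1 and 3 would be overwritten with
     *    stale data, which is why the extent has to be locked. */
    memcpy(file + offset, scratch, extent);

    return extent;
}
```

With three 2-byte pieces at a stride of 4, the call moves 10 bytes in each 
direction to store 6 useful ones, and it needs a lock over the whole 
10-byte range while it does so.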


Marty



Phil Dickens wrote:
> hello,
>
>   I am having similar struggles with locking on MPI-IO.
> I am doing a simple strided write, and it fails because
> of the locking. I'm a bit behind in the discussion, but
> is there a way to fix (workaround) this problem?? Is this
> something in my code, or the default driver (this is on
> lonestar at TACC)? I have even downloaded the most up to date
> version of MPICH, which I believe has a new Lustre ADIO
> driver, but I am running into the same issues.
>
>   Any thoughts would be greatly appreciated!!
>
> Phil
>
>
> On Thu, 8 May 2008, Tom.Wang wrote:
>
>   
>> Hi
>>
>> Marty Barnaby wrote:
>>     
>>> To return to this discussion, in recent testing, I have found that
>>> writing to a Lustre FS via a higher level library, like PNetCDF, fails
>>> because the default value for romio_ds_write is not disable. This
>>> is set in the mpich code in the file /src/mpi/romio/adio/common/ad_hints.c
>>>       
>> You can use MPI_Info_set to disable romio_ds_write.  What is the failure?
>> Is it flock? Data sieving needs flock.
>>     
>>> I believe it has something to do with locking issues. I'm not sure how
>>> best to handle this; I'd prefer the data sieving default be disable,
>>> though I don't know all the implications there.
>>>       
>> I agree data sieving should be disabled. Also, it checks for a contiguous
>> buftype or filetype only by the fileview, which is sometimes not enough and
>> triggers unnecessary read-modify-write even for contiguous
>> writes (especially with those higher level libraries, if you choose
>> collective write). Since Lustre has a client cache, and given the overhead
>> of flock and read-modify-write, I doubt the performance improvements
>> we could get from data sieving on Lustre, although I do not have
>> performance data to prove that.
>>     
>>> Maybe ad_lustre_open should be the place where the _ds_ hints are
>>> set to disable.
>>>       
>> Yes, we should disable this for strided writes on Lustre. ad_lustre_open
>> seems the right place to do this.
>>
>> Thanks
>> WangDi
>>     
>>> Marty Barnaby
>>>
>>>
>>> Weikuan Yu wrote:
>>>       
>>>> Andreas Dilger wrote:
>>>>
>>>>         
>>>>> On Mar 11, 2008  16:10 -0600, Marty Barnaby wrote:
>>>>>
>>>>>           
>>>>>> I'm not actually sure what ROMIO abstract device the multiple CFS
>>>>>> deployments I utilize were defined with. Probably just UFS, or maybe NFS.
>>>>>> Did you have a recommended option yourself?
>>>>>>
>>>>>>             
>>>>> The UFS driver is the one used for Lustre if no other one exists.
>>>>>
>>>>>
>>>>>           
>>>>>> Besides the fact that most of the ADIO drivers created over the years are
>>>>>> completely obsolete and could be cleaned from ROMIO, what will the new one
>>>>>> for Lustre offer? Particularly with respect to controls via the lfs utility
>>>>>> that I can already get?
>>>>>>
>>>>>>             
>>>>> There is improved collective IO that aligns the IO on Lustre stripe
>>>>> boundaries.  Also the hints given to the MPIIO layer (before open,
>>>>> not after) result in lustre picking a better stripe count/size.
>>>>>
>>>>>
>>>>>           
>>>> In addition, the one integrated into MPICH2-1.0.7 contains direct I/O
>>>> support. Lockless I/O support was purged due to my lack of
>>>> confidence in low-level file system support. But it can be revived when
>>>> possible.
>>>>
>>>> --
>>>> Weikuan Yu <+> 1-865-574-7990
>>>> http://ft.ornl.gov/~wyu/
>>>>
>>>>
>>>>         
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>
>>>       
>> --
>> Regards,
>> Tom Wangdi
>> --
>> Sun Lustre Group
>> System Software Engineer
>> http://www.sun.com
>>
>>     
>
>   
