[Lustre-discuss] mpi-io support

Tom.Wang Tom.Wang at Sun.COM
Fri May 9 09:04:46 PDT 2008


Hi, Phil

If the Lustre client is mounted with -o localflock or -o flock, you will
not hit this problem.
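
Roughly, the mount line would look like this (the MGS NID, filesystem
name and mount point here are just placeholders):

mount -t lustre -o flock mgs@tcp0:/lustrefs /mnt/lustre

Note that -o localflock only enables client-local flock semantics, which
is cheaper, while -o flock gives coherent locking across clients.
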
Otherwise you could either

Use POSIX writes instead of MPI-IO,

or

use MPI_Info_set to disable data sieving:

ierr = MPI_Info_create(&FILE_INFO_TEMPLATE);

.........
ierr = MPI_Info_set(FILE_INFO_TEMPLATE, "romio_ds_write", "disable");
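
For reference, a minimal sketch of how that hint fits into an MPI-IO
open/write sequence (the file name and the placement of the strided
write are placeholders, not taken from your code):

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Info info;
      MPI_File fh;

      MPI_Init(&argc, &argv);

      /* Tell ROMIO not to use data sieving for writes, so strided
       * writes do not go through the flock-based read-modify-write
       * path. */
      MPI_Info_create(&info);
      MPI_Info_set(info, "romio_ds_write", "disable");

      /* Placeholder path; the info object must be passed at open time. */
      MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/outfile",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

      /* ... set the file view and do the strided writes here ... */

      MPI_File_close(&fh);
      MPI_Info_free(&info);
      MPI_Finalize();
      return 0;
  }

The same hint exists for reads (romio_ds_read) if you need it.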

I do not think you can avoid this otherwise in the current release of the
Lustre ADIO driver.


Thanks

Phil Dickens wrote:
> hello,
>
>   I am having similar struggles with locking on MPI-IO.
> I am doing a simple strided write, and it fails because
> of the locking. I'm a bit behind in the discussion, but
> is there a way to fix (work around) this problem? Is this
> something in my code, or the default driver (this is on
> lonestar at TACC)? I have even downloaded the most up-to-date
> version of MPICH, which I believe has a new Lustre ADIO
> driver, but I am running into the same issues.
>
>   Any thoughts would be greatly appreciated!!
>
> Phil
>
>
> On Thu, 8 May 2008, Tom.Wang wrote:
>
>   
>> Hi
>>
>> Marty Barnaby wrote:
>>     
>>> To return to this discussion, in recent testing, I have found that
>>> writing to a Lustre FS via a higher-level library, like PNetCDF, fails
>>> because the default value for romio_ds_write is not disable. This
>>> is set in the mpich code in the file /src/mpi/romio/adio/common/ad_hints.c
>>>       
>> You can use MPI_Info_set to disable romio_ds_write.  What is the failure?
>> flock? Data sieving needs flock.
>>     
>>> I believe it has something to do with locking issues. I'm not sure how
>>> best to handle this; I'd prefer the data-sieving default be disable,
>>> though I don't know all the implications there.
>>>       
>> I agree data sieving should be disabled. It also checks whether the
>> buftype or filetype is contiguous only by the fileview, which is not
>> always enough, and can trigger unnecessary read-modify-write even for
>> contiguous writes (especially for the higher-level libraries, if you
>> choose collective write). Since Lustre has a client cache, and given the
>> overhead of flock and read-modify-write, I doubt the performance
>> improvement we could get from data sieving on Lustre, although I do not
>> have performance data to prove that.
>>     
>>> Maybe ad_lustre_open is the place where the _ds_ hints should be
>>> set to disable.
>>>       
>> Yes, we should disable this for strided writes in Lustre. ad_lustre_open
>> seems the right place to do this.
>>
>> Thanks
>> WangDi
>>     
>>> Marty Barnaby
>>>
>>>
>>> Weikuan Yu wrote:
>>>       
>>>> Andreas Dilger wrote:
>>>>
>>>>         
>>>>> On Mar 11, 2008  16:10 -0600, Marty Barnaby wrote:
>>>>>
>>>>>           
>>>>>> I'm not actually sure what ROMIO abstract device the multiple CFS
>>>>>> deployments I utilize were defined with. Probably just UFS, or maybe NFS.
>>>>>> Did you have a recommended option yourself?
>>>>>>
>>>>>>             
>>>>> The UFS driver is the one used for Lustre if no other one exists.
>>>>>
>>>>>
>>>>>           
>>>>>> Besides the fact that most of the ADIO drivers created over the years are
>>>>>> completely obsolete and could be cleaned out of ROMIO, what will the new one
>>>>>> for Lustre offer? Particularly with respect to controls that I can already
>>>>>> get via the lfs utility?
>>>>>>
>>>>>>             
>>>>> There is improved collective IO that aligns the IO on Lustre stripe
>>>>> boundaries.  Also, the hints given to the MPI-IO layer (before open,
>>>>> not after) result in Lustre picking a better stripe count/size.
>>>>>
>>>>>
>>>>>           
>>>> In addition, the one integrated into MPICH2-1.0.7 contains direct I/O
>>>> support. Lockless I/O support was taken out due to my lack of
>>>> confidence in low-level file system support, but it can be revived when
>>>> possible.
>>>>
>>>> --
>>>> Weikuan Yu <+> 1-865-574-7990
>>>> http://ft.ornl.gov/~wyu/
>>>>
>>>>
>>>>         
>>>       


-- 
Regards,
Tom Wangdi    
--
Sun Lustre Group
System Software Engineer 
http://www.sun.com



