[Lustre-discuss] mpi-io support

Wei-keng Liao wkliao at ece.northwestern.edu
Fri May 9 08:46:13 PDT 2008


Hi, Marty and Phil,

Since the Lustre ADIO driver is new in MPICH2-1.0.7, it may still have
bugs. One way to check whether the Lustre ADIO driver is the cause is to
force ROMIO to use the UFS (Unix file system) driver. This can be done by
adding the prefix "ufs:" to the file name. Note that data sieving will
still be enabled by default when using UFS.
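
A minimal sketch of that check, combined with the hint change discussed in
the quoted messages below. The helper names (make_ufs_name,
open_ufs_no_sieving), the USE_MPI build flag and the 4096-byte buffer are
hypothetical choices for illustration; the "ufs:" prefix and the
romio_ds_read / romio_ds_write hints are the ones named in this thread.
The MPI part only compiles when built with something like
"mpicc -DUSE_MPI", and error handling is kept to a minimum.

```c
#include <stdio.h>
#include <string.h>

/* Prepend "ufs:" to a path so that ROMIO selects its UFS driver.
 * Returns 0 on success, -1 if the result does not fit in out. */
int make_ufs_name(const char *path, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "ufs:%s", path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

#ifdef USE_MPI   /* hypothetical build flag: mpicc -DUSE_MPI ... */
#include <mpi.h>

/* Open path through the UFS driver with data sieving disabled via hints.
 * Returns the MPI error code from MPI_File_open. */
int open_ufs_no_sieving(MPI_Comm comm, const char *path, MPI_File *fh)
{
    char name[4096];   /* assumed upper bound on the prefixed name */
    MPI_Info info;
    int rc;

    if (make_ufs_name(path, name, sizeof name) != 0)
        return MPI_ERR_ARG;

    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_ds_read", "disable");
    MPI_Info_set(info, "romio_ds_write", "disable");

    /* Hints are passed at open time. */
    rc = MPI_File_open(comm, name, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                       info, fh);
    MPI_Info_free(&info);
    return rc;
}
#endif
```

If the failure disappears with the "ufs:" prefix alone, that points at the
Lustre ADIO driver; if it only disappears once the two hints are set to
"disable", data sieving and its locking are the more likely cause.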

As a pnetcdf developer, I am interested in the problem Marty had. I also
run pnetcdf codes on Lustre but so far have not seen a problem that is
related to file locking. Would it be possible for you to provide a test
program that reproduces the error?

Wei-keng


On Fri, 9 May 2008, Marty Barnaby wrote:
> Phil,
> 
> If you are having the same problems I've had, I would suggest trying the
> advice that some have given below. I am working with several layers of
> which I am not the owner, but I have the source and can make edits. For
> me, it is reasonable to make my own explicit MPI_Info_set calls during
> initialization for the hints romio_ds_write and romio_ds_read, changing
> both of their values to 'disable'. How these defaults are initialized in
> the ROMIO code in adio/common/ad_hints.c (for these two, specifically,
> 'enable') is the only documentation I have found on the matter. I've
> never seen anything describing all the hints available, or the syntax
> and semantics of the acceptable values.
> 
> I don't fully understand data sieving, but I believe it is an older
> paradigm, and not applicable to our current, high-performance,
> large-distribution, parallel FS. My suggestion was that, at least here,
> with Lustre and its new abstract device routines, the _ds_ hints be set
> to disable, so I don't have to find a place in every new library I deal
> with to set them explicitly myself.
> 
> 
> Marty
> 
> 
> 
> Phil Dickens wrote:
> > hello,
> >
> >   I am having similar struggles with locking on MPI-IO.
> > I am doing a simple strided write, and it fails because
> > of the locking. I'm a bit behind in the discussion, but
> > is there a way to fix (workaround) this problem?? Is this
> > something in my code, or the default driver (this is on
> > lonestar at TACC)? I have even downloaded the most up to date
> > version of MPICH, which I believe has a new Lustre ADIO
> > driver, but I am running into the same issues.
> >
> >   Any thoughts would be greatly appreciated!!
> >
> > Phil
> >
> >
> > On Thu, 8 May 2008, Tom.Wang wrote:
> >
> >   
> > > Hi
> > >
> > > Marty Barnaby wrote:
> > >     
> > > > To return to this discussion, in recent testing, I have found that
> > > > writing to a Lustre FS via a higher level library, like PNetCDF, fails
> > > > > because the default value for romio_ds_write is not disable. This
> > > > is set in the mpich code in the file
> > > > /src/mpi/romio/adio/common/ad_hints.c
> > > >       
> > > > You can use MPI_Info_set to disable romio_ds_write. What is the
> > > > failure? flock? Data sieving needs flock.
> > >     
> > > > > I believe it has something to do with locking issues. I'm not sure how
> > > > > best to handle this; I'd prefer the data sieving default be disable,
> > > > > though I don't know all the implications there.
> > > >       
> > > > I agree data sieving should be disabled. Also, it checks for a
> > > > contiguous buftype or filetype only by the fileview, which is
> > > > sometimes not enough and triggers an unnecessary read-modify-write
> > > > even for a contiguous write (especially with higher level libraries,
> > > > if you choose collective write). Since Lustre has a client cache,
> > > > and given the overhead of flock and read-modify-write, I doubt the
> > > > performance improvements we could get from data sieving on Lustre,
> > > > although I do not have performance data to prove that.
> > >     
> > > > Maybe an ad_lustre_open should be a place where the  _ds_  hints are
> > > > set to disable.
> > > >       
> > > > Yes, we should disable this for strided writes in Lustre.
> > > > ad_lustre_open seems the right place to do this.
> > >
> > > Thanks
> > > WangDi
> > >     
> > > > Marty Barnaby
> > > >
> > > >
> > > > Weikuan Yu wrote:
> > > >       
> > > > > Andreas Dilger wrote:
> > > > >
> > > > >         
> > > > > > On Mar 11, 2008  16:10 -0600, Marty Barnaby wrote:
> > > > > >
> > > > > >           
> > > > > > > I'm not actually sure what ROMIO abstract device the multiple CFS
> > > > > > > deployments I utilize were defined with. Probably just UFS, or
> > > > > > > maybe NFS.
> > > > > > > > Did you have a recommended option yourself?
> > > > > > >
> > > > > > >             
> > > > > > The UFS driver is the one used for Lustre if no other one exists.
> > > > > >
> > > > > >
> > > > > >           
> > > > > > > > Besides the fact that most of the adio that were created over
> > > > > > > > the years are completely obsolete and could be cleaned from
> > > > > > > > ROMIO, what will the new one for Lustre offer? Particularly
> > > > > > > > with respect to controls via the lfs utility that I can
> > > > > > > > already get?
> > > > > > >
> > > > > > >             
> > > > > > There is improved collective IO that aligns the IO on Lustre stripe
> > > > > > boundaries.  Also the hints given to the MPIIO layer (before open,
> > > > > > not after) result in lustre picking a better stripe count/size.
> > > > > >
> > > > > >
> > > > > >           
> > > > > > In addition, the one integrated into MPICH2-1.0.7 contains direct I/O
> > > > > > support. Lockless I/O support was removed due to my lack of
> > > > > > confidence in low-level file system support. But it can be revived
> > > > > > when possible.
> > > > >
> > > > > --
> > > > > Weikuan Yu <+> 1-865-574-7990
> > > > > http://ft.ornl.gov/~wyu/
> > > > >
> > > > >
> > > > >         
> > > > ------------------------------------------------------------------------
> > > >
> > > > > _______________________________________________
> > > > Lustre-discuss mailing list
> > > > Lustre-discuss at lists.lustre.org
> > > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> > > >
> > > >       
> > > --
> > > Regards,
> > > Tom Wangdi
> > > --
> > > Sun Lustre Group
> > > System Software Engineer
> > > http://www.sun.com
> > >
> > >
> > >     
> >
> >   
> 
> 



