[Lustre-discuss] mpi-io support

Andreas Dilger adilger at sun.com
Fri May 9 09:46:09 PDT 2008


On May 09, 2008  09:41 -0400, Phil Dickens wrote:
>   I am having similar struggles with locking on MPI-IO.
> I am doing a simple strided write, and it fails because
> of the locking. I'm a bit behind in the discussion, but
> is there a way to fix (workaround) this problem?? Is this
> something in my code, or the default driver (this is on
> lonestar at TACC)? I have even downloaded the most up to date
> version of MPICH, which I believe has a new Lustre ADIO
> driver, but I am running into the same issues.
> 
>   Any thoughts would be greatly appreciated!!

One possibility is to mount the clients with "-o localflock", leaving all
of the locking internal to Lustre.  This in essence provides single-node
flock (i.e. coherent on that node, but not across all clients).  The other
alternative is "-o flock", which is coherent locking across all clients,
but has a noticeable performance impact and may affect stability, depending
on the version of Lustre being used (newer is better of course).
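
A quick way to see whether locking is enabled at all on a given client is
to try a byte-range lock on a file inside the mount.  The sketch below
assumes the MPI-IO layer takes its locks through fcntl(); the helper name
probe_fcntl_lock() is hypothetical and is not part of Lustre or ROMIO.  It
returns 0 if a one-byte write lock can be taken and released, otherwise the
errno value.  It cannot tell "-o flock" from "-o localflock", since both
accept the lock on a single node.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not a Lustre or ROMIO API): try to take and release
 * a one-byte write lock on `path`.  Returns 0 on success, or the errno
 * value of the first call that failed. */
int probe_fcntl_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    struct flock lk;
    memset(&lk, 0, sizeof(lk));
    lk.l_type = F_WRLCK;     /* exclusive (write) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;          /* first byte of the file */
    lk.l_len = 1;            /* lock a single byte */

    int rc = 0;
    if (fcntl(fd, F_SETLK, &lk) < 0) {
        rc = errno;          /* locking refused or unsupported */
    } else {
        lk.l_type = F_UNLCK; /* release the lock again */
        if (fcntl(fd, F_SETLK, &lk) < 0)
            rc = errno;
    }

    close(fd);
    return rc;
}
```

Call it on a path inside the Lustre mount and print strerror() of any
non-zero result.  On a client mounted with neither option the expected
result is an error such as ENOSYS, though that is an assumption and may
vary with the Lustre version.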

I'm not sure about the internals of the MPI-IO code: whether it depends
on flock to provide a barrier across nodes, or whether it takes the locks
only because filesystems such as NFS do not keep writes coherent, so that
writers don't clobber the same page.

Tom is the expert here...

> On Thu, 8 May 2008, Tom.Wang wrote:
> 
> > Hi
> >
> > Marty Barnaby wrote:
> >> To return to this discussion, in recent testing, I have found that
> >> writing to a Lustre FS via a higher level library, like PNetCDF, fails
> >> because the default value for romio_ds_write is not "disable". This
> >> is set in the mpich code in the file /src/mpi/romio/adio/common/ad_hints.c
> > You can use MPI_Info_set to disable romio_ds_write.  What is the failure?
> > Is it flock? Data sieving needs flock.
> >>
> >> I believe it has something to do with locking issues. I'm not sure how
> >> best to handle this. I'd prefer that data sieving be disabled by default,
> >> though I don't know all the implications there.
> > I agree data sieving should be disabled. Also, it checks whether the
> > buftype or filetype is contiguous only from the file view, which is
> > sometimes not enough and triggers an unnecessary read-modify-write even
> > for a contiguous write (especially with higher-level libraries, if you
> > choose collective writes). Since Lustre has a client cache, and given
> > the overhead of flock and read-modify-write, I doubt data sieving would
> > improve performance on Lustre, although I do not have performance data
> > to prove that.
> >> Maybe an ad_lustre_open should be a place where the  _ds_  hints are
> >> set to disable.
> > Yes, we should disable this for strided writes on Lustre. ad_lustre_open
> > seems the right place to do this.
> >
> > Thanks
> > WangDi
> >>
> >> Marty Barnaby
> >>
> >>
> >> Weikuan Yu wrote:
> >>> Andreas Dilger wrote:
> >>>
> >>>> On Mar 11, 2008  16:10 -0600, Marty Barnaby wrote:
> >>>>
> >>>>> I'm not actually sure what ROMIO abstract device the multiple CFS
> >>>>> deployments I utilize were defined with. Probably just UFS, or maybe NFS.
> >>>>> Did you have a recommended option yourself?
> >>>>>
> >>>> The UFS driver is the one used for Lustre if no other one exists.
> >>>>
> >>>>
> >>>>> Besides the fact that most of the ADIO drivers created over the years
> >>>>> are completely obsolete and could be cleaned from ROMIO, what will the
> >>>>> new one for Lustre offer? Particularly with respect to controls via the
> >>>>> lfs utility that I can already get?
> >>>>>
> >>>> There is improved collective IO that aligns the IO on Lustre stripe
> >>>> boundaries.  Also the hints given to the MPIIO layer (before open,
> >>>> not after) result in lustre picking a better stripe count/size.
> >>>>
> >>>>
> >>>
> >>> In addition, the one integrated into MPICH2-1.0.7 contains direct I/O
> >>> support. Lockless I/O support was purged due to my lack of
> >>> confidence in low-level file system support. But it can be revived when
> >>> possible.
> >>>
> >>> --
> >>> Weikuan Yu <+> 1-865-574-7990
> >>> http://ft.ornl.gov/~wyu/
> >>>
> >>>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> Lustre-discuss mailing list
> >> Lustre-discuss at lists.lustre.org
> >> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >>
> >
> >
> > --
> > Regards,
> > Tom Wangdi
> > --
> > Sun Lustre Group
> > System Software Engineer
> > http://www.sun.com
> >
> >

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



