[Lustre-discuss] MPI-IO / ROMIO support for Lustre

Rob Latham robl at mcs.anl.gov
Fri Nov 5 07:46:15 PDT 2010


On Wed, Nov 03, 2010 at 10:18:51AM +0000, Mark Dixon wrote:
> On Mon, 1 Nov 2010, Martin Pokorny wrote:
> ...
> > FWIW, we've been using MPICH2's MPI-IO/ROMIO/ADIO with Lustre (v 1.8)
> > for several months now, and it's been working reliably. We do mount the
> > Lustre filesystem with "flock"; at one time I thought it necessary, but
> > I don't recall if I verified that after the initial problems with MPI-IO
> > were resolved. Only a recent MPICH2 will have a working
> > MPI-IO/ROMIO/ADIO for Lustre; perhaps the code would work with OpenMPI
> > and MVAPICH2 as well.
> ...
> 
> Is MPICH2 where ROMIO is developed these days? I found it pretty difficult 
> to work out where the public face of its development was...

Hello!  I'm "the ROMIO guy".

MPICH2 always contains the latest ROMIO.  We'll try to sync up
with folks when major improvements happen.  The community has really
come through over the last year with a good Lustre driver for ROMIO,
and now I'm encouraging other projects using ROMIO to sync up with us.
OpenMPI is still running a fairly old version of ROMIO, though.

Pascal Deveze has done all the work of syncing, but is waiting for an
OpenMPI developer to say "ok, this looks fine" and commit it.

> 1) Is it a naive view that, if ROMIO asks for an flock, it needs it? And 
> that if it doesn't on Lustre, then eventually ROMIO will be developed to 
> stop asking for them?

ROMIO uses these fcntl locks in one place on Lustre: the noncontiguous
write path uses an optimization called "data sieving", which is a good
optimization except that it involves a read-modify-write step.  If two
processes simultaneously read-modify-write the same region, who wins?
We guard against this either with an fcntl lock or by disabling data
sieving writes.
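
To make the read-modify-write step concrete, here is a minimal sketch
in Python.  It is not ROMIO's code (ROMIO is written in C); the
function name sieved_write and the (offset, bytes) piece format are
made up for illustration.  It only shows the pattern: lock the
covering byte range, read it in one go, patch the pieces in memory,
and write the whole range back.

```python
import fcntl
import os


def sieved_write(path, pieces):
    """Write noncontiguous (offset, bytes) pieces with one read-modify-write.

    Hypothetical helper for illustration; not part of ROMIO or any MPI library.
    """
    pieces = sorted(pieces)
    start = pieces[0][0]
    end = max(off + len(data) for off, data in pieces)
    length = end - start

    fd = os.open(path, os.O_RDWR)
    try:
        # Take an exclusive fcntl byte-range lock on the covering region,
        # so another process cannot read-modify-write it at the same time.
        fcntl.lockf(fd, fcntl.LOCK_EX, length, start, os.SEEK_SET)
        try:
            # Read: one large contiguous read instead of many small ones.
            buf = bytearray(os.pread(fd, length, start))
            # Pad with zeros if the region extends past end of file.
            buf.extend(b"\0" * (length - len(buf)))
            # Modify: patch each piece into the in-memory copy.
            for off, data in pieces:
                buf[off - start:off - start + len(data)] = data
            # Write: one large contiguous write, including the untouched
            # bytes between pieces. This is what makes the lock necessary.
            os.pwrite(fd, bytes(buf), start)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
    finally:
        os.close(fd)
```

For example, sieved_write("data.bin", [(1, b"XX"), (6, b"YY")]) on an
existing file rewrites bytes 1 through 7 in a single write, even
though only four of them changed.  Without the lock, a second process
doing the same thing to an overlapping range could have its update
silently overwritten by the stale bytes in the gaps.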

> 2) A message in the list archive says that Cray recommend "flock" for 
> their clusters, and it sounds like they use an enhanced version of ROMIO in 
> their MPT product.

Cray-MPI version 3.2 or newer has a different (but good) Lustre driver
for ROMIO.  I defer to their advice on their systems.

> The ADIO driver for Lustre certainly looks like it's still being actively 
> worked-on to get it to maturity, and that various MPI implementations 
> still need time to incorporate those changes.
> 
> I note that the main MPI releases we use on our cluster (OpenMPI 1.4 and 
> MVAPICH2 1.4 - we're a year behind) have V04 of ad_lustre, but MVAPICH2 is 
> now closest to MPICH2 as it has since moved to V05. Looks like I need to 
> do a software refresh... and recommend our MPI-IO users to use MVAPICH2 
> for the time being.

You can take the ROMIO distribution from the recent MPICH2-1.3.0
release and build that against an existing MPI library.  You have to
get the link order right, but if you are committed to an old MPI
version for other reasons, linking in a new ROMIO version might be the
way to go.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


