[Lustre-discuss] [ROMIO Req #940] a new Lustre ADIO driver
Rob Latham
robl at mcs.anl.gov
Tue May 12 08:21:49 PDT 2009
On Tue, May 12, 2009 at 04:10:44AM -0500, David Knaak wrote:
> With my version of the Lustre stripe-aligned collective buffering
> (which merges some of LiuYing's code with mine):
>
> coll_test passes (2 PEs as required)
> noncontig_coll passes (2 PEs as required)
> hindexed passes (4 PEs as required)
> aggregation1 passes (up to 60 PEs, that's as high as I'm going tonight)
> aggregation2 passes (up to 60 PEs, that's as high as I'm going tonight)
> split_coll passes (up to 60 PEs, that's as high as I'm going tonight)
Great! Thanks for the additional information.
> The one collective buffering test that fails is noncontig_coll2. I'll
> look at that more closely.
noncontig_coll2 is a tricky one in that it re-orders the I/O
aggregators. The hint "cb_config_list" lets a user explicitly specify
which MPI processes should be I/O aggregators. There is no reason to
expect the user to construct that list in rank-order, so ROMIO should
be able to handle any permutation of nodes.
If any part of the code makes an implicit assumption about the order
of ranks, noncontig_coll2 will probably give it a headache. I added
that test back in December 2002... 7 years later I still remember how
much of a pain it was to track down the fix.
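To make the rank-order point concrete, here is a minimal sketch of the
kind of lookup that survives a permuted aggregator list. It is not
ROMIO's code: the names ranklist, aggregator_index and owner_of_offset
are hypothetical, and the equal-sized file domains starting at offset 0
are an assumption made only to keep the example small.

```c
#include <stddef.h>

/* Return the position of `rank` in the aggregator list, or -1 if the
 * rank is not an aggregator. The list may be in any order, so search it
 * rather than assuming ranklist[i] == i. (Hypothetical helper.) */
int aggregator_index(const int *ranklist, size_t naggs, int rank)
{
    for (size_t i = 0; i < naggs; i++) {
        if (ranklist[i] == rank)
            return (int)i;
    }
    return -1;
}

/* Map a file offset to the rank that owns the containing file domain.
 * Assumes equal-sized domains of `fd_size` bytes starting at offset 0.
 * The domain number is an index into the list, not a rank. */
int owner_of_offset(const int *ranklist, size_t naggs,
                    long long offset, long long fd_size)
{
    size_t domain = (size_t)(offset / fd_size);
    if (domain >= naggs)
        domain = naggs - 1;   /* clamp the tail into the last domain */
    return ranklist[domain];
}
```

With an aggregator list of {3, 0, 2}, rank 3 is aggregator 0 and rank 0
is aggregator 1. Code that uses the domain number directly as a rank
would send data to the wrong process.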
> One other test that fails is shared_fp (60 PEs) but it also fails
> with collective buffering disabled, and besides, it doesn't make any
> collective I/O calls. I haven't looked closely yet at the test.
shared_fp will do fcntl locks to coordinate writes to a hidden file.
This hidden file contains one value: the value of the shared file
pointer. I don't know what would be particularly tricky about that
test, but at least we can rule out the two-phase code.
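For reference, the locking pattern looks roughly like the sketch below:
take an fcntl write lock on the hidden file, read the current pointer,
write back the advanced value, unlock. This is an illustration and not
ROMIO's implementation. The function name shared_fp_fetch_add, the
int64_t on-disk format, and treating an empty file as pointer 0 are all
assumptions.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Read the shared pointer stored in the file at `path`, advance it by
 * `incr`, and return the old value. Returns -1 on error. The whole
 * read-modify-write runs under an fcntl write lock. (Hypothetical.) */
int64_t shared_fp_fetch_add(const char *path, int64_t incr)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    struct flock lk = {0};
    lk.l_type = F_WRLCK;          /* exclusive lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = sizeof(int64_t);   /* lock only the stored value */
    if (fcntl(fd, F_SETLKW, &lk) < 0) {   /* block until granted */
        close(fd);
        return -1;
    }

    int64_t old = 0;
    int64_t result = -1;
    ssize_t n = pread(fd, &old, sizeof old, 0);
    if (n == 0 || n == (ssize_t)sizeof old) {
        if (n == 0)
            old = 0;              /* empty file: pointer starts at 0 */
        int64_t next = old + incr;
        if (pwrite(fd, &next, sizeof next, 0) == (ssize_t)sizeof next)
            result = old;
    }

    lk.l_type = F_UNLCK;          /* release before closing */
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    return result;
}
```

Each caller gets the old pointer value as its own write offset, so the
only thing the processes need to agree on is the lock itself.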
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA