[Lustre-discuss] [ROMIO Req #940] a new Lustre ADIO driver

Rob Latham robl at mcs.anl.gov
Tue May 12 08:21:49 PDT 2009


On Tue, May 12, 2009 at 04:10:44AM -0500, David Knaak wrote:
> With my version of the Lustre stripe-aligned collective buffering 
> (which merges some of LiuYing's code with mine):
> 
>   coll_test passes (2 PEs as required)
>   noncontig_coll passes (2 PEs as required)
>   hindexed passes (4 PEs as required)
>   aggregation1 passes (up to 60 PEs, that's as high as I'm going tonight)
>   aggregation2 passes (up to 60 PEs, that's as high as I'm going tonight)
>   split_coll passes (up to 60 PEs, that's as high as I'm going tonight)

Great! Thanks for the additional information.

> The one collective buffering test that fails is noncontig_coll2.  I'll
> look at that more closely.

noncontig_coll2 is a tricky one in that it re-orders the I/O
aggregators.  The hint "cb_config_list" lets a user explicitly specify
which MPI processors should be I/O aggregators.  There is no reason to
expect the user to construct that list in rank order, so ROMIO should
be able to handle any permutation of nodes.

If any part of the code makes an implicit assumption about the order
of ranks, noncontig_coll2 will probably give it a headache.  I added
that test back in December 2002... 7 years later I still remember how
much of a pain it was to track down the fix.
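
To illustrate the kind of assumption that test catches, here is a minimal
sketch in C. It is not ROMIO code: the helper names and the plain int
array standing in for the aggregator list are hypothetical. The point is
only that the mapping between "aggregator number" and "MPI rank" goes
through a lookup in the user-supplied order, rather than assuming that
aggregator i is rank i or that the list is sorted.

```c
#include <assert.h>

/* Hypothetical helper: position of `rank` in the aggregator list,
 * or -1 if that rank is not an aggregator. The list may be in any
 * order, so this is a search, not an arithmetic shortcut. */
int aggregator_index(const int *ranklist, int naggs, int rank)
{
    for (int i = 0; i < naggs; i++) {
        if (ranklist[i] == rank)
            return i;
    }
    return -1;
}

/* Hypothetical helper: the rank that owns file domain `domain`.
 * Domain i belongs to the i-th entry of the list, whatever rank
 * happens to be stored there. */
int domain_owner(const int *ranklist, int naggs, int domain)
{
    assert(domain >= 0 && domain < naggs);
    return ranklist[domain];
}
```

With a list such as {3, 0, 2}, rank 0 is the second aggregator and rank 1
is not an aggregator at all. Code that loops "for rank in 0..naggs-1"
would get both of those wrong.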


> One other test that fails is shared_fp (60 PEs) but it also fails
> with collective buffering disabled, and besides, it doesn't make any
> collective I/O calls.  I haven't looked closely yet at the test.

shared_fp will do fcntl locks to coordinate writes to a hidden file.
This hidden file contains one value: the value of the shared file
pointer.  I don't know what would be particularly tricky about that
test, but at least we can rule out the two-phase code.
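
The locking pattern described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not ROMIO's actual
implementation: the function name, the use of a long long as the stored
value, and the file layout (one value at offset 0) are all hypothetical.
It takes a whole-file fcntl write lock, reads the current value, writes
back the advanced value, and unlocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: under an exclusive fcntl lock, read the shared
 * offset stored in the hidden file `path`, advance it by `incr`, and
 * return the previous value through `old`.
 * Returns 0 on success, -1 on any error. */
int shared_fp_fetch_add(const char *path, long long incr, long long *old)
{
    assert(old != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    struct flock lk = {0};
    lk.l_type = F_WRLCK;    /* exclusive lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;           /* 0 means "to end of file": lock it all */
    if (fcntl(fd, F_SETLKW, &lk) < 0) {  /* block until we own the lock */
        close(fd);
        return -1;
    }

    int rc = -1;
    long long cur = 0;
    ssize_t n = pread(fd, &cur, sizeof cur, 0);
    if (n == 0)
        cur = 0;            /* freshly created file: pointer starts at 0 */
    if (n == 0 || n == (ssize_t)sizeof cur) {
        long long next = cur + incr;
        if (pwrite(fd, &next, sizeof next, 0) == (ssize_t)sizeof next) {
            *old = cur;
            rc = 0;
        }
    }

    lk.l_type = F_UNLCK;    /* release before closing */
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    return rc;
}
```

Each process would call this once per shared-pointer operation and then
do its own write at the returned offset. The sketch is only correct if
the file system honors fcntl locks coherently across all the nodes
involved.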

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
