[Lustre-devel] Oleg/Mike Work on Apps Metrics - FW: Mike Booth week ending 2009.03.15
Michael Booth
Michael.Booth at Sun.COM
Tue Mar 31 20:55:10 PDT 2009
On Mar 31, 2009, at 11:35 PM, di wang wrote:
Hello,
Andreas Dilger wrote:
> If each compute timestep takes 0.1s during IO vs. 0.01s without IO,
> you would get 990 timesteps during the write flush in the second case
> until the cache was cleared, vs. none in the first case. I suspect
> that the overhead of the MPI communication on the Lustre IO is small,
> since the IO will be limited by the OST network and disk bandwidth,
> which is generally a small fraction of the cross-sectional bandwidth.
>
> This could be tested fairly easily with a real application that is
> doing computation between IO, instead of a benchmark that is only
> doing IO or only sleeping between IO, simply by increasing the
> per-OSC write cache limit from 32MB to e.g. 1GB in the above case
> (or 2GB to avoid the case where 2 processes on the same node are
> writing to the same OST). Then, measure the time taken for the
> application to do, say, 1M timesteps and 100 checkpoints with the
> 32MB and the 2GB write cache sizes.
>
>
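A minimal sketch of the measurement Andreas suggests could look like the
following. The step count, checkpoint interval, per-rank state size and
output path are assumed values for illustration only; the per-OSC cache
limit would be switched between 32MB and 2GB out of band, e.g. via the
client's max_dirty_mb setting.

/*
 * Sketch: time a compute loop that checkpoints periodically, once with
 * the default 32MB per-OSC write cache and once with 2GB.  All sizes
 * and counts below are assumptions, not values from this discussion.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NSTEPS     100000                /* assumed timestep count      */
#define CKPT_EVERY 1000                  /* assumed checkpoint interval */
#define NELEMS     (8 * 1024 * 1024)     /* ~64MB of doubles per rank   */

int main(int argc, char **argv)
{
    int rank;
    double *state = malloc(NELEMS * sizeof(double));
    char path[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    snprintf(path, sizeof(path), "/lustre/scratch/ckpt.%d", rank);  /* assumed path */

    for (long i = 0; i < NELEMS; i++)
        state[i] = (double)i;

    double t0 = MPI_Wtime();
    for (long step = 1; step <= NSTEPS; step++) {
        /* stand-in for real computation between I/O phases */
        for (long i = 0; i < NELEMS; i++)
            state[i] = state[i] * 1.0000001 + 1.0;

        if (step % CKPT_EVERY == 0) {
            FILE *f = fopen(path, "w");
            if (f) {
                fwrite(state, sizeof(double), NELEMS, f);
                fclose(f);  /* may return while data is still in the client cache */
            }
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("elapsed %.1f s for %d steps, %d checkpoints\n",
               t1 - t0, NSTEPS, NSTEPS / CKPT_EVERY);

    free(state);
    MPI_Finalize();
    return 0;
}

Comparing the elapsed time for the two cache settings (and the fraction
of it spent inside the checkpoint writes) would show how much of the
flush actually overlaps with computation.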
Could we implement AIO here? For example, the AIO buffer could be
treated differently from other dirty buffers and not be pushed
aggressively to the server. It seems that with buffer_write the user
has to deal with the filesystem buffer cache behaviour in his
application; I am not sure that is good for them, and we may not even
expose these features to the application.
Thanks
WangDi
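To make the application side of that idea concrete, a checkpoint issued
with POSIX AIO might look roughly like the sketch below. It only shows
the user-space pattern (issue the write, keep computing, wait before
reusing the buffer); the checkpoint size and file name are assumptions,
and how the Lustre client would treat such buffers differently from
ordinary dirty pages is exactly the open question above.

/*
 * User-space view of an asynchronous checkpoint with POSIX AIO.
 * Link with -lrt on older glibc.  Sizes and names are illustrative.
 */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CKPT_BYTES (64 * 1024 * 1024)   /* assumed checkpoint size */

int main(void)
{
    char *buf = malloc(CKPT_BYTES);
    memset(buf, 0x42, CKPT_BYTES);

    int fd = open("ckpt.0", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = CKPT_BYTES;
    cb.aio_offset = 0;

    if (aio_write(&cb) != 0) { perror("aio_write"); return 1; }

    /* ... computation continues here while the write is in flight ... */

    /* before reusing the buffer, wait for the write to complete */
    const struct aiocb *list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    if (aio_return(&cb) != (ssize_t)CKPT_BYTES)
        fprintf(stderr, "short or failed checkpoint write\n");

    close(fd);
    free(buf);
    return 0;
}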
(My Opinion) The large size of the I/O requests that the Lustre client
puts onto the SeaStar gives them an artificially high priority.
Barriers are just a few bytes, while the I/Os from the client are
megabytes. SeaStar has no priorities in its queue, so the time it takes
to clear a megabyte request gives it thousands of times more impact on
the hardware than the small synchronization requests of many
collectives. I am wondering whether the interference from I/O to
computation is more an artifact of message size and bursts than of
congestion or routing inefficiencies in SeaStar.
If there are hundreds of megabytes of requests queued up on the
network, and there is no way to push a barrier or other small MPI
request ahead in the queue, it is bound to create a disruption.
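Some rough numbers make the scale clear; the link rate and message
sizes below are assumed round figures for illustration, not measured
SeaStar characteristics.

/* Back-of-the-envelope serialization delay: how long one bulk I/O
 * message occupies a link compared with a barrier-sized message.
 * The 2 GB/s rate and both message sizes are assumptions. */
#include <stdio.h>

int main(void)
{
    const double link_bytes_per_s = 2.0e9;               /* assumed link rate   */
    const double bulk_bytes       = 1.0 * 1024 * 1024;   /* 1MB bulk I/O RPC    */
    const double barrier_bytes    = 64.0;                /* small MPI message   */

    double bulk_us    = bulk_bytes / link_bytes_per_s * 1e6;
    double barrier_us = barrier_bytes / link_bytes_per_s * 1e6;

    printf("bulk RPC occupies the link for %8.1f us\n", bulk_us);
    printf("barrier message needs          %8.3f us\n", barrier_us);
    printf("ratio: ~%.0fx\n", bulk_us / barrier_us);
    return 0;
}

At those assumed numbers a single queued bulk RPC holds the link for
roughly four orders of magnitude longer than a barrier message needs,
and a backlog of a hundred such RPCs keeps the barrier waiting for tens
of milliseconds.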
To borrow the elevator metaphor from Eric, if all the elevators are
queued up from 8:00 to 9:00 delivering office supplies on carts that
occupy the entire elevator, maybe the carts should be smaller, and
limited to a few per elevator trip.
Mike Booth