[Lustre-devel] Oleg/Mike Work on Apps Metrics - FW: Mike Booth week ending 2009.03.15
Michael Booth
Michael.Booth at Sun.COM
Wed Apr 1 04:41:44 PDT 2009
On Apr 1, 2009, at 12:34 AM, Oleg Drokin wrote:
> Hello!
>
> On Mar 31, 2009, at 11:55 PM, Michael Booth wrote:
>> (My Opinion) The large size of the I/O request put onto the SeaStar
>> by the Lustre client is giving it an artificially high priority.
>> Barriers are just a few bytes, the I/Os from the client are in
>> megabytes. SeaStar has no priority in its queue, but the amount of
>> time it takes to clear a megabyte request gives it an effective
>> priority with thousands of times more impact on the hardware than
>> the small synchronization requests of many collectives. I am
>> wondering if the interference from I/O to computation is more an
>> artifact of message size and bursts than of congestion or routing
>> inefficiencies in SeaStar.
>> If there are hundreds of megabytes of requests queued up on the
>> network, and there is no priority mechanism to push a barrier or
>> other small MPI request up the queue, it is bound to create a
>> disruption.
>> To borrow the elevator metaphor from Eric, if all the elevators are
>> queued up from 8:00 to 9:00 delivering office supplies on carts that
>> occupy the entire elevator, maybe the carts should be smaller, and
>> limited to a few per elevator trip.
>
> As we discussed in the past, just sending small i/o messages is going
> to uncover all kinds of slowdowns all the way back to the disk
> storage,
> and the collateral damage would be other tasks that do need fast i/o
> and do send big chunks of data.
>
> Bye,
> Oleg
Don't take my explanation as a general suggestion for all I/O. It is a
suggestion for I/O that takes place at times when MPI needs fast
response. How to know when that need is high is another issue.
To overextend the metaphor: can the office supply carts be made
smaller, and the number allowed on each elevator trip be limited,
from 8:00 to 9:00 am when the office workers' traffic needs fast
response?
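To put rough numbers on the cart metaphor, here is a back-of-the-envelope
sketch in Python. It assumes a single FIFO link with no priorities, so a
barrier waits for everything queued ahead of it. The link rate, chunk size,
and in-flight limit are made-up illustrative values, not SeaStar or Lustre
measurements, and the function name is hypothetical.

```python
# Assumed model: one FIFO link, no priorities, a barrier message waits
# until all bytes queued ahead of it have drained.

LINK_BYTES_PER_SEC = 2e9  # illustrative link rate, NOT a measured SeaStar figure


def worst_case_wait(queued_bytes, link_bytes_per_sec):
    """Seconds a small message waits behind queued_bytes on a FIFO link."""
    return queued_bytes / link_bytes_per_sec


# Unthrottled client: 256 MiB of I/O already sitting in the queue.
unthrottled_wait = worst_case_wait(256 * 2**20, LINK_BYTES_PER_SEC)

# Throttled client: at most 4 requests of 64 KiB each queued at a time
# (both limits are hypothetical).
throttled_wait = worst_case_wait(4 * 64 * 2**10, LINK_BYTES_PER_SEC)

print(f"unthrottled: {unthrottled_wait * 1e3:.3f} ms")
print(f"throttled:   {throttled_wait * 1e3:.3f} ms")
```

In this model the barrier's worst-case delay scales directly with the bytes
allowed ahead of it, which is the whole point of smaller carts and fewer of
them per trip. It says nothing about the cost on the storage side that Oleg
raises.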
It is clear to the application when it is doing synchronous I/O
and doesn't care so much about MPI response, and when it is in a stage
where collective response is important. For example: the barrier
after an fsync definitely wants the highest response for the I/O
requests. After that barrier is complete, MPI would likely need the
highest response, until the next user synchronous I/O call (not
including printf's). It could even be an explicit call to the Lustre
client from the application to switch the priority between states.
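A rough sketch of what such a switch could look like, in Python for
brevity. Everything here is hypothetical: no such call exists in the Lustre
client as far as this message is concerned, and the mode names, class name,
and limits are invented purely to illustrate the two states.

```python
from enum import Enum


class Mode(Enum):
    IO_PRIORITY = "io"    # e.g. between an fsync and the barrier that follows it
    MPI_PRIORITY = "mpi"  # e.g. after that barrier, until the next synchronous I/O


# Hypothetical per-mode limits: (max bytes per request, max requests in flight).
LIMITS = {
    Mode.IO_PRIORITY: (1 << 20, 8),
    Mode.MPI_PRIORITY: (64 << 10, 2),
}


class ClientThrottle:
    """Illustrative stand-in for a client-side I/O throttle (not a Lustre API)."""

    def __init__(self):
        self.mode = Mode.IO_PRIORITY

    def set_mode(self, mode):
        """The explicit call the application would make to switch states."""
        self.mode = mode

    def split(self, nbytes):
        """Split one write of nbytes into request sizes allowed in this mode."""
        chunk, _ = LIMITS[self.mode]
        sizes = [chunk] * (nbytes // chunk)
        if nbytes % chunk:
            sizes.append(nbytes % chunk)
        return sizes

    def max_queued_bytes(self):
        """Upper bound on bytes this client may have on the network at once."""
        chunk, in_flight = LIMITS[self.mode]
        return chunk * in_flight
```

The application would call set_mode(Mode.MPI_PRIORITY) once its post-fsync
barrier completes, and switch back before its next synchronous I/O phase.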
Mike