[Lustre-devel] Oleg/Mike Work on Apps Metrics - FW: Mike Booth week ending 2009.03.15

Michael Booth Michael.Booth at Sun.COM
Wed Apr 1 04:41:44 PDT 2009

On Apr 1, 2009, at 12:34 AM, Oleg Drokin wrote:

> Hello!
> On Mar 31, 2009, at 11:55 PM, Michael Booth wrote:
>> (My Opinion) The large size of the I/O request put onto the SeaStar
>> by the Lustre client is giving it an artificially high priority.
>> Barriers are just a few bytes, the I/Os from the client are in
>> megabytes.  SeaStar has no priority in its queue, but the amount of
>> time it takes to clear a megabyte request results in a priority that
>> has thousands of times more impact on the hardware than the small
>> synchronization requests of many collectives.  I am wondering if the
>> interference from I/O to computation is more an artifact of message
>> size and bursts than of congestion or routing inefficiencies in
>> SeaStar.
>> If there are hundreds of megabytes of requests queued up on the
>> network, and there is no way to push a barrier or other small MPI
>> request up the queue by priority, it is bound to create a disruption.
>> To borrow the elevator metaphor from Eric,  if all the elevators are
>> queued up from 8:00 to 9:00 delivering office supplies on carts that
>> occupy the entire elevator, maybe the carts should be smaller, and
>> limited to a few per elevator trip.
> As we discussed in the past, just sending small i/o messages is going
> to uncover all kinds of slowdowns all the way back to the disk storage,
> and the collateral damage would be other tasks that do need fast i/o
> and do send big chunks of data.
> Bye,
>     Oleg

Don't take my explanation as a general suggestion for all I/O.  It is a
suggestion for I/O taking place at times when MPI needs high
responsiveness.  How to know when that need is high is another issue.

To overextend the metaphor, can the size of the office supply carts
be kept small, and the number allowed on each elevator trip up be limited,
from 8:00 to 9:00 am when the office people traffic is in need of high

It is clear to the application when it is doing synchronous I/O
and doesn't care so much about MPI response, and when it is in a stage
where collective response is important.  For example, the barrier
after an fsync definitely wants the highest response for I/O
requests.  After a barrier is complete, MPI would likely need the highest
response until the next user synchronous I/O call (not including
printf's).  It could even be an explicit call to the Lustre client from
the application to switch the priority between states.
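
To make the two states concrete, here is a rough sketch of the kind of
switch I have in mind.  Everything in it is hypothetical: the class and
method names, the chunk sizes, and the in-flight limits are made up for
illustration, and none of it is an existing Lustre client interface.  It
only shows the bookkeeping: fsync flips to I/O priority (big carts, many
per trip), a completed barrier flips to MPI priority (small carts, few
per trip).

```python
from enum import Enum


class Mode(Enum):
    IO_PRIORITY = "io"    # application is waiting on synchronous I/O
    MPI_PRIORITY = "mpi"  # application needs fast collectives


# Hypothetical limits per mode: (max bytes per request, max requests in flight).
# The numbers are placeholders, not measured or recommended values.
LIMITS = {
    Mode.IO_PRIORITY: (4 * 1024 * 1024, 8),
    Mode.MPI_PRIORITY: (64 * 1024, 2),
}


class IoThrottle:
    """Tracks which state the application is in and plans I/O accordingly."""

    def __init__(self):
        # Assume compute/collective phases are the default state.
        self.mode = Mode.MPI_PRIORITY

    def on_sync_io(self):
        """Call when the application issues a synchronous I/O call (e.g. fsync)."""
        self.mode = Mode.IO_PRIORITY

    def on_barrier_complete(self):
        """Call when a barrier has completed; collectives matter again."""
        self.mode = Mode.MPI_PRIORITY

    def plan(self, nbytes):
        """Split an nbytes request into batches of chunk sizes.

        Each inner list is one batch of requests allowed in flight together.
        """
        chunk, inflight = LIMITS[self.mode]
        sizes = [min(chunk, nbytes - off) for off in range(0, nbytes, chunk)]
        return [sizes[i:i + inflight] for i in range(0, len(sizes), inflight)]


throttle = IoThrottle()
mpi_plan = throttle.plan(256 * 1024)   # small carts, two per trip
throttle.on_sync_io()
io_plan = throttle.plan(256 * 1024)    # one big cart
throttle.on_barrier_complete()
```

How the client would actually enforce such limits on the wire, and
whether smaller requests cost too much further back toward the disks, is
exactly the open question Oleg raises above.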

