[Lustre-devel] Oleg/Mike Work on Apps Metrics - FW: Mike Booth week ending 2009.03.15

Oleg Drokin Oleg.Drokin at Sun.COM
Tue Mar 31 21:34:50 PDT 2009


On Mar 31, 2009, at 11:55 PM, Michael Booth wrote:
> (My Opinion) The large size of the I/O requests put onto the SeaStar
> by the Lustre client is giving them an artificially high priority.
> Barriers are just a few bytes; the I/Os from the client are in
> megabytes. SeaStar has no priority in its queue, but the amount of
> time it takes to clear a megabyte request gives it an effective
> priority with thousands of times more impact on the hardware than the
> small synchronization requests of many collectives. I am wondering if
> the interference from I/O to computation is more an artifact of
> message size and bursts than of congestion or routing inefficiencies
> in SeaStar.
> If there are hundreds of megabytes of requests queued up on the
> network, and there is no way to prioritize a barrier or other small
> MPI request and push it up the queue, it is bound to create a
> disruption.
> To borrow the elevator metaphor from Eric,  if all the elevators are  
> queued up from 8:00 to 9:00 delivering office supplies on carts that  
> occupy the entire elevator, maybe the carts should be smaller, and  
> limited to a few per elevator trip.

As we discussed in the past, just sending small I/O messages is going
to uncover all kinds of slowdowns all the way back to the disk storage,
and the collateral damage would be other tasks that do need fast I/O
and do send big chunks of data.
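
To put rough numbers on both sides of this trade-off, here is a toy
model of a single FIFO link with no priorities. It is only a sketch:
the link bandwidth, the fixed per-request overhead, and the function
names are assumptions made up for illustration, not measured SeaStar or
Lustre values, and the model ignores the disk side entirely.

```python
# Toy model of one FIFO link with no priority queue.
# All constants are hypothetical, chosen only for illustration.
LINK_BYTES_PER_SEC = 2e9          # assumed link bandwidth
PER_REQUEST_OVERHEAD_SEC = 50e-6  # assumed fixed cost paid by every request

MB = 1 << 20
KB = 1 << 10


def service_time(nbytes):
    """Seconds the link is busy with one request of nbytes."""
    return PER_REQUEST_OVERHEAD_SEC + nbytes / LINK_BYTES_PER_SEC


def barrier_wait(requests_ahead, request_bytes):
    """Seconds a tiny barrier message waits behind queued bulk requests.

    In a strict FIFO the barrier cannot pass anything already queued,
    so it waits for every request ahead of it to drain.
    """
    return requests_ahead * service_time(request_bytes)


def bulk_throughput(request_bytes):
    """Bytes per second achieved by a steady stream of equal-sized requests."""
    return request_bytes / service_time(request_bytes)


if __name__ == "__main__":
    # 256 MB queued as 1 MB requests, versus a few 64 KB requests in flight.
    for label, ahead, size in [("256 x 1 MB", 256, MB), ("4 x 64 KB", 4, 64 * KB)]:
        print(f"{label}: barrier waits {barrier_wait(ahead, size) * 1e3:.3f} ms, "
              f"bulk stream runs at {bulk_throughput(size) / 1e6:.0f} MB/s")
```

Under these assumed numbers the barrier wait shrinks sharply when fewer
and smaller requests are queued, while the bulk stream loses throughput
because the fixed per-request cost is amortized over fewer bytes. How
large that second effect is in practice depends on the real per-request
costs along the whole path, which this sketch does not model.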

