[Lustre-devel] SeaStar message priority

Andrew C. Uselton acuselton at lbl.gov
Tue Mar 31 22:10:00 PDT 2009

I wonder if that scenario may have some bearing on the results I've 
mentioned at:


It would be worth stepping through the logic if anyone is interested.
The web page itself is terse, so feel free to bug me for details if you
have not seen it before.

Oleg Drokin wrote:
> Hello!
>
> It came to my attention that the SeaStar network does not implement
> message priorities, for various reasons.
>
> I really think there is a very valid case for priorities of some sort,
> to allow MPI and other latency-critical traffic to go ahead of bulk IO
> traffic on the wire.
>
> Consider this test I was running the other day on Jaguar. The
> application writes 250 MB of data from every core with a plain write()
> system call. The write() syscall returns very fast (less than 0.5 sec,
> i.e. 400+ MB/sec app-perceived bandwidth) because the data just goes
> to the memory cache to be flushed later. Then I do 2 barriers one
> after another, with nothing in between.
>
> If I run it at sufficient scale (say 1200 cores), the first barrier
> takes 4.5 seconds to complete and the second one 1.5 seconds,
> presumably all due to MPI RPCs being stuck behind huge bulk data
> requests on the clients (at least, I do not have any other good
> explanation).
>
> This makes for a lot of wasted time in applications that would like to
> use the buffering capabilities provided by the OS.
>
> Do you think something like this could be organized, if not for the
> current revision then at least for the next version?
>
> Bye,
>      Oleg
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
