[Lustre-devel] Oleg/Mike Work on Apps Metrics - FW: Mike Booth week ending 2009.03.15

Oleg Drokin Oleg.Drokin at Sun.COM
Tue Mar 31 13:58:36 PDT 2009


On Mar 31, 2009, at 2:51 PM, Andreas Dilger wrote:
>>> This latter concept is the basis for the "flash cache" concept.
>>> Actually, I think it's worth exploring the economics of it in more
>>> detail.
>> This turns out to be a very true assertion. We (I) do see a huge delay
>> in e.g. MPI barriers done immediately after write.
> While this is true, I still believe that the amount of delay seen by
> the application cannot possibly be worse than waiting for all of the
> IO to complete.  Also, the question is whether you are measuring the
> FIRST MPI barrier after the write, vs e.g. the SECOND MPI barrier
> after the write?  Since Lustre is currently aggressively flushing the
> write cache then the first MPI barrier is essentially waiting for all
> of the IO to complete, which is of course very slow.  The real item
> of interest is how long the SECOND MPI barrier takes, which is what
> the overhead of Lustre IO is on the network performance.

The second MPI barrier takes 1.5 seconds for me.
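
For illustration only, here is a minimal sketch of the first-versus-second
barrier measurement discussed above. It uses Python threads as stand-ins for
MPI ranks, and a sleep as a stand-in for a write whose flush finishes at a
different time on each rank. The names time_barriers and flush_delays are
hypothetical and do not come from this thread or from Lustre; a real test
would time MPI_Barrier with MPI_Wtime around an actual write.

```python
import threading
import time


def time_barriers(n_ranks=4, flush_delays=None):
    """Return a list of (first_wait, second_wait) in seconds, one per rank.

    flush_delays[r] is an assumed, simulated time for rank r's write to
    finish flushing; it is not a measurement of any real file system.
    """
    if flush_delays is None:
        flush_delays = [0.05 * r for r in range(n_ranks)]

    barrier = threading.Barrier(n_ranks)  # reusable across both waits
    results = [None] * n_ranks

    def rank(r):
        # Stand-in for "write, then wait for the cache to be flushed".
        time.sleep(flush_delays[r])

        t0 = time.perf_counter()
        barrier.wait()  # first barrier: waits for the slowest rank's IO
        t1 = time.perf_counter()
        barrier.wait()  # second barrier: everyone is already in step
        t2 = time.perf_counter()

        results[r] = (t1 - t0, t2 - t1)

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for r, (first, second) in enumerate(time_barriers()):
        print(f"rank {r}: first barrier {first:.3f}s, second barrier {second:.3f}s")
```

In this toy model the fastest rank's first barrier absorbs the whole flush
skew, while its second barrier shows only the synchronization cost. On a real
system the second barrier can still be slow if IO is competing for the
network, which is the quantity being asked about here.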

> It is impossible that Lustre IO completely saturates the entire
> cross-sectional bandwidth of the system OR the client CPUs, so having
> some amount of computation for "free" during IO is still better than
> waiting for the IO to complete.

No argument about that from me; I have been advocating this same thing
from the very beginning.

