[Lustre-discuss] HLRN lustre breakdown

Andreas Dilger adilger at sun.com
Thu Aug 21 11:59:10 PDT 2008


On Aug 21, 2008  10:55 -0400, Brock Palen wrote:
> On Aug 21, 2008, at 10:22 AM, Troy Benjegerdes wrote:
> > This is a big, nasty issue, particularly for HPC applications where
> > performance matters most.
> >
> > How does one even begin to benchmark the performance overhead of a
> > parallel filesystem with checksumming? I am having nightmares over the
> > ways vendors will try to play games with performance numbers.
> 
> True.

Actually, Lustre 1.6.5 does checksumming by default, and that is how
we do our benchmarking.  Some customers will turn it off because the
overhead hurts them.  New customers may not even notice it...  Also, for
many workloads the data integrity is much more important than the speed.

> > My suspicion is that whenever a parallel filesystem with checksumming
> > is available and works, all the end-users will just turn it off anyway
> > because their applications will run twice as fast without it, regardless
> > of what the benchmarks say, leaving us back with the same problem.
> 
> I don't think this will be a problem. On current systems the checksummed
> filesystem may become CPU bound, but I think the OSTs will be bailed out
> by CPU speeds increasing faster than disk speeds. You just need to limit
> the number of OSTs per OSS.

I agree that CPU speeds will almost certainly cover this in the future.

> Where I could see it being a problem is on the client side, since there
> reads and writes compete with the application for CPU cycles. So far on
> our clusters I see applications do either compute or IO on a given
> thread/rank, not both, which frees up the allocated CPUs for IO.

Yes, that is our experience also.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



