[Lustre-discuss] HLRN lustre breakdown

Brock Palen brockp at umich.edu
Thu Aug 21 12:59:09 PDT 2008


Really? Are you sure? I just set up a new 1.6.5.1 filesystem this week:

[root@nyx003 ~]# cat /proc/fs/lustre/llite/nobackup-0000010037e27c00/checksum_pages
0
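
To check the same setting on every Lustre mount of a client at once, a
minimal sketch along these lines might help. It assumes the 1.6-era /proc
layout shown above (one directory per mount under /proc/fs/lustre/llite,
each with a checksum_pages file) and assumes that a nonzero value means
checksums are enabled. The function name read_checksum_pages is
hypothetical, not part of any Lustre tool.

```python
import glob
import os


def read_checksum_pages(llite_root="/proc/fs/lustre/llite"):
    """Return {mount_instance: int_value} for each checksum_pages file found.

    llite_root is a parameter so the function can be pointed at a copy of
    the tree; the default matches the path used in the command above.
    """
    settings = {}
    pattern = os.path.join(llite_root, "*", "checksum_pages")
    for path in sorted(glob.glob(pattern)):
        # The directory name identifies the mount, e.g. nobackup-<id>.
        instance = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            settings[instance] = int(f.read().strip())
    return settings


if __name__ == "__main__":
    for instance, value in read_checksum_pages().items():
        # Assumption: nonzero means checksumming is enabled on this mount.
        state = "on" if value else "off"
        print(f"{instance}: checksum_pages={value} ({state})")
```

On a node with no Lustre mounts the function simply returns an empty
dictionary.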

I am curious to test whether they were on.  My MPI_File_write() throughput
for a large file was lower than I expected, but it looked like the OSTs were
CPU bound (two x4500s).
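
One rough way to judge whether checksumming alone could make a server CPU
bound is to measure how fast a single core can checksum memory and compare
that with the bandwidth each server is expected to deliver. The sketch
below is only that, a sketch: it uses CRC32 and Adler-32 from Python's zlib
module as stand-ins, since this thread does not say which algorithm Lustre
uses, and it measures user-space throughput rather than anything inside
the Lustre stack. The function name and its parameters are hypothetical.

```python
import time
import zlib


def checksum_throughput_mib_s(checksum=zlib.crc32, block_size=1 << 20, blocks=256):
    """Checksum `blocks` buffers of `block_size` bytes; return MiB/s on one core."""
    buf = bytes(block_size)  # zero-filled buffer of block_size bytes
    start = time.perf_counter()
    for _ in range(blocks):
        checksum(buf)
    elapsed = time.perf_counter() - start
    total_mib = (block_size * blocks) / (1 << 20)
    return total_mib / elapsed


if __name__ == "__main__":
    # Stand-in algorithms only; not a statement about what Lustre computes.
    for name, fn in (("crc32", zlib.crc32), ("adler32", zlib.adler32)):
        rate = checksum_throughput_mib_s(fn)
        print(f"{name}: {rate:.0f} MiB/s per core")
```

If the per-core rate is not comfortably above the per-server I/O rate,
checksumming is at least a plausible contributor to the CPU load.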

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Aug 21, 2008, at 2:59 PM, Andreas Dilger wrote:
> On Aug 21, 2008  10:55 -0400, Brock Palen wrote:
>> On Aug 21, 2008, at 10:22 AM, Troy Benjegerdes wrote:
>>> This is a big nasty issue, particularly for HPC applications where
>>> performance is a big issue.
>>>
>>> How does one even begin to benchmark the performance overhead of a
>>> parallel filesystem with checksumming? I am having nightmares over the
>>> ways vendors will try to play games with performance numbers.
>>
>> True
>
> Actually, Lustre 1.6.5 does checksumming by default, and that is how
> we do our benchmarking.  Some customers will turn it off because the
> overhead hurts them.  New customers may not even notice it...  Also, for
> many workloads the data integrity is much more important than the speed.
>
>>> My suspicion is that whenever a parallel filesystem with checksumming
>>> is available and works, all the end-users will just turn it off anyway
>>> because the applications will run twice as fast without it, regardless
>>> of what the benchmarks say, leaving us back at the same problem.
>>
>> I don't think this will be a problem. On current systems the
>> checksummed filesystem may well become CPU bound, but I think
>> the OSTs will be bailed out by CPU speeds going up faster than disk
>> speeds. You just need to limit the number of OSTs per OSS.
>
> I agree that CPU speeds will almost certainly cover this in the future.
>
>> Where I could see it being a problem is on the client side. That
>> assumes that writes and reads are competing with the application for
>> cycles.  So far on our clusters I see applications do either compute
>> or IO on a thread/rank, not both, freeing up allocated CPUs for IO.
>
> Yes, that is our experience also.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.



