[Lustre-discuss] How to achieve 20GB/s file system throughput?
    Joe Landman 
    landman at scalableinformatics.com
       
    Sat Jul 24 06:08:32 PDT 2010
    
    
  
I hate to reply to myself, and this is not meant as an advertisement.
On 07/23/2010 10:50 PM, Joe Landman wrote:
> On 07/23/2010 10:25 PM, Henry_Xu at Dell.com wrote:
[...]
> It is possible to achieve 20GB/s, and quite a bit more, using Lustre.
> As to whether or not that 20GB/s is meaningful to their code(s), that's a
> different question.  It would be 20GB/s in aggregate, over possibly many
> compute nodes doing IO.
I should point out that we have customers with 20GB/s maximum 
theoretical configs (best case scenarios) with our siCluster 
(http://scalableinformatics.com/sicluster), with 8 IO units.  Their 
write patterns and InfiniBand configurations don't seem to allow 
achieving this in practice.  Simple benchmark tests (mixtures of LLNL 
mpi-io, io-bm, iozone, ...) show sustained results north of 12 GB/s for 
them.
Again, to set expectations: most users' codes never utilize storage 
systems very effectively.  Hence you might design a 20GB/s storage 
system, and the IO being done might not hit much above 500 MB/s for 
single threads.
>> My assumption is 100 or more IO nodes(rack servers) are needed.
> Hmmm ... If you can achieve 500+ MB/s per OST, then you would need about
> 40 OSTs.  You can have each OSS handle several OSTs.  There are
> efficiency losses you should be aware of, but 20GB/s using some
> mechanism to measure this, should be possible with a realistic number of
> units.  Don't forget to count efficiency losses in the design.
We do this in 8 machines (theoretical max performance), and could put 
this in a single rack.  We prefer to break it out among more IO nodes, 
say 16-24 smaller nodes, with 2-3 OSTs per OSS (i.e. IO node).
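
To make the sizing arithmetic above concrete, here is a rough sketch in 
Python.  The function names are made up for illustration, and the 0.6 
efficiency factor is only an assumption, taken from the ratio of the 
12 GB/s sustained figure to the 20 GB/s theoretical one mentioned 
earlier.  A real design would measure its own efficiency.

```python
import math

def osts_needed(target_mb_s, per_ost_mb_s, efficiency=1.0):
    """OSTs required to reach a target aggregate throughput.

    efficiency is the fraction of per-OST peak you expect to sustain
    (1.0 means no losses; this is an assumed, not measured, value).
    """
    return math.ceil(target_mb_s / (per_ost_mb_s * efficiency))

def oss_needed(ost_count, osts_per_oss):
    """OSS (IO node) count when each OSS serves osts_per_oss OSTs."""
    return math.ceil(ost_count / osts_per_oss)

target = 20_000                      # 20 GB/s expressed in MB/s
ideal = osts_needed(target, 500)     # no efficiency losses
derated = osts_needed(target, 500, efficiency=0.6)  # assumed 60%

print("OSTs, ideal:  ", ideal)
print("OSTs, derated:", derated)
print("OSS at 3 OSTs each (ideal):", oss_needed(ideal, 3))
print("OSS at 2 OSTs each (ideal):", oss_needed(ideal, 2))
```

With no losses this gives the 40 OSTs quoted above; any efficiency 
below 1.0 pushes the count up, which is why the losses need to be 
counted in the design.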
My comments are to make sure your customer understands the efficiency 
issues, and that simple Fortran writes from a single thread aren't going 
to be done at 20GB/s.  That is, not unlike a compute cluster, a storage 
cluster has an aggregate bandwidth that a single node or reader/writer 
cannot achieve on its own.
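
A deliberately simplistic model of the aggregate-bandwidth point, as a 
Python sketch.  It assumes every writer sustains the same rate and that 
nothing else (network, contention, metadata) gets in the way, which is 
never quite true in practice.  The function name and the example writer 
counts are hypothetical; the 500 MB/s and 20 GB/s figures are the ones 
used in this thread.

```python
def achievable_mb_s(writers, per_writer_mb_s, storage_peak_mb_s):
    """Aggregate throughput under a simple min() model.

    The clients can push at most writers * per_writer_mb_s, and the
    storage can absorb at most storage_peak_mb_s; whichever is smaller
    is the ceiling.
    """
    return min(writers * per_writer_mb_s, storage_peak_mb_s)

STORAGE_PEAK = 20_000   # 20 GB/s design target, in MB/s
PER_WRITER = 500        # roughly what a single thread might reach

for writers in (1, 8, 40, 100):
    rate = achievable_mb_s(writers, PER_WRITER, STORAGE_PEAK)
    print(f"{writers:4d} writers -> {rate} MB/s")
```

Under these assumptions a single writer sees 500 MB/s no matter how 
fast the storage is, and it takes on the order of 40 concurrent writers 
before the 20 GB/s ceiling is what limits you.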
Regards,
Joe
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
    
    