[Lustre-discuss] Off-topic: largest existing Lustre file system?

Canon, Richard Shane canonrs at ornl.gov
Wed Jan 30 13:09:59 PST 2008


Marty,

Our benchmark measurements were made using IOR performing POSIX I/O to
a single shared file (I believe).

Since you mentioned MPI-IO...  Weikuan Yu (at ORNL) has done some work
to improve the MPI-IO Lustre ADIO driver.  We have also been sponsoring
work through a Lustre Centre of Excellence to further improve the ADIO
driver.  I'm optimistic that this can make collective I/O perform at
the level one would expect.  File-per-process runs often do run faster,
up until the metadata activity associated with creating 10k+ files
starts to slow things down.  I'm a firm believer that collective I/O
through libraries like MPI-IO, HDF5, and pNetCDF is the way things
should move.  It should be possible to embed enough intelligence in
these middle layers to do good stripe alignment and to automatically
tune stripe counts, stripe widths, etc.  Some of this will hopefully be
accomplished with the improvements being made to the ADIO driver.
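
Just to illustrate what I mean by embedding that intelligence in the
middle layers, here is a rough sketch (not our benchmark code; the hint
names follow the ROMIO convention, and the values are placeholders) of
how an application or library can request a particular Lustre striping
and two-phase setup through MPI-IO:

  /* Sketch only: pass Lustre striping and collective-buffering hints
   * to MPI-IO at open time.  Whether the hints are honored depends on
   * the ADIO driver in use; the values here are purely illustrative. */
  #include <mpi.h>

  int open_striped(const char *path, MPI_File *fh)
  {
      MPI_Info info;
      MPI_Info_create(&info);

      /* Ask for 32 stripes of 1 MB each (placeholder values). */
      MPI_Info_set(info, "striping_factor", "32");
      MPI_Info_set(info, "striping_unit",   "1048576");

      /* Enable two-phase collective buffering with 16 aggregators. */
      MPI_Info_set(info, "romio_cb_write", "enable");
      MPI_Info_set(info, "cb_nodes",       "16");

      int rc = MPI_File_open(MPI_COMM_WORLD, (char *)path,
                             MPI_MODE_CREATE | MPI_MODE_WRONLY,
                             info, fh);
      MPI_Info_free(&info);
      return rc;
  }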

--Shane

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org
[mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of mlbarna
Sent: Wednesday, January 16, 2008 12:43 PM
To: lustre-discuss at clusterfs.com
Subject: Re: [Lustre-discuss] Off-topic: largest existing Lustre file
system?

Could you elaborate on the benchmarking application(s) that provided
these bandwidth numbers?  I have a particular interest in MPI-coded
programs that perform collective I/O.  In discussions I find this topic
sometimes gets confused; what I mean is streamed, appending output in
which all the data from all the processors fills disjoint sections of
the same file in a single, atomic write operation.  In MPI-IO, the
MPI_File_write_all* family seems to define my focus area, run with or
without two-phase aggregation.  Imitating the operation with simple
POSIX I/O is acceptable, as far as I am concerned.
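
In case it helps pin the terminology down, here is a minimal sketch of
the pattern I mean (file name and sizes are placeholders): every rank
contributes one contiguous buffer, and the buffers fill disjoint
sections of a single shared file through one collective call.

  /* Minimal sketch of the collective pattern described above: each
   * rank writes one contiguous, disjoint block of a single shared
   * file with a single MPI_File_write_at_all call. */
  #include <mpi.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank;
      const MPI_Offset block = 20 * 1024 * 1024;  /* 20 MB per rank (placeholder) */
      MPI_File fh;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      char *buf = malloc(block);                  /* data to write; fill omitted */

      MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY,
                    MPI_INFO_NULL, &fh);

      /* Rank r fills bytes [r*block, (r+1)*block): disjoint sections
       * of the same file, written with one collective operation. */
      MPI_File_write_at_all(fh, rank * block, buf, (int)block,
                            MPI_BYTE, &status);

      MPI_File_close(&fh);
      free(buf);
      MPI_Finalize();
      return 0;
  }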

In tests on redstorm last year, I appended to a single open file at a
rate of 26 GB/s.  I had to use exceptional parameters to achieve this,
however: the file had an LFS stripe count of 160, and each process of
a 160-processor job sent a 20 MB buffer, for an aggregate of 3.2 GB
per write_all operation.  I consider this configuration outside the
range of any normal usage.

I believe that a faster rate could be achieved by a similar program
that wrote independently, that is, one file per processor, such as via
NetCDF.  In that case, I would set the LFS stripe count down to one.
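
The independent alternative I have in mind is essentially the
following (a plain POSIX stand-in for what NetCDF would be doing per
process; it assumes the stripe count of the output directory has
already been set to one, e.g. with lfs setstripe):

  /* Rough sketch of the one-file-per-processor alternative: each rank
   * writes its own file with plain POSIX I/O.  Assumes the output
   * directory's stripe count was set to 1 beforehand, so each file
   * lands on a single OST. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank;
      const size_t block = 20 * 1024 * 1024;  /* placeholder size */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      char *buf = calloc(1, block);            /* data to write; fill omitted */
      char path[64];

      /* One file per processor: out.00000, out.00001, ... */
      snprintf(path, sizeof(path), "out.%05d", rank);

      FILE *fp = fopen(path, "wb");
      if (fp != NULL) {
          fwrite(buf, 1, block, fp);
          fclose(fp);
      }

      free(buf);
      MPI_Finalize();
      return 0;
  }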


Marty Barnaby



On 1/14/08 4:11 PM, "Canon, Richard Shane" <canonrs at ornl.gov> wrote:

> 
> Jeff,
> 
> I'm not aware of any.  For parallel file systems it is usually
> bandwidth-centric.
> 
> --Shane
> 
> -----Original Message-----
> From: Kennedy, Jeffrey [mailto:jkennedy at qualcomm.com]
> Sent: Monday, January 14, 2008 4:56 PM
> To: Canon, Richard Shane; lustre-discuss at clusterfs.com
> Subject: RE: [Lustre-discuss] Off-topic: largest existing Lustre file
> system?
> 
> Any specs on IOPS rather than throughput?
> 
> Thanks.
> 
> Jeff Kennedy
> QCT Engineering Compute
> 858-651-6592
>  
>> -----Original Message-----
>> From: lustre-discuss-bounces at clusterfs.com [mailto:lustre-discuss-
>> bounces at clusterfs.com] On Behalf Of Canon, Richard Shane
>> Sent: Monday, January 14, 2008 1:49 PM
>> To: lustre-discuss at clusterfs.com
>> Subject: Re: [Lustre-discuss] Off-topic: largest existing Lustre file
>> system?
>> 
>> 
>> Klaus,
>> 
>> Here are some that I know are pretty large.
>> 
>> * RedStorm - I think it has two roughly 50 GB/s file systems.  The
>> capacity may not be quite as large though.  I think they used FC
>> drives.  It was DDN 8500, although that may have changed.
>> * CEA - I think they have a file system approaching 100 GB/s.  I
>> think it is DDN 9550.  Not sure about the capacities.
>> * TACC has a large Thumper-based system.  Not sure of the specs.
>> * ORNL - We have a 44 GB/s file system with around 800 TB of total
>> capacity.  That is DDN 9550.  We also have two new file systems
>> (20 GB/s and 10 GB/s, currently LSI XBB2 and DDN 9550 respectively).
>> Those have around 800 TB each (after RAID6).
>> * We are planning a 200 GB/s, roughly 10 PB file system now.
>> 
>> --Shane
>> 
>> -----Original Message-----
>> From: lustre-discuss-bounces at clusterfs.com
>> [mailto:lustre-discuss-bounces at clusterfs.com] On Behalf Of D. Marc
>> Stearman
>> Sent: Monday, January 14, 2008 4:37 PM
>> To: lustre-discuss at clusterfs.com
>> Subject: Re: [Lustre-discuss] Off-topic: largest existing Lustre file
>> system?
>> 
>> Klaus,
>> 
>> We currently have a 1.2 PB Lustre filesystem that we will be expanding
>> to 2.4 PB in the near future.  I'm not sure about the highest sustained
>> IOPS, but we did have a user peak at 19 GB/s to one of our 500 TB
>> filesystems recently.  The backend for that was 16 DDN 8500 couplets
>> with write-cache turned OFF.
>> 
>> -Marc
>> 
>> ----
>> D. Marc Stearman
>> LC Lustre Systems Administrator
>> marc at llnl.gov
>> 925.423.9670
>> Pager: 1.888.203.0641
>> 
>> 
>> On Jan 14, 2008, at 12:41 PM, Klaus Steden wrote:
>> 
>>> 
>>> Hi there,
>>> 
>>> I was asked by a friend of a business contact of mine the other day
>>> to share some information about Lustre; it seems he's planning to
>>> build what will eventually be about a 3 PB file system.
>>>
>>> The CFS website doesn't appear to have any information on field
>>> deployments worth bragging about, so I figured I'd ask, just for fun;
>>> does anyone know:
>>>
>>> - the size of the largest working Lustre file system currently in
>>> the field?
>>> - the highest sustained number of IOPS seen with Lustre, and what the
>>> backend was?
>>> 
>>> cheers,
>>> Klaus
>>> 
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at clusterfs.com
>>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>> 
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at clusterfs.com
>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>> 
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
> 


_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


