[Lustre-discuss] [Fwd: [ofa-general] Announcing the release of MVAPICH 1.0]
Weikuan Yu
weikuan.yu at gmail.com
Fri Feb 29 05:26:19 PST 2008
Per the announcement from the MVAPICH team, I am pleased to let you know
that the MPI-IO support for Lustre has been integrated into the new
release of MVAPICH, version 1.0.
> - Optimized and high-performance ADIO driver for Lustre
> - This MPI-IO support is a contribution from Future Technologies
> Group, Oak Ridge National Laboratory.
> (http://ft.ornl.gov/doku/doku.php?id=ft:pio:start)
> - Performance graph at:
> http://mvapich.cse.ohio-state.edu/performance/mvapich/romio.shtml
Please feel free to try it out and send your comments or questions to the
lustre-discuss list or to mvapich-discuss at cse.ohio-state.edu.
Thanks,
--Weikuan
-------- Original Message --------
Subject: [ofa-general] Announcing the release of MVAPICH 1.0
Date: Fri, 29 Feb 2008 00:17:48 -0500 (EST)
From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
To: ewg at lists.openfabrics.org, <general at lists.openfabrics.org>
The MVAPICH team is pleased to announce the availability of MVAPICH
1.0 with the following NEW features:
- New scalable and robust job startup
- Enhanced and robust mpirun_rsh framework to provide scalable
launching on multi-thousand core clusters
- Running time of `MPI Hello World' program on 1K cores is around
4 sec and on 32K cores is around 80 sec
- Available for OpenFabrics/Gen2, OpenFabrics/Gen2-UD and
QLogic InfiniPath devices
- Performance graph at:
http://mvapich.cse.ohio-state.edu/performance/startup.shtml
- Enhanced support for SLURM
- Available for OpenFabrics/Gen2, OpenFabrics/Gen2-UD and
QLogic InfiniPath devices
- New OpenFabrics Gen2 Unreliable-Datagram (UD)-based design
for large-scale InfiniBand clusters (multi-thousand cores)
- delivers performance and scalability with constant
memory footprint for communication contexts
- Only 40MB per process even with 16K processes connected to
each other
- Performance graph at:
http://mvapich.cse.ohio-state.edu/performance/mvapich/ud_memory.shtml
- zero-copy protocol for large data transfer
- shared memory communication between cores within a node
- multi-core optimized collectives
(MPI_Bcast, MPI_Barrier, MPI_Reduce and MPI_Allreduce)
- enhanced MPI_Allgather collective
- New features for OpenFabrics Gen2-IB interface
- enhanced coalescing support with varying degree of coalescing
- support for ConnectX adapter
- support for asynchronous progress at both sender and receiver
to overlap computation and communication
- multi-core optimized collectives (MPI_Bcast)
- tuned collectives (MPI_Allgather, MPI_Bcast)
based on network adapter characteristics
- Performance graph at:
http://mvapich.cse.ohio-state.edu/performance/collective.shtml
- network-level fault tolerance with Automatic Path Migration (APM)
for tolerating intermittent network failures over InfiniBand.
- New Support for QLogic InfiniPath adapters
- high-performance point-to-point communication
- optimized collectives (MPI_Bcast and MPI_Barrier) with k-nomial
algorithms while exploiting multi-core architecture
- Optimized and high-performance ADIO driver for Lustre
- This MPI-IO support is a contribution from Future Technologies Group,
Oak Ridge National Laboratory.
(http://ft.ornl.gov/doku/doku.php?id=ft:pio:start)
- Performance graph at:
http://mvapich.cse.ohio-state.edu/performance/mvapich/romio.shtml
- Flexible user defined processor affinity for better resource utilization
on multi-core systems
- flexible process bindings to cores
- allows memory-intensive applications to run with a subset of cores
on each chip for better performance
More details on all features and supported platforms can be obtained
by visiting the following URL:
http://mvapich.cse.ohio-state.edu/overview/mvapich/features.shtml
MVAPICH 1.0 continues to deliver excellent performance. Sample
performance numbers include:
- with OpenFabrics/Gen2 on EM64T quad-core with PCIe and ConnectX-DDR:
- 1.51 microsec one-way latency (4 bytes)
- 1404 MB/sec unidirectional bandwidth
- 2713 MB/sec bidirectional bandwidth
- with PSM on Opteron with Hypertransport and QLogic-SDR:
- 1.25 microsec one-way latency (4 bytes)
- 953 MB/sec unidirectional bandwidth
- 1891 MB/sec bidirectional bandwidth
Performance numbers for all other platforms, system configurations and
operations can be found in the `Performance' section of the
project's web page.
To download MVAPICH 1.0 and the associated user guide, or to
access the anonymous SVN, please visit the following URL:
http://mvapich.cse.ohio-state.edu
All feedback, including bug reports and hints for performance tuning,
is welcome. Please post it to the mvapich-discuss mailing list.
Thanks,
The MVAPICH Team
======================================================================
MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science,
Mellanox, Intel, Cisco Systems, QLogic, Sun Microsystems and Linux
Networx; and with equipment support from Advanced Clustering, AMD,
Appro, Chelsio, Dell, Fujitsu, Fulcrum, IBM, Intel, Mellanox,
Microway, NetEffect, QLogic and Sun Microsystems. Another technology
partner is Etnus.
======================================================================
--
Weikuan Yu <+> 1-865-574-7990
http://ft.ornl.gov/~wyu/