[Lustre-discuss] [Fwd: [ofa-general] Announcing the release of MVAPICH 1.0]
Weikuan Yu
weikuan.yu at gmail.com
Fri Feb 29 05:26:19 PST 2008
Per the announcement from the MVAPICH team, I am pleased to let you know
that the MPI-IO support for Lustre has been integrated into the new
release of MVAPICH, version 1.0.
> - Optimized and high-performance ADIO driver for Lustre
> - This MPI-IO support is a contribution from Future Technologies
> Group, Oak Ridge National Laboratory.
> (http://ft.ornl.gov/doku/doku.php?id=ft:pio:start)
> - Performance graph at:
> http://mvapich.cse.ohio-state.edu/performance/mvapich/romio.shtml
Please feel free to try it out and send your comments or questions to the
lustre-discuss list or to mvapich-discuss at cse.ohio-state.edu.
Thanks,
--Weikuan
-------- Original Message --------
Subject: [ofa-general] Announcing the release of MVAPICH 1.0
Date: Fri, 29 Feb 2008 00:17:48 -0500 (EST)
From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
To: ewg at lists.openfabrics.org, <general at lists.openfabrics.org>
The MVAPICH team is pleased to announce the availability of MVAPICH
1.0 with the following NEW features:
- New scalable and robust job startup
- Enhanced and robust mpirun_rsh framework to provide scalable
launching on multi-thousand core clusters
- Running time of `MPI Hello World' program on 1K cores is around
4 sec and on 32K cores is around 80 sec
- Available for OpenFabrics/Gen2, OpenFabrics/Gen2-UD and
QLogic InfiniPath devices
- Performance graph at:
http://mvapich.cse.ohio-state.edu/performance/startup.shtml
- Enhanced support for SLURM
- Available for OpenFabrics/Gen2, OpenFabrics/Gen2-UD and
QLogic InfiniPath devices
- New OpenFabrics Gen2 Unreliable-Datagram (UD)-based design
for large-scale InfiniBand clusters (multi-thousand cores)
- delivers performance and scalability with constant
memory footprint for communication contexts
- Only 40MB per process even with 16K processes connected to
each other
- Performance graph at:
http://mvapich.cse.ohio-state.edu/performance/mvapich/ud_memory.shtml
- zero-copy protocol for large data transfer
- shared memory communication between cores within a node
- multi-core optimized collectives
(MPI_Bcast, MPI_Barrier, MPI_Reduce and MPI_Allreduce)
- enhanced MPI_Allgather collective
- New features for OpenFabrics Gen2-IB interface
- enhanced coalescing support with varying degree of coalescing
- support for ConnectX adapter
- support for asynchronous progress at both sender and receiver
to overlap computation and communication
- multi-core optimized collectives (MPI_Bcast)
- tuned collectives (MPI_Allgather, MPI_Bcast)
based on network adapter characteristics
- Performance graph at:
http://mvapich.cse.ohio-state.edu/performance/collective.shtml
- network-level fault tolerance with Automatic Path Migration (APM)
for tolerating intermittent network failures over InfiniBand.
- New Support for QLogic InfiniPath adapters
- high-performance point-to-point communication
- optimized collectives (MPI_Bcast and MPI_Barrier) with k-nomial
algorithms while exploiting multi-core architecture
- Optimized and high-performance ADIO driver for Lustre
- This MPI-IO support is a contribution from Future Technologies Group,
Oak Ridge National Laboratory.
(http://ft.ornl.gov/doku/doku.php?id=ft:pio:start)
- Performance graph at:
http://mvapich.cse.ohio-state.edu/performance/mvapich/romio.shtml
- Flexible user defined processor affinity for better resource utilization
on multi-core systems
- flexible process bindings to cores
- allows memory-intensive applications to run with a subset of cores
on each chip for better performance
More details on all features and supported platforms can be obtained
by visiting the following URL:
http://mvapich.cse.ohio-state.edu/overview/mvapich/features.shtml
MVAPICH 1.0 continues to deliver excellent performance. Sample
performance numbers include:
- with OpenFabrics/Gen2 on EM64T quad-core with PCIe and ConnectX-DDR:
- 1.51 microsec one-way latency (4 bytes)
- 1404 MB/sec unidirectional bandwidth
- 2713 MB/sec bidirectional bandwidth
- with PSM on Opteron with Hypertransport and QLogic-SDR:
- 1.25 microsec one-way latency (4 bytes)
- 953 MB/sec unidirectional bandwidth
- 1891 MB/sec bidirectional bandwidth
Performance numbers for all other platforms, system configurations and
operations can be found in the `Performance' section of the
project's web page.
To download MVAPICH 1.0 and the associated user guide, or to
access the anonymous SVN, please visit the following URL:
http://mvapich.cse.ohio-state.edu
All feedback, including bug reports and hints for performance tuning,
is welcome. Please post it to the mvapich-discuss mailing list.
Thanks,
The MVAPICH Team
======================================================================
MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science,
Mellanox, Intel, Cisco Systems, QLogic, Sun Microsystems and Linux
Networx; and with equipment support from Advanced Clustering, AMD,
Appro, Chelsio, Dell, Fujitsu, Fulcrum, IBM, Intel, Mellanox,
Microway, NetEffect, QLogic and Sun Microsystems. Another technology
partner is Etnus.
======================================================================
--
Weikuan Yu <+> 1-865-574-7990
http://ft.ornl.gov/~wyu/