Mpiexec release 0.81: scalability, spawn, pbspro fixes

Pete Wyckoff pw at osc.edu
Thu Apr 20 00:47:49 EDT 2006


This release ended up with a lot of changes.  The previous release
was a long nine months ago, so perhaps that is not too surprising.
Everything that used to work should still work, of course, so please
complain if you find a regression.

Big system scalability

    Startup for GM (or MX) and InfiniBand is now asynchronous:
    mpiexec keeps spawning new tasks while it services the ones
    that are already starting up.  This greatly speeds up startup
    on large systems and avoids the timeouts that waiting tasks
    used to hit.  The largest reported machine using this work is
    a roughly 8000-processor InfiniBand cluster at Sandia.
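
    The interleaving described above can be sketched as a single
    event loop.  This is only an illustration of the idea, not
    mpiexec's actual code: the function name launch_async is
    hypothetical, and each "task" is a local socket pair that
    checks in immediately rather than a remote process started
    through PBS.

```python
import selectors
import socket


def launch_async(num_tasks):
    """Spawn stand-in tasks while servicing the ones already checking in.

    Returns the startup messages received, keyed by task rank.
    """
    sel = selectors.DefaultSelector()
    task_ends = []   # keep the "task" side of each socket pair open
    checked_in = {}

    def service(timeout):
        # Handle every task whose startup message is ready right now.
        for key, _ in sel.select(timeout):
            rank = key.data
            checked_in[rank] = key.fileobj.recv(64).decode()
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for rank in range(num_tasks):
        # "Spawn": a real launcher would start a remote process here.
        ours, theirs = socket.socketpair()
        sel.register(ours, selectors.EVENT_READ, data=rank)
        theirs.sendall(f"rank {rank} ready".encode())  # stand-in task checks in
        task_ends.append(theirs)
        # Poll without blocking, so spawning is never held up.
        service(timeout=0)

    # Everything is spawned; now wait for any stragglers.
    while len(checked_in) < num_tasks:
        service(timeout=1.0)

    for s in task_ends:
        s.close()
    sel.close()
    return checked_in
```

    A synchronous launcher would instead finish the whole spawn
    loop before reading anything, leaving early tasks waiting on
    the launcher for as long as the spawning takes.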

Newer mvapich

    Code was added to support changes mvapich made to its startup
    protocol.  (mvapich is an MPI library for InfiniBand based on
    MPICH v1.)  However, the latest mvapich release, 0.9.7, does
    not work with mpiexec unless it is patched.  See the mpiexec
    web page for details about how to patch your mvapich.

Myricom MX

    Support was added for Myrinet's new message passing protocol,
    MX.  The Myricom developers were nice enough to make MX look a
    lot like GM as far as mpiexec is concerned, so both protocols
    are supported in the same chunk of code.

MPI_Comm_spawn

    Support for MPI-2 process management features was added.  You
    can now call MPI_Comm_spawn and have mpiexec add more processes
    dynamically to your parallel set of tasks.  This works with the
    PMI interface used by MPICH v2 from ANL and by vendor releases
    based on that code.  Other MPI-2 features such as name
    publishing are supported too.

    There are plenty of caveats to spawning.  For one, no PBS will
    _add_ processors to your running job, so you have to plan ahead
    and request them all up front.  Maybe some day this too will
    change.

PBSPro redirection

    Some fixes for PBSPro issues were added, to work around both
    syntactic and semantic changes in the PBSPro versions of the TM
    and PBS interfaces.
    
    One nice new feature for people using PBSPro is the redirection
    helper.  It enables the use of stdio redirection even with a
    stock version of PBSPro.  If you configure with
    --enable-pbspro-helper, a second binary will be built and
    installed.  Mpiexec launches this helper on each compute node;
    it connects the stdio sockets back to mpiexec, then starts your
    MPI task.
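
    What such a helper does can be sketched roughly as follows.
    This is a guess at the general shape, not the real helper
    (which is a separate binary built by configure): the names
    run_with_redirect and demo are hypothetical, passing the
    address as host and port is an assumption, only stdout and
    stderr are redirected, and handing a socket to a child this
    way assumes a POSIX system.

```python
import socket
import subprocess
import sys


def run_with_redirect(host, port, argv):
    """Connect back to the launcher, then run argv with its output on that socket."""
    with socket.create_connection((host, port)) as sock:
        # The child gets the socket as both its stdout and its stderr.
        done = subprocess.run(argv, stdout=sock.fileno(), stderr=sock.fileno())
        return done.returncode


def demo():
    """Play the launcher's role: listen, run the helper, collect the output."""
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))   # any free port
        listener.listen(1)
        host, port = listener.getsockname()
        task = [sys.executable, "-c", "print('hello from task')"]
        status = run_with_redirect(host, port, task)
        conn, _ = listener.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:          # helper closed its end: no more output
                    break
                chunks.append(data)
    return status, b"".join(chunks).decode()
```

    Calling demo() runs a one-line task through the helper and
    returns its exit status together with whatever the task
    printed, as received on the launcher's side of the socket.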

    Torque, and OpenPBS with the mpiexec patch, never need this
    feature, since they do stdio redirection themselves just fine.

SVN repository

    The source code repository is now Subversion, not CVS, mainly
    due to the good support of SVN by HPC system staff at OSC.

Little bug fixes

    A big assortment of little bug fixes, code cleanups, and
    compiler warning suppressions for various systems were added.

Full changelog and downloads at:  http://www.osc.edu/~pw/mpiexec/
Send bug reports, comments and suggestions to the mailing list.

		-- Pete


