Mpiexec release 0.81: scalability, spawn, pbspro fixes
Pete Wyckoff
pw at osc.edu
Thu Apr 20 00:47:49 EDT 2006
This release ended up with lots of changes. The previous release
was a long nine months ago, so perhaps that is not too surprising.
Everything that used to work should still work, of course, so
please complain if you find a regression.
Big system scalability
Startup for GM (or MX) and InfiniBand is now asynchronous,
meaning that mpiexec keeps spawning tasks while it services the
ones that are already starting up. This greatly speeds up startup
on large systems and avoids the timeouts that waiting tasks used
to hit. The largest machine reported to use this code is a
roughly 8000-processor InfiniBand cluster at Sandia.
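To illustrate the idea of asynchronous startup, here is a toy Python sketch. It is not mpiexec's code and does not model the GM, MX, or InfiniBand startup protocols: "tasks" are hypothetical local threads that connect back to a listening socket and report their rank, and the names fake_task and async_startup are made up for this example. The point is that the launcher polls for check-ins between launches instead of blocking on each task in turn.

```python
import selectors
import socket
import threading
import time


def fake_task(port: int, rank: int) -> None:
    """Hypothetical stand-in for a remote MPI task: connect back to the
    launcher, report its rank, and hang up."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(f"{rank}\n".encode())


def async_startup(ntasks: int, timeout: float = 10.0) -> list[int]:
    """Launch ntasks fake tasks and service their check-ins while later
    tasks are still being launched, rather than waiting on each one."""
    sel = selectors.DefaultSelector()
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free port
    listener.listen(max(ntasks, 1))
    listener.setblocking(False)
    port = listener.getsockname()[1]
    sel.register(listener, selectors.EVENT_READ)

    buffers: dict = {}  # accepted connection -> bytes received so far
    ranks: list = []
    threads = []
    launched = 0
    deadline = time.monotonic() + timeout
    while len(ranks) < ntasks:
        if time.monotonic() > deadline:
            raise TimeoutError("not all tasks checked in")
        # Start at most one new task per pass, never blocking on it.
        if launched < ntasks:
            t = threading.Thread(target=fake_task, args=(port, launched))
            t.start()
            threads.append(t)
            launched += 1
        # Handle whatever startup traffic is ready right now.
        for key, _ in sel.select(timeout=0.01):
            if key.fileobj is listener:  # a task is connecting
                conn, _ = listener.accept()
                conn.setblocking(False)
                buffers[conn] = b""
                sel.register(conn, selectors.EVENT_READ)
            else:  # a task sent data or hung up
                conn = key.fileobj
                chunk = conn.recv(64)
                if chunk:
                    buffers[conn] += chunk
                else:  # EOF: this task's check-in is complete
                    ranks.append(int(buffers.pop(conn).decode().strip()))
                    sel.unregister(conn)
                    conn.close()
    for t in threads:
        t.join()
    sel.close()
    listener.close()
    return sorted(ranks)
```

A synchronous launcher would instead start one task and wait for its check-in before starting the next, so the total startup time grows with the slowest task times the number of tasks.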
Newer mvapich
Code was added to support changes that mvapich, an MPICH v1
library for InfiniBand, made to its startup protocol. However, the
latest mvapich version, 0.9.7, does not work with mpiexec. See
the mpiexec web page for details about how to patch your
mvapich.
Myricom MX
Support was added for Myrinet's new message passing protocol,
MX. The Myricom developers were nice enough to make MX look a
lot like GM as far as mpiexec is concerned, so both protocols
are supported in the same chunk of code.
MPI_Comm_spawn
Support for MPI2 process management features was added. You can
now call MPI_Comm_spawn and have mpiexec add more processes
dynamically to your parallel set of tasks. This works with the
PMI interface used by MPICH2 from ANL and by vendor releases
based on that code. Other MPI2 features, such as name publishing,
are supported too.
There are plenty of caveats to spawning. For example, no version
of PBS will _add_ processors to a running job, so you have to plan
ahead and request enough processors when you submit. Maybe some
day this too will change.
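Since the batch system will not grow a running job, any spawned processes have to fit into processors already in the original request. The following Python sketch makes that bookkeeping concrete. It is purely illustrative and not part of mpiexec: it assumes the job's processor slots appear one per line in the PBS node file, that the initial tasks occupy the first entries in order, and the function name spawn_headroom is hypothetical.

```python
from collections import Counter


def spawn_headroom(nodefile_lines: list[str], started: int) -> Counter:
    """Count the processor slots per host still free for spawned tasks.

    Hypothetical model: one node-file line per allocated slot, the
    initial tasks fill the first `started` slots in order, and nothing
    outside this fixed allocation ever becomes available.
    """
    slots = [line.strip() for line in nodefile_lines if line.strip()]
    if started > len(slots):
        raise ValueError("more initial tasks than allocated processor slots")
    return Counter(slots[started:])
```

Under these assumptions, a job that requested 8 slots and started 6 tasks has room for at most 2 spawned processes, however many the application asks for.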
PBSPro redirection
Several fixes were added to work around both syntactic and
semantic changes in the PBSPro versions of the TM and PBS
interfaces.
One nice new feature for people using PBSPro is the redirection
helper. It enables the use of stdio redirection even with a
stock version of PBSPro. If you configure with
--enable-pbspro-helper, a second binary will be built and
installed. Mpiexec launches this code on each compute node; the
helper takes care of connecting the stdio sockets back to
mpiexec, then starts your MPI task.
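The helper's job can be pictured with a short Python sketch of the general "connect, redirect, exec" pattern. This is not the actual helper shipped with mpiexec. The function names connect_back, attach_stdio, and helper_main are hypothetical, and using one TCP connection per stream (with host and ports handed to the helper somehow) is an assumption for illustration.

```python
import os
import socket


def connect_back(host: str, port: int) -> int:
    """Open one TCP stream back to mpiexec and return its raw descriptor."""
    sock = socket.create_connection((host, port))
    return sock.detach()  # keep the fd open after the socket object is gone


def attach_stdio(fd_in: int, fd_out: int, fd_err: int) -> None:
    """Install the given descriptors as stdin, stdout and stderr.

    Assumes the inputs are fresh descriptors numbered 3 or above, as
    returned by connect_back or os.pipe.
    """
    for src, dst in ((fd_in, 0), (fd_out, 1), (fd_err, 2)):
        os.dup2(src, dst)  # dup2 makes the copy inheritable across exec
    for fd in {fd_in, fd_out, fd_err}:
        os.close(fd)  # the originals are no longer needed


def helper_main(host: str, ports: tuple[int, int, int], argv: list[str]) -> None:
    """Hypothetical helper entry point: wire up stdio, then become the task."""
    fds = [connect_back(host, p) for p in ports]
    attach_stdio(*fds)
    os.execvp(argv[0], argv)  # replaces this process; raises OSError on failure
```

Because the task is started with exec after the redirection, it simply inherits descriptors 0, 1 and 2 and needs no knowledge of the sockets behind them.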
Torque, and OpenPBS with the mpiexec patch, never need this
feature; they handle stdio redirection themselves just fine.
SVN repository
The source code repository is now Subversion rather than CVS,
mainly because the HPC system staff at OSC support SVN well.
Little bug fixes
This release also includes a big assortment of little bug fixes,
code cleanups, and compiler warning suppressions for various
systems.
Full changelog and downloads at: http://www.osc.edu/~pw/mpiexec/
Send bug reports, comments and suggestions to the mailing list.
-- Pete