Mpiexec release 0.80: concurrency, fastdist, mpich/rai, ...
Pete Wyckoff
pw at osc.edu
Fri Jul 15 14:10:21 EDT 2005
A fair amount has changed since 0.78 was released four months
ago. A stealth release, 0.79, added a few features three months ago,
and 0.80 now brings some major changes. If you continue to use
mpiexec as you always have, everything should still work as before.
The list of changes is somewhat long, though.
Concurrency feature
If you want to run two independent parallel processes at the
same time, you would be tempted to do something like:
mpiexec -n 4 code1 &
mpiexec -n 3 code2
Until this release, that did not work. The problem is a
limitation in the TM interface used by PBS: only one TM client
(here, mpiexec) can use it at a time. Your alternatives were to
fall back to good old rsh-based mpirun, or to run two independent
batch jobs and figure out how to get them to synchronize.
Now the first mpiexec to run in a PBS job will listen on a named
pipe in /tmp, and later mpiexec processes will connect to it
for all TM activity. All the subsequent mpiexec processes can
start and stop in any order with no hardcoded limit on the
number or sizes of tasks. You might also do:
mpiexec -server &
to start one "master" mpiexec for all the others if having one
"empty" mpiexec process best fits your computation model, as
perhaps in a branch-and-bound style optimization.
The first (or master) mpiexec still enforces non-overlapping
processor allocation, so the sum of all the running "-n <numproc>"
may not be larger than the overall PBS allocation. All other
command line processing is handled by the individual client
mpiexec processes, as is all stdio for their respective parallel
programs.
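As a rough illustration of the rules above, here is a minimal sketch
of a PBS job script fragment that checks the requested task counts
against the allocation and then runs the two codes side by side. The
helper names fits_allocation and run_both are hypothetical; code1 and
code2 are the placeholder programs from the example above. It assumes
PBS_NODEFILE contains one line per allocated processor.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of mpiexec.

# Succeed when the sum of the requested task counts fits the allocation.
# $1 = number of allocated processors, remaining args = per-run -n values.
fits_allocation() {
    avail=$1
    shift
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    [ "$total" -le "$avail" ]
}

# Start both runs in the background, then wait for each one.
# Returns non-zero if either run failed.
run_both() {
    mpiexec -n 4 code1 &
    pid1=$!
    mpiexec -n 3 code2 &
    pid2=$!
    status=0
    wait "$pid1" || status=$?
    wait "$pid2" || status=$?
    return "$status"
}

# Inside a PBS job one might then write:
#   nprocs=$(($(wc -l < "$PBS_NODEFILE")))
#   fits_allocation "$nprocs" 4 3 && run_both
```

The check is only a convenience for failing early in the job script;
the first (or master) mpiexec enforces the limit regardless.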
The addition of this feature was made possible through
contributions by the DAKOTA Code Group of Sandia National
Laboratories.
Fast executable distribution
New experimental executable distribution code was added. If you
have an InfiniBand network, you can choose to download and use
code written by Dennis Dalessandro found at:
http://www.osc.edu/~dennis/fastdist
to push your executable to all the compute nodes very rapidly
instead of relying on NFS. Mpiexec can be configured to call
this code before execution.
Mpich/rai support
This adds support for the Rapid Array Interconnect version of
MPICH used by Cray on their XD1 machines. These are Opteron
clusters with custom message passing code on an InfiniBand
physical-layer transport. The MPICH device comes from the
MVIA heritage and thus looks a lot like the old-style MPICH/IB
startup code.
Config file task ordering
Now entries in a --config file are spawned in order, that is,
the first line becomes task 0, the second becomes task 1, etc.
Previously the MPI task order was always fixed by what was
handed out by PBS. Now you can order them at will by using a
--config file.
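A small sketch of how this ordering might be used follows. It
assumes the "node specification : executable" line format described
in the mpiexec manual page, so check the man page for your version.
The hostnames, program names, and the write_config helper are
hypothetical.

```shell
#!/bin/sh
# Sketch only: hostnames and program names below are made up.

# Write a config file to the path given in $1. With the ordering
# described above, line 1 becomes task 0, line 2 task 1, and so on.
write_config() {
    cat > "$1" <<'EOF'
node02 : ./master
node01 : ./worker
node03 : ./worker
EOF
}

# Possible use inside a PBS job:
#   write_config job.conf
#   mpiexec --config job.conf
```

Here the master process would become task 0 on node02 even if PBS
listed node01 first.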
Another message passing hostname transformation option
New option -transform-hostname-program generalizes the existing
option -transform-hostname by allowing specification of any
external program to change the canonical hostnames to the
hostnames used for the message passing fabric. Contributed by
Dries Kimpe.
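For illustration, an external transformation program could be as
simple as the sketch below. The calling convention is an assumption,
since this announcement does not spell it out: the program is taken
to receive the canonical hostname as its first argument and to print
the fabric hostname on standard output. The "-ib" suffix scheme and
the to_fabric_name helper are likewise hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes hostname in $1, fabric name printed on stdout.

# Map a canonical hostname to a message passing hostname by adding
# "-ib" to the short name and keeping any domain part unchanged.
to_fabric_name() {
    host=$1
    case $host in
        *.*) printf '%s-ib.%s\n' "${host%%.*}" "${host#*.}" ;;
        *)   printf '%s-ib\n' "$host" ;;
    esac
}

# As a standalone script, the last line would be:
#   to_fabric_name "$1"
```

Under these assumptions, node01 would map to node01-ib, and
node01.example.org to node01-ib.example.org.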
Signal handling
More attention is paid to handling signals. In particular, if
you hit ctrl-C, mpiexec will try to kill off all tasks and the
stdio handler. If you are impatient and hit it again, it will
just exit immediately. Previously there were situations in
which tasks could linger or (worse?) in which mpiexec would
refuse to exit no matter how hard one hammered the keyboard.
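The two-stage behavior described above can be sketched in a few
lines of shell. This only illustrates the pattern and is not taken
from the mpiexec source; cleanup_tasks is a hypothetical placeholder
for killing the tasks and the stdio handler.

```shell
#!/bin/sh
# Sketch only: illustrates the pattern, not mpiexec's implementation.
interrupted=0

# Hypothetical placeholder for "kill all tasks and the stdio handler".
cleanup_tasks() {
    echo "stopping tasks"
}

on_interrupt() {
    if [ "$interrupted" -eq 0 ]; then
        # First ctrl-C: remember it and attempt an orderly shutdown.
        interrupted=1
        cleanup_tasks
    else
        # Second ctrl-C: the user is impatient, so exit right away.
        # 130 is the conventional status for death by SIGINT (128 + 2).
        exit 130
    fi
}

trap on_interrupt INT
```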
Internal data structures
Pretty much all of the code that handles nodes, tasks, and
events was rewritten to support the new concurrency feature. It
is quite a bit prettier (says the author) and should make future
maintenance easier.
Little bug fixes
Don't die when stdio exits early in mpich2/pmi.
Pass the entire environment even when it is large (as in lots of
environment variables, or big ones), contributed by Belmont
Cheung.
Use va_list correctly, contributed by Kai Germaschewski.
Support Topspin-specific changes to MPICH/IB startup code.
Support original mpich2 version naming string used by Intel
mpich2, contributed by Anton Starikov.
Full changelog and downloads at: http://www.osc.edu/~pw/mpiexec/
Send bug reports, comments and suggestions to the mailing list.
-- Pete